Gentoo Archives: gentoo-user

From: Felix Kuperjans <felix@××××××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: How to copy gzip data from bytestream?
Date: Wed, 23 Feb 2022 00:17:08
Message-Id: 4355f83b-d97a-a461-e625-2bc6a03a1228@desaster-games.com
In Reply to: [gentoo-user] Re: How to copy gzip data from bytestream? by Grant Edwards
1 On 2022-02-22, Grant Edwards wrote:
2 > That doesn't work. It shows the size of the drive as the
3 > "compressed" size and 0 as uncompressed:
4 >
5 > # gzip -clt </dev/sdd
6 >          compressed        uncompressed  ratio uncompressed_name
7 >         31658606592                   0   0.0% stdout
8 >
9 > The actual size of the compressed data is about 1/3 the value shown
10 > above.
11 >
12 > It's not reading through the stream. It's seeking to the end and
13 > looking at what it thinks is the trailer info. I thought that maybe
14 > using a pipe instead of a file would make it read through the data,
15 > but that doesn't work either:
16 >
17 > $ ls > foo
18 > $ ls -l foo
19 > -rw-r--r-- 1 grante users 12923 Feb 22 07:51 foo
20 >
21 > $ gzip foo
22 > $ ls -l foo.gz
23 > -rw-r--r-- 1 grante users 6083 Feb 22 07:51 foo.gz
24 >
25 > $ gzip -clt <foo.gz
26 >          compressed        uncompressed  ratio uncompressed_name
27 >                6083               12923  53.1% stdout
28 >
29 > $ echo asdf >> foo.gz
30 >
31 > $ gzip -clt <foo.gz
32 >          compressed        uncompressed  ratio uncompressed_name
33 >                6088           174482547 100.0% stdout
34 >
35 > $ cat foo.gz | gzip -clt
36 >          compressed        uncompressed  ratio uncompressed_name
37 >                  -1                  -1   0.0% stdout
38 >
39 >
40 >
41 > Here's the relevant portion of the strace for the 'gzip -clt <foo.gz'
42 > where it seeks to end-8 and reads what it thinks is the uncompressed
43 > length and the CRC:
44 >
45 > lseek(0, -8, SEEK_END) = 6080
46 > read(0, "2\0\0asdf\n", 8) = 8
47 > write(1, " 6088 17"..., 54) = 54
48 > close(0) = 0
49 > close(1) = 0
50 > exit_group(0) = ?
51
52 Hi Grant,
53
54 You're right, it doesn't work with the trailing garbage. I wasn't aware
55 it seeks to the trailer even when reading from stdin.
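
To illustrate why the trailing garbage produces 174482547: per RFC 1952, the last four bytes of a gzip member hold the uncompressed length (ISIZE) as a little-endian 32-bit integer. The following is a minimal Python sketch, not gzip's actual code, that interprets the bytes from the strace the same way. The helper name `trailer_isize` is made up for this example.

```python
import gzip
import struct


def trailer_isize(data: bytes) -> int:
    """Interpret the last 4 bytes as a little-endian ISIZE field (RFC 1952)."""
    return struct.unpack("<I", data[-4:])[0]


# The 8 bytes the strace shows being read from end-8 after "echo asdf >> foo.gz".
tail = b"2\0\0asdf\n"
print(trailer_isize(tail))  # 174482547, i.e. the bytes "sdf\n" read as an integer

# A clean gzip stream of 12923 bytes reports the right value...
good = gzip.compress(b"x" * 12923)
print(trailer_isize(good))  # 12923

# ...but any appended garbage replaces the trailer that a seek-to-end finds.
print(trailer_isize(good + b"asdf\n"))  # 174482547
```

The same thing happens on the block device: the last bytes of the drive are whatever was there before, not the gzip trailer.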
56
57 By coincidence it seems the next release will even change this behavior:
58
59 https://git.savannah.gnu.org/cgit/gzip.git/commit/?id=cf26200380585019e927fe3cf5c0ecb7c8b3ef14
60
61 But this still doesn't solve your problem: the commit only adjusts how
62 the uncompressed size is calculated, while the compressed size is still
63 derived from stat.
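
Since neither stat nor the bytes at the end of the device can give the compressed length, one way to find it is to actually decompress the stream and count how many input bytes were consumed before the gzip trailer. Below is a hedged Python sketch using the standard `zlib` module. The function name `gzip_member_length` and the chunk size are assumptions made for this example, and it only handles the first gzip member of the stream.

```python
import gzip
import io
import zlib


def gzip_member_length(stream, chunk_size=64 * 1024):
    """Return (compressed_len, uncompressed_len) of the first gzip member.

    Reads forward through the stream, so it also works on pipes and on
    block devices with garbage after the compressed data.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    consumed = 0  # input bytes handed to the decompressor so far
    produced = 0  # output bytes produced so far
    while not decomp.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise EOFError("stream ended before the gzip trailer")
        consumed += len(chunk)
        produced += len(decomp.decompress(chunk))
    # Bytes past the end of the member are left in unused_data.
    return consumed - len(decomp.unused_data), produced


# Example: a gzip stream followed by garbage, as on the drive.
payload = gzip.compress(b"hello world" * 1000)
blob = payload + b"asdf\n" + bytes(1000)
print(gzip_member_length(io.BytesIO(blob)) == (len(payload), 11000))  # True
```

With the compressed length known, exactly that many bytes could be copied off the device, for example with `head -c`. Opening the device with `open("/dev/sdd", "rb")` should work the same way as the `io.BytesIO` object here, though that has not been tested against real hardware.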