On 2022-02-22, Grant Edwards wrote:
> That doesn't work. It shows the size of the drive as the
> "uncompressed" size and 0 as compressed:
>
> # gzip -clt </dev/sdd
>          compressed        uncompressed  ratio uncompressed_name
>         31658606592                   0   0.0% stdout
>
> The actual size of the compressed data is about 1/3 the value shown
> above.
>
> It's not reading through the stream. It's seeking to the end and
> looking at what it thinks is the trailer info. I thought that maybe
> using a pipe instead of a file would make it read through the data,
> but that doesn't work either:
>
> $ ls > foo
> $ ls -l foo
> -rw-r--r-- 1 grante users 12923 Feb 22 07:51 foo
>
> $ gzip foo
> $ ls -l foo.gz
> -rw-r--r-- 1 grante users 6083 Feb 22 07:51 foo.gz
>
> $ gzip -clt <foo.gz
>          compressed        uncompressed  ratio uncompressed_name
>                6083               12923  53.1% stdout
>
> $ echo asdf >> foo.gz
>
> $ gzip -clt <foo.gz
>          compressed        uncompressed  ratio uncompressed_name
>                6088           174482547 100.0% stdout
>
> $ cat foo.gz | gzip -clt
>          compressed        uncompressed  ratio uncompressed_name
>                  -1                  -1   0.0% stdout
>
> Here's relevant portion of the strace for the 'gzip -clt <foo.gz'
> where it seeks to end-8 and reads what it thinks is the uncompressed
> length and the CRC:
>
> lseek(0, -8, SEEK_END) = 6080
> read(0, "2\0\0asdf\n", 8) = 8
> write(1, " 6088 17"..., 54) = 54
> close(0) = 0
> close(1) = 0
> exit_group(0) = ?

Hi Grant,

You're right, it doesn't work with the trailing garbage. I wasn't aware
it actually seeks even on pipes.
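
For what it's worth, the bogus 174482547 is just the tail of the file
reinterpreted as a gzip trailer. A quick sketch in Python (my own
illustration, not anything gzip itself runs), assuming the RFC 1952
trailer layout of a 4-byte CRC32 followed by a 4-byte ISIZE, both
little-endian:

```python
import struct

# The 8 bytes gzip read at offset end-8, copied from the strace output.
tail = b"2\x00\x00asdf\n"

# Assumed trailer layout (RFC 1952): CRC32, then ISIZE, both
# 32-bit little-endian unsigned integers.
crc32, isize = struct.unpack("<II", tail)

print(isize)  # 174482547, the "uncompressed" size gzip reported
```

The leading "2\0\0" is what is left of the real ISIZE (12923 = 0x327b,
stored as 7b 32 00 00) once the five appended bytes pushed its first
byte out of the last-8-bytes window.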

By coincidence, it seems the next release will change this behavior:

https://git.savannah.gnu.org/cgit/gzip.git/commit/?id=cf26200380585019e927fe3cf5c0ecb7c8b3ef14

But this still doesn't solve your problem, since it only adjusts the
calculation of the uncompressed size; the compressed size is still
derived from stat.
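
Short of that, the only robust approach I can see is to read through
the stream and let the decompressor say where the gzip data ends. Below
is a minimal sketch in Python using the standard zlib module. The
function name gzip_stream_sizes and the chunk size are made up for
illustration, and it only counts complete gzip members, so treat it as
a starting point rather than a tested tool:

```python
import gzip
import io
import zlib


def gzip_stream_sizes(f, chunk_size=1 << 20):
    """Return (compressed, uncompressed) byte counts for the complete
    gzip members at the start of binary file object f.

    Reads sequentially without seeking and stops at the first bytes
    that are not valid gzip data.
    """
    compressed = uncompressed = 0   # totals over complete members
    member_in = member_out = 0      # running counts for the current member
    d = zlib.decompressobj(wbits=31)  # wbits=31: expect gzip framing
    data = b""
    while True:
        if not data:
            data = f.read(chunk_size)
            if not data:
                break               # end of input
        try:
            member_out += len(d.decompress(data))
        except zlib.error:
            break                   # trailing garbage or corrupt data
        if d.eof:
            # Member finished: bytes after it are left in unused_data.
            member_in += len(data) - len(d.unused_data)
            compressed += member_in
            uncompressed += member_out
            member_in = member_out = 0
            data = d.unused_data
            d = zlib.decompressobj(wbits=31)
        else:
            member_in += len(data)
            data = b""
    return compressed, uncompressed


# Demo on in-memory data: one gzip member followed by the same kind of
# junk as in the foo.gz experiment.
payload = b"hello world\n" * 1000
blob = gzip.compress(payload)
print(gzip_stream_sizes(io.BytesIO(blob + b"asdf\n")) == (len(blob), len(payload)))
```

For the drive, it would be called on something like
open("/dev/sdd", "rb"). Note that it has to decompress everything to
find the end, so it is no faster than "gzip -t".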