On Mon, Feb 21, 2022 at 8:29 PM Grant Edwards <grant.b.edwards@×××××.com> wrote:
>
> But I was trying to figure out a way to do it without uncompressing
> and recompressing the data. I had hoped that the gzip header would
> contain a "length" field (so I would know how many bytes to copy using
> dd), but it does not. Apparently, the only way to find the end of the
> compressed data is to parse it using the proper algorithm (deflate, in
> this case).
|
I'm guessing that the reason it lacks such a field is precisely so
that you can use it in a stream in just this manner. To put a length
in the header, gzip would need to seek back to the start of the file
and modify the header, which isn't always possible.
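
For reference, here is a minimal sketch of what the fixed part of the
header holds, going by RFC 1952: ten bytes, none of which is a length.
The function name and the dictionary keys are just illustrative
choices of mine.

```python
import struct

def parse_gzip_fixed_header(data):
    """Decode the 10-byte fixed gzip header (layout per RFC 1952)."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # magic(2) method(1) flags(1) mtime(4, little-endian) xfl(1) os(1)
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Note what is absent: no compressed or uncompressed length field.
    return {"method": method, "flags": flags, "mtime": mtime,
            "xfl": xfl, "os": os_id}
```

Optional fields (original file name, comment, and so on) may follow
these ten bytes depending on the flags, but none of them records how
long the compressed data is either.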
|
I wouldn't be surprised if it stores some kind of metadata at the end
of the file, but of course you can only find that if the end of the
file is marked in some way. Tapes sometimes have ways to seek to the
end of a recording - the drive can record a pattern that is detectable
while seeking at high speed. Obviously USB drives lack such a
mechanism unless one is provided by a filesystem or by whatever
application wrote the data.
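
As far as I can tell from RFC 1952, gzip does end each member with an
8-byte trailer: a CRC-32 of the uncompressed data followed by the
uncompressed size modulo 2^32. A rough sketch of reading it, assuming
you already hold exactly one complete member in memory (the function
name is mine):

```python
import struct

def read_gzip_trailer(member):
    """Return (crc32, isize) from the last 8 bytes of one gzip member."""
    if len(member) < 18:  # 10-byte header plus 8-byte trailer at minimum
        raise ValueError("too short to be a gzip member")
    # Both fields are 32-bit little-endian unsigned integers.
    crc32, isize = struct.unpack("<II", member[-8:])
    return crc32, isize
```

Note that this only works once the end of the member is known, and
the size it reports is the uncompressed one, so it doesn't help with
locating the end on a raw device.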
|
If you google the details of the gzip file format you might be able to
figure out how to identify the end of the file, scan the image to find
this marker, and then use dd to extract just the desired range.
Unless the file is VERY large I suspect that is going to take you
longer than just recompressing it all. I can't imagine that there is
any way around sequentially reading the entire file to find the end,
unless you have some mechanism that can read a random block and
determine whether it is valid gzip data. If so, you could do a binary
search, assuming the data on the drive past the end of the file isn't
valid gzip.
|
--
Rich