A few months ago, I filed bug 423651 to ask that bzip2 on the install
media be replaced with pbzip2. It was closed a short while later with a
note that this would involve changing what's kept in @system, and that
such a change had to be discussed here rather than in a bug report.

Here's a detailed description of how pbzip2 operates, as described by
a friend of mine:

> pbzip2's compression routine splits the input into blocks (with a default of 900,000
> bytes), which it then feeds into the standard bzip2 compression routine. The outputs
> of the various calls to the bzip2 compression routine are then concatenated together.
> The end result is the same as if you had first used the "split" command on the input,
> run individual bzip2 commands on the split pieces, then recombined the individual
> bz2 files using cat.
>
> The downside to this is that you have multiple file headers, footers, and byte-align
> padding, plus the fact that bzip2 does an RLE compression stage to fill the buffer it
> feeds to the BWT, the main part of the compression routine. If you happen to have a
> section with 1 MiB of the same byte, the pbzip2 front-end will split that into two blocks
> (at the default settings) and feed them to separate bzip2 compressors. bzip2 will
> then compress the first block down to a buffer of about 17 KiB before passing it on
> to be compressed further, and the rest of the data would have fit within this block, if
> pbzip2 hadn't split it the way it had.
>
> As for decompression, pbzip2 can only really do parallel decompression of files that it
> created, since it searches for the bz2 file header in order to split the work across
> different workers. One reason for this is that the bz2 block header is not byte aligned.
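
To make the description above a bit more concrete, here is a rough Python
sketch (mine, not pbzip2's actual code) that imitates the split, compress,
and cat behaviour using the standard bz2 module. The function names are
hypothetical, and the RLE estimate rests on my assumption that bzip2's
first RLE stage turns each run of up to 255 identical bytes into 5 bytes,
which is consistent with the roughly 17 KiB figure in the quote.

```python
import bz2

BLOCK_SIZE = 900_000  # default pbzip2 block size, per the description above


def chunked_bz2(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress each block as its own bz2 stream, then concatenate them.

    This mimics "split, bzip2 each piece, cat the results"; it is serial,
    whereas pbzip2 would hand the blocks to parallel workers.
    """
    streams = []
    for start in range(0, len(data), block_size):
        streams.append(bz2.compress(data[start:start + block_size]))
    return b"".join(streams)


def rle1_size_of_run(run_length: int) -> int:
    """Estimate the size of a run of identical bytes after the first RLE stage.

    Assumption: a run of 4..255 identical bytes becomes 4 literal bytes
    plus one count byte; shorter runs are left as they are.
    """
    full_runs, remainder = divmod(run_length, 255)
    tail = remainder if remainder < 4 else 5
    return full_runs * 5 + tail


if __name__ == "__main__":
    data = b"\x00" * (1024 * 1024)  # 1 MiB of the same byte

    single = bz2.compress(data)     # one stream, no splitting
    chunked = chunked_bz2(data)     # two streams at the default block size

    # bz2.decompress accepts concatenated streams, so both round-trip.
    assert bz2.decompress(single) == data
    assert bz2.decompress(chunked) == data

    print("single stream :", len(single), "bytes")
    print("chunked output:", len(chunked), "bytes")
    print("RLE estimate for one 900,000-byte run:",
          rle1_size_of_run(BLOCK_SIZE), "bytes")
    print("RLE estimate for the full 1 MiB run  :",
          rle1_size_of_run(len(data)), "bytes")
```

On the 1 MiB example, the chunked output should come out larger than the
single stream, since it carries two sets of headers and footers, yet both
decompress to the same data.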

I don't know how to carry this discussion any further than this, but
I'll answer any questions I can.

--
:wq