Gentoo Archives: gentoo-dev

From: Michael Mol <mikemol@×××××.com>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] ship app-arch/pbzip2 instead of app-arch/bzip2
Date: Wed, 26 Sep 2012 20:32:36
Message-Id: CA+czFiDJ3Tmus_rRom=y7rw3aqkkHt4r0NPM+35fBHV7zDs9cw@mail.gmail.com
A few months ago, I filed bug 423651 to ask that bzip2 on the install
media be replaced with pbzip2. It was closed a short while later with a
note that the change would involve changing what's kept in @system, and
that this had to be discussed here rather than in a bug report.

Here's a detailed description of how pbzip2 operates, as described by
a friend of mine:

> pbzip2's compression routine splits the input into blocks (900,000 bytes by
> default), which it then feeds into the standard bzip2 compression routine. The
> outputs of the various calls to the bzip2 compression routine are then
> concatenated. The end result is the same as if you had first used the "split"
> command on the input, run individual bzip2 commands on the split pieces, then
> recombined the individual bz2 files using cat.
>
> The downside to this is that you have multiple file headers, footers, and
> byte-alignment padding. In addition, bzip2 does an RLE compression stage to fill
> the buffer it feeds to the BWT, the main part of the compression routine. If you
> happen to have a section with 1 MiB of the same byte, the pbzip2 front-end will
> split that into two blocks (at the default settings) and feed them to separate
> bzip2 compressors. bzip2 will then compress the first block down to a buffer of
> about 17 KiB before passing it on to be compressed further, and the rest of the
> data would have fit within this block if pbzip2 hadn't split it the way it did.
>
> As for decompression, pbzip2 can only really do parallel decompression of files
> that it created, since it searches for the bz2 file header in order to split the
> file among different workers. One reason for this is that the bz2 block header is
> not byte-aligned.
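
To make the description above concrete, here is a rough Python sketch of the same scheme using the standard bz2 module. It is not pbzip2's actual code. The helper names (compress_blocks, split_streams, decompress_parallel) are made up for illustration, and the 10-byte stream-start pattern is my understanding of what a tool would search for at compression level 9. A real tool also has to cope with that pattern turning up by chance inside compressed data, which this sketch ignores.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 900_000  # default block size mentioned in the description

# Assumption: a level-9 stream that contains data starts with "BZh9"
# followed by the 6-byte block magic 0x314159265359.
STREAM_MAGIC = b"BZh9" + bytes.fromhex("314159265359")


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress each fixed-size piece separately and concatenate the results."""
    pieces = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return b"".join(bz2.compress(piece, 9) for piece in pieces)


def split_streams(blob):
    """Cut a concatenated file at every byte-aligned stream-start pattern."""
    starts = []
    pos = blob.find(STREAM_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = blob.find(STREAM_MAGIC, pos + 1)
    ends = starts[1:] + [len(blob)]
    return [blob[s:e] for s, e in zip(starts, ends)]


def decompress_parallel(blob, workers=4):
    """Hand each stream to a worker, then join the outputs in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, split_streams(blob)))


# The 1 MiB single-byte case from the description: two pieces, two streams.
data = b"\x00" * (1024 * 1024)
whole = bz2.compress(data, 9)      # one stream, as plain bzip2 would write
chunked = compress_blocks(data)    # two concatenated streams

print(len(whole), len(chunked))                  # chunked output is larger
print(bz2.decompress(chunked) == data)           # a plain decoder still reads it
print(decompress_parallel(chunked) == data)
```

The single stream in "whole" has only one byte-aligned stream header, so split_streams() finds just one piece there and nothing can be handed out in parallel. That is the limitation described in the last quoted paragraph.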

I really don't know how to carry this discussion any further than
this; I'll answer any questions I can.

--
:wq

Replies