A lot of small files (e.g. AUTHORS, ChangeLog

FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
/usr/share/doc. A short script to decompress those and recompress with xz
-6e reduced that to 36M. I don't have a comparison for individual file
differences.

I posted the short bash scripts at
https://gist.github.com/petteyg/96c71fa3c4680552f5c4
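
For anyone who wants to repeat the measurement without the shell scripts, here is a rough Python sketch of the same idea. It is not the content of the gist: the function name recompress_tree is made up for illustration, and it assumes every *.bz2 file fits in memory. Symlinks are skipped, so any link that pointed at a renamed file would have to be fixed separately.

```python
import bz2
import lzma
from pathlib import Path


def recompress_tree(root):
    """Recompress every *.bz2 file under root as *.xz.

    Returns (old_total, new_total): compressed bytes before and after.
    Hypothetical helper for illustration only.
    """
    old_total = 0
    new_total = 0
    # sorted() materialises the file list before anything is deleted.
    for src in sorted(Path(root).rglob("*.bz2")):
        if src.is_symlink() or not src.is_file():
            continue
        packed = src.read_bytes()
        data = bz2.decompress(packed)
        # Preset 6 combined with PRESET_EXTREME corresponds to `xz -6e`.
        repacked = lzma.compress(
            data, format=lzma.FORMAT_XZ, preset=6 | lzma.PRESET_EXTREME
        )
        # foo.1.bz2 becomes foo.1.xz
        src.with_suffix(".xz").write_bytes(repacked)
        src.unlink()
        old_total += len(packed)
        new_total += len(repacked)
    return old_total, new_total
```

Run it on a copy of the tree rather than the live one, e.g. recompress_tree("/tmp/man-copy"), and compare the two totals it returns.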

On Sun, May 11, 2014 at 4:27 PM, Pacho Ramos <pacho@g.o> wrote:

> On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote:
> > Hello, developers.
> >
> > I'd like to raise the following item for discussion: making .xz
> > the default compressor used by portage for documentation, man pages
> > and info files. That is, the equivalent of:
> >
> >   PORTAGE_COMPRESS=xz
> >
> > in make.globals.
> >
> > Rationale: xz-utils is quite widespread nowadays and it is a part
> > of the @system set. It can achieve a better compression ratio than
> > bzip2, and faster decompression at the same time.
> >
> > I have confirmed that both sys-apps/man and sys-apps/man-db can
> > handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
> > compressed info pages. Major text editors and pagers support .xz
> > just like .bz2 (i.e. usually they support both or neither :)).
> >
> > The additional question is: what preset to use? To help discuss
> > this, I'd like to quote the tables from 'man xz':
> >
> > Preset   DictSize   CompCPU   CompMem   DecMem
> >   -0     256 KiB       0        3 MiB    1 MiB
> >   -1       1 MiB       1        9 MiB    2 MiB
> >   -2       2 MiB       2       17 MiB    3 MiB
> >   -3       4 MiB       3       32 MiB    5 MiB
> >   -4       4 MiB       4       48 MiB    5 MiB
> >   -5       8 MiB       5       94 MiB    9 MiB
> >   -6       8 MiB       6       94 MiB    9 MiB
> >   -7      16 MiB       6      186 MiB   17 MiB
> >   -8      32 MiB       6      370 MiB   33 MiB
> >   -9      64 MiB       6      674 MiB   65 MiB
> >
> > Preset   DictSize   CompCPU   CompMem   DecMem
> >   -0e    256 KiB       8        4 MiB    1 MiB
> >   -1e      1 MiB       8       13 MiB    2 MiB
> >   -2e      2 MiB       8       25 MiB    3 MiB
> >   -3e      4 MiB       7       48 MiB    5 MiB
> >   -4e      4 MiB       8       48 MiB    5 MiB
> >   -5e      8 MiB       7       94 MiB    9 MiB
> >   -6e      8 MiB       8       94 MiB    9 MiB
> >   -7e     16 MiB       8      186 MiB   17 MiB
> >   -8e     32 MiB       8      370 MiB   33 MiB
> >   -9e     64 MiB       8      674 MiB   65 MiB
> >
> > I'd like to note here that increasing the dictionary size beyond the
> > file size does not improve compression. However, the options involved
> > in CompCPU may.
> >
> > Depending on the expected amount of complexity, I'd either go for:
> >
> > 1) -6e (or -6, the default) -- max CompCPU, reasonable use of memory,
> > and dictionary larger than most (or all?) documents that are going to
> > be compressed,
> >
> > 2) -Ne with minimal 'N' for CompCPU==8 and DictSize > filesize -- still
> > max compression ratio while keeping lowest memory requirements possible.
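
To make option 2 concrete, here is a minimal sketch of the selection rule in Python. The DictSize and CompCPU values are copied from the -Ne table quoted above; the function name pick_extreme_preset and the fallback to 9 for files of 64 MiB or more are my own assumptions, not something proposed in the original mail.

```python
KIB = 1024
MIB = 1024 * KIB

# (preset N, DictSize in bytes, CompCPU), copied from the quoted -Ne table.
EXTREME_PRESETS = [
    (0, 256 * KIB, 8),
    (1, 1 * MIB, 8),
    (2, 2 * MIB, 8),
    (3, 4 * MIB, 7),
    (4, 4 * MIB, 8),
    (5, 8 * MIB, 7),
    (6, 8 * MIB, 8),
    (7, 16 * MIB, 8),
    (8, 32 * MIB, 8),
    (9, 64 * MIB, 8),
]


def pick_extreme_preset(file_size):
    """Return the smallest N with CompCPU == 8 and DictSize > file_size.

    Hypothetical helper. Falls back to 9 when no dictionary is larger
    than the file (an assumption; the mail does not cover that case).
    """
    for preset, dict_size, comp_cpu in EXTREME_PRESETS:
        if comp_cpu == 8 and dict_size > file_size:
            return preset
    return 9
```

With this rule a 10 kB man page would get -0e, while a 3 MiB document would get -4e, because -3e is skipped for having CompCPU 7.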
> >
> > Your thoughts?
> >
>
> Per:
> https://bugs.gentoo.org/show_bug.cgi?id=372653
>
> Looks like bzip2 was still better for small files :/
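
The small-file question is easy to check per file. Below is a minimal Python sketch that reports the size of one payload under both compressors; the name compressed_sizes is made up for illustration, and it uses bzip2 level 9 and the xz -6e equivalent as assumed settings. Each format carries some fixed overhead, which weighs more heavily on small inputs, so the outcome depends on the actual files being tested.

```python
import bz2
import lzma


def compressed_sizes(data):
    """Return the byte size of data raw, as bzip2 -9 and as xz -6e.

    Hypothetical helper for comparing individual files.
    """
    return {
        "raw": len(data),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)),
    }
```

Feeding it the decompressed contents of a few AUTHORS or ChangeLog files would give the per-file comparison that the totals above cannot show.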