Gentoo Archives: gentoo-dev

From: Gordon Pettey <petteyg359@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression
Date: Sun, 11 May 2014 23:26:39
Message-Id: CAHY5MefPzfDYsYf_yNUs5EbVwA5hwG-q76pV6radTjwwavrmvA@mail.gmail.com
In Reply to: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression by Pacho Ramos
1 A lot of small files (e.g. AUTHORS, ChangeLog
2
3 FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
4 /usr/share/doc. A short script that decompresses those and recompresses
5 them with xz -6e reduced that to 36M. I don't have a per-file comparison
6 of the size differences.
7
8 I posted the short bash scripts at
9 https://gist.github.com/petteyg/96c71fa3c4680552f5c4
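
The linked scripts are not reproduced in this message. As a rough illustration of the same idea only (not the scripts from the gist), here is a minimal Python sketch that decompresses each .bz2 file under a directory and recompresses it as .xz. The function names `recompress_bz2_to_xz` and `recompress_tree` are hypothetical, and the sketch assumes that preset 6 combined with liblzma's "extreme" flag is an acceptable stand-in for `xz -6e`. It writes .xz siblings and leaves the originals in place, so the two totals can be compared before anything is deleted.

```python
import bz2
import lzma
from pathlib import Path
from typing import Tuple

# Assumption: preset 6 with the "extreme" flag stands in for `xz -6e`.
XZ_PRESET = 6 | lzma.PRESET_EXTREME


def recompress_bz2_to_xz(data: bytes) -> bytes:
    """Decompress a bzip2 stream and recompress the payload as .xz."""
    payload = bz2.decompress(data)
    return lzma.compress(payload, format=lzma.FORMAT_XZ, preset=XZ_PRESET)


def recompress_tree(root: str) -> Tuple[int, int]:
    """Write a .xz sibling for every regular .bz2 file under root.

    Returns (total bz2 bytes, total xz bytes). Originals are not removed,
    and symlinks are skipped so each file is counted once.
    """
    bz2_total = 0
    xz_total = 0
    # Collect the paths first so new .xz files do not disturb the walk.
    for src in sorted(Path(root).rglob("*.bz2")):
        if src.is_symlink() or not src.is_file():
            continue
        old = src.read_bytes()
        new = recompress_bz2_to_xz(old)
        # foo.1.bz2 -> foo.1.xz
        src.with_suffix(".xz").write_bytes(new)
        bz2_total += len(old)
        xz_total += len(new)
    return bz2_total, xz_total
```

Calling `recompress_tree("/some/copy/of/usr/share/man")` on a scratch copy (the path is a placeholder) returns the two byte totals; running it against the live tree would need root and would leave both formats side by side.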
10
11
12
13 On Sun, May 11, 2014 at 4:27 PM, Pacho Ramos <pacho@g.o> wrote:
14
15 > On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote:
16 > > Hello, developers.
17 > >
18 > > I'd like to raise the following item for discussion: making .xz
19 > > the default compressor used by portage for documentation, man pages
20 > > and info files. That is, the equivalent of:
21 > >
22 > > PORTAGE_COMPRESS=xz
23 > >
24 > > in make.globals.
25 > >
26 > > Rationale: xz-utils is quite widespread nowadays and it is part
27 > > of the @system set. It can achieve a better compression ratio than
28 > > bzip2, and faster decompression at the same time.
29 > >
30 > > I have confirmed that both sys-apps/man and sys-apps/man-db can
31 > > handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
32 > > compressed info pages. Major text editors and pagers support .xz
33 > > just like .bz2 (i.e. usually they support both or neither :)).
34 > >
35 > > The additional question is: what preset to use? To help discuss
36 > > this, I'd like to quote the tables from 'man xz':
37 > >
38 > > Preset   DictSize   CompCPU   CompMem   DecMem
39 > > -0        256 KiB         0     3 MiB    1 MiB
40 > > -1          1 MiB         1     9 MiB    2 MiB
41 > > -2          2 MiB         2    17 MiB    3 MiB
42 > > -3          4 MiB         3    32 MiB    5 MiB
43 > > -4          4 MiB         4    48 MiB    5 MiB
44 > > -5          8 MiB         5    94 MiB    9 MiB
45 > > -6          8 MiB         6    94 MiB    9 MiB
46 > > -7         16 MiB         6   186 MiB   17 MiB
47 > > -8         32 MiB         6   370 MiB   33 MiB
48 > > -9         64 MiB         6   674 MiB   65 MiB
49 > >
50 > > Preset   DictSize   CompCPU   CompMem   DecMem
51 > > -0e       256 KiB         8     4 MiB    1 MiB
52 > > -1e         1 MiB         8    13 MiB    2 MiB
53 > > -2e         2 MiB         8    25 MiB    3 MiB
54 > > -3e         4 MiB         7    48 MiB    5 MiB
55 > > -4e         4 MiB         8    48 MiB    5 MiB
56 > > -5e         8 MiB         7    94 MiB    9 MiB
57 > > -6e         8 MiB         8    94 MiB    9 MiB
58 > > -7e        16 MiB         8   186 MiB   17 MiB
59 > > -8e        32 MiB         8   370 MiB   33 MiB
60 > > -9e        64 MiB         8   674 MiB   65 MiB
61 > >
62 > > I'd like to note here that increasing dictionary size over file size
63 > > does not improve compression. However, the options involved in CompCPU
64 > > may.
65 > >
66 > > Depending on the expected amount of complexity, I'd either go for:
67 > >
68 > > 1) -6e (or -6, the default) -- max CompCPU, reasonable use of memory,
69 > > and dictionary larger than most (or all?) documents that are going to
70 > > be compressed,
71 > >
72 > > 2) -Ne with minimal 'N' for CompCPU==8 and DictSize > filesize -- still
73 > > max compression ratio while keeping lowest memory requirements possible.
74 > >
75 > > Your thoughts?
76 > >
77 >
78 > Per:
79 > https://bugs.gentoo.org/show_bug.cgi?id=372653
80 >
81 > Looks like bzip2 was still better for small files :/
82 >
83 >
84 >
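
The small-file concern and the preset question in the quoted thread can both be checked against local data rather than argued in the abstract. The following is a minimal Python sketch, not a benchmark from the thread: `compare_sizes` is a hypothetical helper, the sample text is made up, and the preset levels 0, 3 and 6 are chosen arbitrarily to keep encoder memory modest. It reports the compressed size of one input under bzip2 and under a few xz presets, with and without the "extreme" flag.

```python
import bz2
import lzma


def compare_sizes(data: bytes) -> dict:
    """Return compressed sizes in bytes for bzip2 and a few xz presets."""
    sizes = {
        "raw": len(data),
        "bzip2 -9": len(bz2.compress(data, 9)),
    }
    # Levels picked for illustration; higher presets need far more memory.
    for level in (0, 3, 6):
        plain = lzma.compress(data, format=lzma.FORMAT_XZ, preset=level)
        extreme = lzma.compress(
            data, format=lzma.FORMAT_XZ, preset=level | lzma.PRESET_EXTREME
        )
        sizes["xz -%d" % level] = len(plain)
        sizes["xz -%de" % level] = len(extreme)
    return sizes


if __name__ == "__main__":
    # Made-up stand-in for a short AUTHORS-style file.
    sample = b"Jane Doe <jane@example.org>\nJohn Doe <john@example.org>\n"
    for name, size in compare_sizes(sample).items():
        print("%-10s %6d bytes" % (name, size))
```

Feeding it the contents of real small documents (AUTHORS, ChangeLog and similar) and of larger man pages would show where, if anywhere, each format comes out ahead on a given system, and whether the dictionary size makes any difference once it exceeds the file size.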
