Gentoo Archives: gentoo-dev

From: "Marcin Mirosław" <marcin@×××××.pl>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression
Date: Mon, 12 May 2014 09:31:58
Message-Id: 53709501.3010402@mejor.pl
In Reply to: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression by Pacho Ramos
On 11.05.2014 23:27, Pacho Ramos wrote:
> On Sun, 11-05-2014 at 19:46 +0200, Michał Górny wrote:
>> Hello, developers.
>>
>> I'd like to raise the following item for discussion: making .xz
>> the default compressor used by portage for documentation, man pages
>> and info files. That is, the equivalent of:
>>
>> PORTAGE_COMPRESS=xz
>>
>> in make.globals.
>>
>> Rationale: xz-utils is quite widespread nowadays and it is a part
>> of @system set. It can achieve better compression ratio than bzip2,
>> and faster decompression at the same time.
>>
>> I have confirmed that both sys-apps/man and sys-apps/man-db can
>> handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
>> compressed info pages. Major text editors and pagers support .xz
>> alike .bz2 (i.e. usually they support both or neither :)).
>>
>> The additional question is: what preset to use? To help discussing
>> this, I'd like to quote the tables from 'man xz':
>>
>> Preset   DictSize   CompCPU   CompMem   DecMem
>>   -0     256 KiB       0        3 MiB    1 MiB
>>   -1       1 MiB       1        9 MiB    2 MiB
>>   -2       2 MiB       2       17 MiB    3 MiB
>>   -3       4 MiB       3       32 MiB    5 MiB
>>   -4       4 MiB       4       48 MiB    5 MiB
>>   -5       8 MiB       5       94 MiB    9 MiB
>>   -6       8 MiB       6       94 MiB    9 MiB
>>   -7      16 MiB       6      186 MiB   17 MiB
>>   -8      32 MiB       6      370 MiB   33 MiB
>>   -9      64 MiB       6      674 MiB   65 MiB
>>
>> Preset   DictSize   CompCPU   CompMem   DecMem
>>  -0e     256 KiB       8        4 MiB    1 MiB
>>  -1e       1 MiB       8       13 MiB    2 MiB
>>  -2e       2 MiB       8       25 MiB    3 MiB
>>  -3e       4 MiB       7       48 MiB    5 MiB
>>  -4e       4 MiB       8       48 MiB    5 MiB
>>  -5e       8 MiB       7       94 MiB    9 MiB
>>  -6e       8 MiB       8       94 MiB    9 MiB
>>  -7e      16 MiB       8      186 MiB   17 MiB
>>  -8e      32 MiB       8      370 MiB   33 MiB
>>  -9e      64 MiB       8      674 MiB   65 MiB
>>
>> I'd like to note here that increasing dictionary size over file size
>> does not improve compression. However, the options involved in CompCPU
>> may.
>>
>> Depending on the expected amount of complexity, I'd either go for:
>>
>> 1) -6e (or -6, the default) -- max CompCPU, reasonable use of memory,
>> and dictionary larger than most (or all?) documents that are going to
>> be compressed,
>>
>> 2) -Ne with minimal 'N' for CompCPU==8 and DictSize > filesize -- still
>> max compression ratio while keeping lowest memory requirements possible.
>>
>> Your thoughts?
>>
>
> Per:
> https://bugs.gentoo.org/show_bug.cgi?id=372653
>
> Looks like bzip2 was still better for small files :/

Hi!
I ran a test on a medium-sized man page (bash):
$ man -a -w bash
/usr/share/man/man1/bash.1.bz2
$ stat --printf=%s\\n /usr/share/man/man1/bash.1.bz2
62606
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.248s
user 0m0.316s
sys 0m0.012s
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.252s
user 0m0.324s
sys 0m0.016s
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.249s
user 0m0.320s
sys 0m0.012s

Then I recompressed the page with xz -6 and repeated the test:
$ stat --printf=%s\\n /usr/share/man/man1/bash.1.xz
66628
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.234s
user 0m0.304s
sys 0m0.004s
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.244s
user 0m0.288s
sys 0m0.024s
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.239s
user 0m0.308s
sys 0m0.012s

And with the file compressed using '-6e':
$ stat --printf=%s\\n /usr/share/man/man1/bash.1.xz
66700
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.233s
user 0m0.292s
sys 0m0.016s
$ time man -c -P /bin/cat bash >/dev/null

real 0m0.234s
user 0m0.300s
sys 0m0.008s

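Option 2 of the quoted proposal (the smallest N with CompCPU == 8 and DictSize > file size) was not measured here. As a rough illustration only, the selection could be sketched in Python as follows. The table values are copied from the quoted 'man xz' tables; the function name minimal_extreme_preset and the fallback to -9e for files larger than every listed dictionary are assumptions made for this sketch, not anything portage does.

```python
# DictSize column from the quoted 'man xz' tables, in bytes, indexed by preset N.
DICT_SIZE = {
    0: 256 * 1024,
    1: 1 << 20,
    2: 2 << 20,
    3: 4 << 20,
    4: 4 << 20,
    5: 8 << 20,
    6: 8 << 20,
    7: 16 << 20,
    8: 32 << 20,
    9: 64 << 20,
}

# Extreme presets listed with CompCPU == 8 in the quoted -Ne table
# (-3e and -5e are listed with CompCPU 7, so they are skipped).
CPU8_EXTREME = (0, 1, 2, 4, 6, 7, 8, 9)


def minimal_extreme_preset(file_size: int) -> int:
    """Return the smallest N with CompCPU == 8 and DictSize > file_size."""
    for n in CPU8_EXTREME:
        if DICT_SIZE[n] > file_size:
            return n
    # Assumed fallback: no listed dictionary is larger than the file.
    return 9


if __name__ == "__main__":
    # The uncompressed size is what matters; 300 KiB is an illustrative value.
    print(f"-{minimal_extreme_preset(300 * 1024)}e")
```

For a page of 300 KiB uncompressed this prints -1e, since 256 KiB is too small and 1 MiB is the next dictionary size with CompCPU == 8.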
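IMHO there is no real advantage in changing the current compressor for man pages: in this test the .xz files came out about 6% larger than the .bz2 one, and rendering was only about 10 to 15 ms faster.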
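To repeat the size comparison on other pages without recompressing installed files, something like the following Python sketch may help. It uses the standard bz2 and lzma modules rather than the bzip2 and xz command-line tools, so the sizes may differ slightly from the ones reported here, and it times raw decompression only, not the whole man pipeline. The function name compare, the labels, the placeholder page used when no path is given, and the choice of bzip2 level 9 are all assumptions for illustration.

```python
import bz2
import lzma
import sys
import time


def compare(data: bytes) -> dict:
    """Compress data three ways; return {label: (compressed_size, decompress_seconds)}."""
    candidates = {
        # bzip2 level 9 is an assumption about how the installed page was compressed.
        "bzip2 -9": (bz2.compress(data, 9), bz2.decompress),
        "xz -6": (lzma.compress(data, preset=6), lzma.decompress),
        "xz -6e": (lzma.compress(data, preset=6 | lzma.PRESET_EXTREME), lzma.decompress),
    }
    results = {}
    for label, (blob, decompress) in candidates.items():
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        assert restored == data  # round-trip sanity check
        results[label] = (len(blob), elapsed)
    return results


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Path to an uncompressed man page, supplied by the caller.
        with open(sys.argv[1], "rb") as fh:
            page = fh.read()
    else:
        # Placeholder input so the sketch runs without arguments.
        page = b".TH DEMO 1\n.SH NAME\ndemo \\- placeholder page\n" * 2000
    for label, (size, seconds) in compare(page).items():
        print(f"{label:9} {size:8d} bytes  {seconds * 1000:.2f} ms to decompress")
```

A single decompression of a file this small takes well under a millisecond on most machines, so the timings are noisy; repeating the call and averaging would give steadier numbers.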
Regards
