Gentoo Archives: gentoo-dev

From: Andrew Savchenko <bircoph@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression
Date: Tue, 13 May 2014 05:02:15
Message-Id: 20140513090142.4852d6311481b2c5e104dfcc@gmail.com
In Reply to: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression by Alexander Tsoy
1 Hello,
2
3 On Mon, 12 May 2014 14:47:36 +0400 Alexander Tsoy wrote:
4 > On Sun, 11 May 2014 18:26:32 -0500
5 > Gordon Pettey <petteyg359@×××××.com> wrote:
6 >
7 > > A lot of small files (e.g. AUTHORS, ChangeLog
8 > >
9 > > FWIW: On my system, I have 59M of bz2 files in /usr/share/man and
10 > > /usr/share/doc. A short script to decompress those and recompress with xz
11 > > -6e reduced that to 36M.
12 >
13 > Very strange o_O
14 >
15 > Here are my test results. xz options: "--lzma2=preset=6e,dict=4MiB".
16 > A larger dictionary size does not improve the compression ratio; I get
17 > even worse results with just "-6e" or "-9e". man-bz2 is a full copy of
18 > my /usr/share/man, man-xz is a recompressed one.
19 >
20 > Size comparison:
21 >
22 > $ du -s man-bz2/ man-xz/
23 > 82032 man-bz2/
24 > 82308 man-xz/
25
26 Please note that by default du reports disk usage (allocated blocks), not
27 apparent size in bytes. That means a file that is actually 1234 bytes long
28 is still counted as 4096 bytes on a 4K-block filesystem unless -b is given.
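
The rounding described above can be checked with a little shell arithmetic. This is only a sketch: allocated_size is a hypothetical helper, the 4096-byte default is an assumption about the filesystem, and some filesystems store very small files more compactly, so real du output may differ.

```shell
# Hypothetical helper: round an apparent size (bytes) up to whole blocks.
# Usage: allocated_size SIZE [BLOCK_SIZE]
# The 4096-byte default block size is an assumption, not a measured value.
allocated_size() {
    size=$1
    block=${2:-4096}
    # Integer division rounds down, so add block-1 first to round up.
    echo $(( (size + block - 1) / block * block ))
}

allocated_size 1234    # a 1234-byte file on 4 KiB blocks
allocated_size 4097    # one byte past a block boundary
```

With many small compressed man pages, this per-file rounding can outweigh any difference between the compressors themselves, which is why the figures with and without -b have to be compared separately.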
29
30 Here are my results:
31
32 1. With bzip2 -9:
33 find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -bhc --files0-from -
34 63M
35 find -O3 /usr/share/man -type f -name "*.bz2" -print0 | du -hc --files0-from -
36 146M
37
38 find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -bhc --files0-from -
39 151M total
40 find -O3 /usr/share/doc -type f -name "*.bz2" -print0 | du -hc --files0-from -
41 249M total
42
43 2. With xz -9e:
44 find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -bhc --files0-from -
45 64M
46 find -O3 /usr/share/man -type f -name "*.xz" -print0 | du -hc --files0-from -
47 146M
48
49 find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -bhc --files0-from -
50 147M total
51 find -O3 /usr/share/doc -type f -name "*.xz" -print0 | du -hc --files0-from -
52 245M total
53
54 As one can see, on man pages xz is slightly worse on apparent file sizes
55 and makes no difference in real disk usage. On docs xz is better by both measures.
56
57 As for decompression speed, xz is about twice as fast as bzip2 for large man
58 pages (bash, mplayer, cmake, zshall). However, this speed gain still needs to be
59 measured directly with the bunzip2 and unxz applications. I'll publish statistically
60 meaningful results later. Both scripting and testing require time.
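
A direct measurement along these lines could look roughly like the sketch below. It is not the script used for the numbers in this mail: time_decompress is a hypothetical helper, it assumes GNU date (for %N) and a decompressor that accepts -dc, and a proper benchmark would also need warm caches and many files.

```shell
# Hypothetical helper: decompress one file RUNS times to /dev/null and
# print the total wall-clock time in milliseconds.
# Usage: time_decompress RUNS TOOL FILE
# Assumes GNU date (%N = nanoseconds) and a TOOL that understands -dc,
# as bzip2 and xz do (equivalent to bunzip2 -c and unxz -c).
time_decompress() {
    runs=$1
    tool=$2
    file=$3
    i=0
    start=$(date +%s%N)
    while [ "$i" -lt "$runs" ]; do
        "$tool" -dc -- "$file" > /dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example calls (file names are placeholders):
#   time_decompress 20 bzip2 bash.1.bz2
#   time_decompress 20 xz    bash.1.xz
```

Repeating the run several times and comparing the totals for the same page compressed both ways gives a first impression; the process startup cost is included, which matters for small pages.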
61
62 Best regards,
63 Andrew Savchenko
