Gentoo Archives: gentoo-user

From:	Rich Freeman <rich0@g.o>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] OT Best way to compress files with digits
Date:	Fri, 31 Oct 2014 19:23:26
Message-Id:	`CAGfcS_=eh-EaK8s5XBgdxYFp3yyfoA0fo-NSEOPJV2ec_=mECg@mail.gmail.com`
In Reply to:	Re: [gentoo-user] OT Best way to compress files with digits by David Haller

On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
> 41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
> 3.528... (bits/digit)
> to zip:
> $ calc 49696*8/101000
> ~3.93 (bits/digit)

Actually, I'm surprised how far off from this the various methods are.
I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible block of 96
digits as a 40-byte code (that is just a straight decimal-to-binary
conversion, which fits because 10^96 < 2^320). Then read a "word" at a
time and translate it. This will only waste about 0.011 bits per digit
(320/96 = 3.333 versus log2(10) = 3.322).
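
A rough Python sketch of that block scheme, in case it helps. The
function names, the big-endian byte order, the zero-padding of the last
partial block, and passing the digit count separately to the decoder are
all my own assumptions for illustration, not part of any existing tool:

```python
import math

BLOCK_DIGITS = 96   # decimal digits per "word"
BLOCK_BYTES = 40    # 320 bits, enough because 10**96 < 2**320


def pack_digits(digits: str) -> bytes:
    """Pack a string of ASCII digits into 40-byte blocks of 96 digits each."""
    if digits and not (digits.isascii() and digits.isdigit()):
        raise ValueError("input must contain only the characters 0-9")
    out = bytearray()
    for i in range(0, len(digits), BLOCK_DIGITS):
        # Assumption: the final partial block is right-padded with zeros.
        block = digits[i:i + BLOCK_DIGITS].ljust(BLOCK_DIGITS, "0")
        out += int(block).to_bytes(BLOCK_BYTES, "big")
    return bytes(out)


def unpack_digits(data: bytes, n_digits: int) -> str:
    """Reverse pack_digits; n_digits is the original digit count."""
    if len(data) % BLOCK_BYTES:
        raise ValueError("data length must be a multiple of 40 bytes")
    parts = []
    for i in range(0, len(data), BLOCK_BYTES):
        value = int.from_bytes(data[i:i + BLOCK_BYTES], "big")
        # zfill restores leading zeros lost in the integer conversion.
        parts.append(str(value).zfill(BLOCK_DIGITS))
    return "".join(parts)[:n_digits]


def wasted_bits_per_digit() -> float:
    """Bits per digit spent by the scheme beyond the log2(10) minimum."""
    return BLOCK_BYTES * 8 / BLOCK_DIGITS - math.log2(10)
```

For a file of N digits this produces 40 * ceil(N / 96) bytes, so the
padding in the last block only matters for very short inputs.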

--
Rich
