Gentoo Archives: gentoo-user

From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] OT Best way to compress files with digits
Date: Fri, 31 Oct 2014 19:23:26
Message-Id: CAGfcS_=eh-EaK8s5XBgdxYFp3yyfoA0fo-NSEOPJV2ec_=mECg@mail.gmail.com
In Reply to: Re: [gentoo-user] OT Best way to compress files with digits by David Haller
On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
> 41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
> 3.528... (bits/digit)
> to zip:
> $ calc 49696*8/101000
> ~3.93 (bits/digit)

Actually, I'm surprised how far off of this the various methods are.
I was expecting SOME overhead, but not this much.
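
As a rough check of those figures, here is a minimal Python sketch. It
assumes the 101000 in the quoted calc lines is the number of digits in
the file and that 44543 and 49696 are compressed sizes in bytes; the
function name is only illustrative. It compares each result with the
log2(10) limit for uniformly random decimal digits.

```python
import math

# Information content of one uniformly random decimal digit, in bits.
ENTROPY_LIMIT = math.log2(10)  # about 3.32

def bits_per_digit(compressed_bytes: int, digit_count: int) -> float:
    """Average number of output bits spent on each input digit."""
    return compressed_bytes * 8 / digit_count

# Assumption: figures taken from the quoted calc lines above.
DIGITS = 101000
RESULTS = (("lzip", 44543), ("zip", 49696))

for name, size in RESULTS:
    bpd = bits_per_digit(size, DIGITS)
    overhead = bpd / ENTROPY_LIMIT - 1
    print(f"{name}: {bpd:.3f} bits/digit, {overhead:.1%} above log2(10)")
```

Under those assumptions both tools land noticeably above the limit,
which is the gap I was surprised by.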

A fairly quick algorithm would be to encode each block of 96 digits
as a 40-byte code (that is just a straight decimal-to-binary
conversion: 96 digits need about 318.9 bits, and 40 bytes hold 320).
Then read a "word" at a time and translate it. This will only waste
about 0.011 bits per digit (320/96 = 3.333 against log2(10) = 3.322).
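
A minimal Python sketch of that block scheme follows. The 8-byte
length header and the zero padding of the last block are my own
assumptions to make the round trip work for inputs that are not a
multiple of 96 digits; the names pack_digits and unpack_digits are
hypothetical.

```python
import math

BLOCK_DIGITS = 96   # decimal digits per "word"
BLOCK_BYTES = 40    # 320 bits, enough for any 96-digit number
HEADER_BYTES = 8    # assumption: stores the original digit count

def pack_digits(digits: str) -> bytes:
    """Pack ASCII decimal digits into 40-byte big-endian blocks."""
    if digits and not (digits.isascii() and digits.isdigit()):
        raise ValueError("input must contain only ASCII digits 0-9")
    out = bytearray(len(digits).to_bytes(HEADER_BYTES, "big"))
    for start in range(0, len(digits), BLOCK_DIGITS):
        # Pad the final short block with zeros; the header lets
        # unpack_digits() trim them off again.
        block = digits[start:start + BLOCK_DIGITS].ljust(BLOCK_DIGITS, "0")
        out += int(block).to_bytes(BLOCK_BYTES, "big")
    return bytes(out)

def unpack_digits(packed: bytes) -> str:
    """Reverse pack_digits(), restoring leading zeros in each block."""
    total = int.from_bytes(packed[:HEADER_BYTES], "big")
    parts = []
    for start in range(HEADER_BYTES, len(packed), BLOCK_BYTES):
        value = int.from_bytes(packed[start:start + BLOCK_BYTES], "big")
        parts.append(str(value).zfill(BLOCK_DIGITS))
    return "".join(parts)[:total]

# Bits spent per digit beyond the log2(10) minimum.
WASTE = BLOCK_BYTES * 8 / BLOCK_DIGITS - math.log2(10)
```

For a large file the fixed header is negligible, so the cost stays at
320/96 bits per digit as described above.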

--
Rich
