Gentoo Archives: gentoo-user

From: Grant Edwards <grant.b.edwards@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: OT Best way to compress files with digits
Date: Fri, 31 Oct 2014 20:26:04
Message-Id: m30r7s$ee5$1@ger.gmane.org
In Reply to: Re: [gentoo-user] OT Best way to compress files with digits by Rich Freeman
On 2014-10-31, Rich Freeman <rich0@g.o> wrote:
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote:
>>
>> On Fri, 31 Oct 2014, Rich Freeman wrote:
>>
>>>I can't imagine that any tool will do much better than something like
>>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>>- your text files full of digits are encoding 3.3 bits of information
>>>in an 8-bit ascii character and even if the order of digits in pi can
>>>be treated as purely random just about any compression algorithm is
>>>going to get pretty close to that 3.3 bits per digit figure.
>>
>> Good estimate:
>>
>> $ calc '101000/(8/3.3)'
>> 41662.5
>> and I get from (lzip)
>> $ calc 44543*8/101000
>> 3.528... (bits/digit)
>> to zip:
>> $ calc 49696*8/101000
>> ~3.93 (bits/digit)
>
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion). Then read a "word" at a time and translate it. This
> will only waste 0.011 bits per digit.

You're cheating. The algorithm you tested will compress strings of
arbitrary 8-bit values. The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.
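
For illustration, the fixed-width packing described in the quoted text
(96 decimal digits per 40-byte word) might be sketched roughly as below.
This is only a sketch under stated assumptions, not code from the thread:
the names pack_digits, unpack_digits and wasted_bits_per_digit are
hypothetical, and the input length is assumed to be a multiple of 96.
It also shows the restriction noted above, since anything other than the
ten ASCII digits is rejected.

```python
import math

DIGITS_PER_WORD = 96  # decimal digits consumed per word
BYTES_PER_WORD = 40   # 320 bits emitted per word; 10**96 < 2**320, so it fits


def pack_digits(digits: str) -> bytes:
    """Pack ASCII decimal digits into 40-byte big-endian words, 96 at a time."""
    # Assumption: no padding scheme, so the length must be a multiple of 96.
    if len(digits) % DIGITS_PER_WORD != 0:
        raise ValueError("length must be a multiple of 96 digits")
    # Only the ten ASCII digits are representable, unlike a general compressor.
    if any(c not in "0123456789" for c in digits):
        raise ValueError("input may contain only the characters 0-9")
    out = bytearray()
    for i in range(0, len(digits), DIGITS_PER_WORD):
        word = int(digits[i:i + DIGITS_PER_WORD])
        out += word.to_bytes(BYTES_PER_WORD, "big")
    return bytes(out)


def unpack_digits(data: bytes) -> str:
    """Inverse of pack_digits: expand each 40-byte word back to 96 digits."""
    if len(data) % BYTES_PER_WORD != 0:
        raise ValueError("length must be a multiple of 40 bytes")
    parts = []
    for i in range(0, len(data), BYTES_PER_WORD):
        word = int.from_bytes(data[i:i + BYTES_PER_WORD], "big")
        # Zero-pad so leading zeros in the original word are restored.
        parts.append(f"{word:0{DIGITS_PER_WORD}d}")
    return "".join(parts)


def wasted_bits_per_digit() -> float:
    """Bits spent per digit minus the log2(10) a uniform random digit carries."""
    return BYTES_PER_WORD * 8 / DIGITS_PER_WORD - math.log2(10)
```

Packing 960 digits this way yields 400 bytes, and wasted_bits_per_digit()
comes out at roughly 0.011, in line with the figure quoted above.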

--
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                              gmail.com            SALAD!!

Replies

Subject: Re: [gentoo-user] Re: OT Best way to compress files with digits
Author: Rich Freeman <rich0@g.o>