On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like |
>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
> 41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
> 3.528... (bits/digit)
> to zip:
> $ calc 49696*8/101000
> ~3.93 (bits/digit)
|
Actually, I'm surprised how far off this figure the various methods are.
I was expecting SOME overhead, but not this much.
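
Putting rough numbers on it, from the sizes you quoted above (a quick
Python snippet, though any calculator would do; 3.3219 is just log2(10)):

floor = 101000 * 3.3219 / 8              # ~41,939 bytes, the entropy floor
for name, size in [("lzip", 44543), ("zip", 49696)]:
    print(name, round(100.0 * (size / floor - 1), 1), "% over the floor")

So lzip lands about 6% above the floor, and zip closer to 18%.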
|
A fairly quick algorithm would be to encode every possible group of 96
digits into a 40-byte code (that is just a straight decimal-to-binary
conversion). Then read a "word" of 96 digits at a time and translate
it. Since 40 bytes is 320 bits and 96 digits carry about
96*log2(10) ~= 318.9 bits, this wastes only about 0.011 bits per digit.
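
Something like this (untested) Python sketch, just to make the idea
concrete; it assumes the input is a plain string of digits with no
decimal point or line breaks:

import math

BLOCK_DIGITS = 96   # digits per "word"
BLOCK_BYTES = 40    # 320 bits, just above 96*log2(10) ~= 318.9 bits

def encode(digits):
    """Pack a string of decimal digits into 40-byte binary blocks."""
    out = bytearray()
    for i in range(0, len(digits), BLOCK_DIGITS):
        block = digits[i:i + BLOCK_DIGITS]           # last block may be short
        out += int(block).to_bytes(BLOCK_BYTES, "big")
    return bytes(out)

def decode(data, ndigits):
    """Unpack the blocks back into exactly ndigits decimal digits."""
    out = []
    remaining = ndigits
    for i in range(0, len(data), BLOCK_BYTES):
        n = int.from_bytes(data[i:i + BLOCK_BYTES], "big")
        width = min(BLOCK_DIGITS, remaining)         # restores leading zeros
        out.append(str(n).zfill(width))
        remaining -= width
    return "".join(out)

digits = "1415926535" * 30                # stand-in for 300 digits of pi
assert decode(encode(digits), len(digits)) == digits

# waste per digit: 320 stored bits vs. ~318.9 bits of information
print((BLOCK_BYTES * 8 - BLOCK_DIGITS * math.log2(10)) / BLOCK_DIGITS)

That stores 40/96 bytes per digit, i.e. about 3.333 bits per digit,
against the ~3.322-bit entropy of a uniform decimal digit.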
|
--
Rich |