On 2014-10-31, Rich Freeman <rich0@g.o> wrote: 
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote: 
>> 
>> On Fri, 31 Oct 2014, Rich Freeman wrote: 
>> 
>>>I can't imagine that any tool will do much better than something like 
>>>lzo, gzip, xz, etc. You'll definitely benefit from compression though 
>>>- your text files full of digits are encoding 3.3 bits of information 
>>>in an 8-bit ASCII character and even if the order of digits in pi can 
>>>be treated as purely random just about any compression algorithm is 
>>>going to get pretty close to that 3.3 bits per digit figure. 
>> 
>> Good estimate: 
>> 
>> $ calc '101000/(8/3.3)' 
>> 41662.5 
>> and I get from (lzip) 
>> $ calc 44543*8/101000 
>> 3.528... (bits/digit) 
>> to zip: 
>> $ calc 49696*8/101000 
>> ~3.93 (bits/digit) 
> 
> Actually, I'm surprised how far off of this the various methods are. 
> I was expecting SOME overhead, but not this much. 
> 
> A fairly quick algorithm would be to encode every possible set of 96 
> digits into a 40 byte code (that is just a straight decimal-to-binary 
> conversion). Then read a "word" at a time and translate it. This 
> will only waste 0.011 bits per digit. 

You're cheating. The algorithm you tested will compress strings of 
arbitrary 8-bit values. The algorithm you proposed will only compress 
strings of bytes where each byte can have only one of 10 values. 
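
To make the comparison concrete, here is a minimal sketch of the
fixed-width packing proposed above, written in Python. The names
pack_digits, unpack_digits and wasted_bits_per_digit are hypothetical,
not from any tool mentioned in this thread. It assumes the input is
nothing but ASCII digits and that its length is a multiple of 96; it
rejects everything else, which is exactly the restriction a
general-purpose compressor does not get to make.

```python
import math

DIGITS_PER_WORD = 96   # decimal digits per block (figure from the thread)
BYTES_PER_WORD = 40    # 320 bits; 10**96 < 2**320, so every block fits


def pack_digits(text: str) -> bytes:
    """Pack ASCII decimal digits into 40 bytes per 96 digits."""
    # Assumption: only the ten characters 0-9 are valid input.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("input must be a non-empty string of ASCII digits")
    if len(text) % DIGITS_PER_WORD:
        raise ValueError("input length must be a multiple of 96 digits")
    out = bytearray()
    for i in range(0, len(text), DIGITS_PER_WORD):
        # Straight decimal-to-binary conversion of one 96-digit "word".
        word = int(text[i:i + DIGITS_PER_WORD])
        out += word.to_bytes(BYTES_PER_WORD, "big")
    return bytes(out)


def unpack_digits(data: bytes) -> str:
    """Inverse of pack_digits: 40 bytes back to 96 digits per block."""
    if len(data) % BYTES_PER_WORD:
        raise ValueError("data length must be a multiple of 40 bytes")
    parts = []
    for i in range(0, len(data), BYTES_PER_WORD):
        word = int.from_bytes(data[i:i + BYTES_PER_WORD], "big")
        if word >= 10 ** DIGITS_PER_WORD:
            raise ValueError("block does not encode 96 decimal digits")
        # zfill restores leading zeros lost in the integer conversion.
        parts.append(str(word).zfill(DIGITS_PER_WORD))
    return "".join(parts)


def wasted_bits_per_digit() -> float:
    """Bits spent per digit minus the log2(10) a decimal digit carries."""
    return BYTES_PER_WORD * 8 / DIGITS_PER_WORD - math.log2(10)
```

wasted_bits_per_digit() comes out at roughly 0.011, which matches the
figure quoted above. Feed pack_digits anything other than digits and
it simply raises ValueError.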

Grant Edwards 
grant.b.edwards at gmail.com 