On 2014-10-31, Rich Freeman <rich0@g.o> wrote:
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@×××××××.de> wrote:
>>
>> On Fri, 31 Oct 2014, Rich Freeman wrote:
>>
>>>I can't imagine that any tool will do much better than something like
>>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>>- your text files full of digits are encoding 3.3 bits of information
>>>in an 8-bit ascii character and even if the order of digits in pi can
>>>be treated as purely random just about any compression algorithm is
>>>going to get pretty close to that 3.3 bits per digit figure.
>>
>> Good estimate:
>>
>> $ calc '101000/(8/3.3)'
>> 41662.5
>> and I get from (lzip)
>> $ calc 44543*8/101000
>> 3.528... (bits/digit)
>> to zip:
>> $ calc 49696*8/101000
>> ~3.93 (bits/digit)
>
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion). Then read a "word" at a time and translate it. This
> will only waste 0.011 bits per digit.
|
You're cheating. The algorithm you tested will compress strings of
arbitrary 8-bit values. The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.
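
To make the difference concrete, here is a rough Python sketch of the
fixed-width packing described above. It is not code from this thread:
the names pack_digits, unpack_digits and wasted_bits_per_digit are made
up for illustration, and I'm assuming the last block is padded with
zeros and that the digit count is kept somewhere else.

```python
import math

DIGITS_PER_BLOCK = 96   # decimal digits consumed per block
BYTES_PER_BLOCK = 40    # 10**96 < 2**320, so 96 digits fit in 40 bytes


def pack_digits(text: str) -> bytes:
    """Pack a string of ASCII digits into 40-byte blocks (illustrative helper)."""
    # This is the restriction in question: anything that is not 0-9 is rejected.
    if text and not (text.isascii() and text.isdigit()):
        raise ValueError("input must contain only the digits 0-9")
    out = bytearray()
    for i in range(0, len(text), DIGITS_PER_BLOCK):
        # Assumption: pad the final short block with zeros on the right.
        block = text[i:i + DIGITS_PER_BLOCK].ljust(DIGITS_PER_BLOCK, "0")
        out += int(block).to_bytes(BYTES_PER_BLOCK, "big")
    return bytes(out)


def unpack_digits(data: bytes, ndigits: int) -> str:
    """Reverse pack_digits; ndigits is the original length, stored separately."""
    parts = []
    for i in range(0, len(data), BYTES_PER_BLOCK):
        value = int.from_bytes(data[i:i + BYTES_PER_BLOCK], "big")
        # zfill restores leading zeros that the integer conversion dropped.
        parts.append(str(value).zfill(DIGITS_PER_BLOCK))
    return "".join(parts)[:ndigits]


def wasted_bits_per_digit() -> float:
    """Bits spent per digit minus the log2(10) needed for a random digit."""
    return BYTES_PER_BLOCK * 8 / DIGITS_PER_BLOCK - math.log2(10)
```

wasted_bits_per_digit() comes out at about 0.011, matching the figure
quoted above. If all 101000 bytes of the file were digits, this would
need 1053 blocks, i.e. 42120 bytes. But feed it a newline, a decimal
point, or any other byte and it simply refuses, whereas lzip and zip
have to cope with whatever bytes they are given.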
|
--
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                              gmail.com            SALAD!!