Gordon Pettey <petteyg359@×××××.com> wrote:
>>
>> Already now this means that you need 2 (or already 3?) times the
>> disk space as for an rsync mirror; multiply all numbers by 4
>> if you used squashfs to store the tree. [...]
>
> Or, in 2-3 years, maybe people will stop with the hyperbole
|
Hyperbole? Really?
|
Let's first look at the current data.
Instead of guessing, I have now fetched the git tree
to get the exact numbers:
|
git on ext2 (8K blocks): 704 M
squashfs with lz4:       120 M
|
lz4 is the fastest algorithm, but not the best in terms of space.
More importantly: the git data is still missing metadata information,
which will add some more.
|
It seems my estimate of a factor of 2*4 = 8
for the current state was rather realistic.
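
For anyone who wants to redo the arithmetic with their own measurements, here is a minimal sketch in Python. Only the two sizes come from the numbers above; the function and variable names are made up for illustration. Note that it only gives the ratio for lz4 and without the missing metadata, so it is a lower bound on the factor discussed here, not the estimate itself.

```python
# Sizes as reported above (in "M"); everything else here is illustrative.
GIT_ON_EXT2_M = 704
SQUASHFS_LZ4_M = 120


def size_ratio(git_m: float, squashfs_m: float) -> float:
    """Return how many times larger the git checkout is than the squashfs image."""
    return git_m / squashfs_m


measured = size_ratio(GIT_ON_EXT2_M, SQUASHFS_LZ4_M)
print(f"measured git/squashfs(lz4) ratio: {measured:.2f}")
```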
|
Not to forget that this was a fresh checkout where the .git
data itself is fully compressed in one file (which is by default
not the case when you update frequently - it depends on your
git configuration and perhaps on whether you use a cron job for
recompression). So perhaps for some git users the bracketed figure in
my estimate (3*4 = 12) is already correct.
|
Whether 1 GB of permanent disk space just for the
overhead of package management is appropriate is something everybody
must decide for themselves. Compared to other distributions,
this is an awful lot.
Just for getting ChangeLogs it is IMHO way too much.
|
And currently the git history is still almost empty...
|
Before I turn to the future, some remarks:
|
> The tree is a bunch of text files, of which a whole lot of text is
> repeated
|
That's why squashfs is already so effective compared to plain rsync.
Of course, a lot of the *current* factor comes from this.
|
> which is great for compression, which git does.
|
You seem to suggest that I ignored this, but I did not:
|
>> (there is possibility of some
>> compression of history, but OTOH, many packages are added and
>> removed, eclasses keep changing, etc.)
|
Of course, concerning the future, one must make some assumptions.
Perhaps it is reasonable to assume that roughly a constant amount of
new data is added every year, i.e., the quotient (git data/squashfs)
increases every year by a constant summand.
|
Compression will not change this "constantness", but at most influence
the summand itself. Quite the opposite: at the moment when the history
exceeds a certain size - depending on the memory window size used by
the gzip implementation of git - compression will eventually
become much less effective. You can see the difference essentially
in the gzip vs. xz compressed size, because the main difference
here is the size of the mentioned memory window.
|
And as mentioned above, unless you recompress regularly
(by a cron job or by git configuration after updating) you hardly
profit from the git compression at all.
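
To see how much of a given clone is still sitting in loose (not yet repacked) objects, one can look at the output of `git count-objects -v`, which reports the `size` of loose objects and the `size-pack` of packed ones in KiB. The following is only a rough sketch: the helper names are hypothetical and the sample numbers are invented for illustration, not measured on the Gentoo tree.

```python
import subprocess


def parse_count_objects(text: str) -> dict:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def loose_fraction(stats: dict) -> float:
    """Fraction of object storage (in KiB) that is loose rather than packed."""
    loose = stats.get("size", 0)
    packed = stats.get("size-pack", 0)
    total = loose + packed
    return loose / total if total else 0.0


def repo_stats(path: str = ".") -> dict:
    """Run `git count-objects -v` in the repository at `path` (needs git installed)."""
    out = subprocess.run(
        ["git", "-C", path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_objects(out)


# Invented sample output, only to show the parsing; real numbers will differ.
SAMPLE = """count: 1200
size: 9600
in-pack: 50000
packs: 3
size-pack: 86400
prune-packable: 0
garbage: 0
size-garbage: 0
"""

sample_stats = parse_count_objects(SAMPLE)
print(f"loose share in sample: {loose_fraction(sample_stats):.0%}")
```

On a real clone, `loose_fraction(repo_stats("/path/to/repo"))` before and after a `git gc` would show how much the recompression matters for that particular update pattern.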
|
How large the yearly summand is can currently only be guessed.
I think my assumption that after 1 year the number of new/modified files
is roughly the total number of files in the tree is realistic, perhaps
even too low. (Not to forget that every commit also adds data by
itself.)
|
So in 2-3 years, the factor (compared to squashfs) might be roughly:
8*2.5 = 20 without recompressing .git
8 + 2.5 = 10.5 with fully compressed .git
(The latter factor is unrealistically low, because git's gzip compression
is less effective than lz4 and *much* less effective than xz).
|
And even if I should have overestimated the yearly summand by a
factor of 2, you only need to double the number of years
you have to wait...
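
The linear model behind the "fully compressed" figure can be written down in a few lines of Python. This is only a sketch of the assumption stated above (a constant yearly summand added to the current factor); the function name is made up, and the summand of 1 per year is merely what the 8 + 2.5 = 10.5 figure implies for 2.5 years, not a measured value.

```python
def projected_factor(base: float, yearly_summand: float, years: float) -> float:
    """Factor (git data / squashfs) after `years`, assuming constant yearly growth."""
    return base + yearly_summand * years


# Current estimate of 8, growing by an assumed 1 per year, after 2.5 years.
compressed = projected_factor(8, 1.0, 2.5)

# Halving the summand while doubling the waiting time gives the same factor.
halved = projected_factor(8, 0.5, 5.0)

print(compressed, halved)
```

The second call illustrates the last point: an overestimate of the summand by a factor of 2 only shifts the time axis, it does not change where the curve ends up.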