Gentoo Archives: gentoo-dev

From: Martin Vaeth <martin@×××××.de>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
Date: Fri, 26 Feb 2016 11:01:20
Message-Id: napb98$i61$1@ger.gmane.org
In Reply to: Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed? by Gordon Pettey
1 Gordon Pettey <petteyg359@×××××.com> wrote:
2 >>
3 >> Already now this means that you need 2 (or already 3?) times the
4 >> disk space as for an rsync mirror; multiply all numbers by 4
5 >> if you used squashfs to store the tree. [...]
6 >
7 > Or, in 2-3 years, maybe people will stop with the hyperbole
8
9 Hyperbole? Really?
10
11 Let's first look at the current data.
12 Instead of guessing I now fetched the git tree
13 to get the exact number:
14
15 git on ext2 (8K blocks): 704 M
16 squashfs with lz4: 120 M
17
18 lz4 is the fastest algorithm, but not the best concerning space.
19 More importantly: the git data is still missing metadata information,
20 which will add some more.
21
22 It seems my estimate of the factor 2*4 = 8
23 for the current state was rather realistic.
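
As a quick check of the arithmetic, here is a minimal Python sketch that uses only the two sizes measured above. The variable names are illustrative, and the measured quotient does not yet include the missing metadata mentioned above.

```python
# Sizes as measured above, in MiB.
git_checkout_mib = 704   # git tree on ext2 (8K blocks)
squashfs_lz4_mib = 120   # same tree as squashfs with lz4

# Quotient (git data / squashfs) for the current state.
measured_factor = git_checkout_mib / squashfs_lz4_mib
print(f"git / squashfs(lz4) = {measured_factor:.2f}")
```

The quotient comes out a little below 6, to which the missing metadata still has to be added.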
24
25 Not to forget that this was a fresh checkout where the .git
26 data itself is fully compressed in one file (which is by default
27 not the case when you update frequently - it depends on your
28 git configuration and perhaps whether you use a cron job for
29 recompression). So perhaps for some git users the bracket in
30 my estimate (3*4=12) is already correct.
31
32 Whether 1 GB of permanent disk space only for the
33 overhead of package management is appropriate, everyone
34 must decide for themselves. Compared to other distributions,
35 this is an awful lot.
36 Just to get ChangeLogs, it is IMHO far too much.
37
38 And currently the git history is still almost empty...
39
40 Before I turn to the future, some remarks:
41
42 > The tree is a bunch of text files, of which a whole lot of text is
43 > repeated
44
45 That's why squashfs is so effective already compared to plain rsync.
46 Of course, a lot of the *current* factor comes from this.
47
48 > which is great for compression, which git does.
49
50 You seem to pretend that I ignored this, but I did not:
51
52 >> (there is possibility of some
53 >> compression of history, but OTOH, many packages are added and
54 >> removed, eclasses keep changing, etc.)
55
56 Of course, concerning the future, one must make some assumptions.
57 Perhaps it is reasonable to assume that roughly a constant amount of
58 new data is added every year, i.e., the quotient (git data/squashfs)
59 increases every year by a constant summand.
60
61 Compression will not change this "constantness", but at most influence
62 the summand itself. Quite the opposite: once the history exceeds
63 a certain size (depending on the memory window size used by
64 the gzip implementation of git), compression will eventually
65 become much less effective. You can see the difference essentially
66 in the gzip vs. xz compression size, because the main difference
67 here is the size of the mentioned memory window.
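
The window effect can be illustrated in isolation with a small Python sketch. It only compares the standard library's zlib (32 KiB window) and lzma (xz, much larger dictionary) on synthetic data whose repetition lies 64 KiB apart. It is not a model of git's pack format, and the data is artificial, chosen only to make the effect visible.

```python
import lzma
import random
import zlib

# Synthetic input: one incompressible 64 KiB block, repeated once.
# The only redundancy is the repetition, 64 KiB back in the stream.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block + block

# zlib (deflate) can only refer back 32 KiB, so it cannot see the repeat.
zlib_size = len(zlib.compress(data, 9))

# xz/lzma uses a dictionary of several MiB by default, so it can.
xz_size = len(lzma.compress(data))

print(f"input: {len(data)} bytes, zlib: {zlib_size} bytes, xz: {xz_size} bytes")
```

On this input, zlib should end up at roughly the full input size, while xz should need only about half of it.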
68
69 And as mentioned above, unless you are regularly recompressing
70 (by a cron job or by git configuration after updating) you hardly
71 benefit from the git compression at all.
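
To see how much of a given clone is actually stored in compressed packs, one can look at the output of `git count-objects -v`. The following is a minimal sketch under stated assumptions: the helper names (`parse_count_objects`, `loose_share`, `repo_stats`) are hypothetical, and it relies on the `size` and `size-pack` fields, which that command reports in KiB.

```python
import subprocess


def parse_count_objects(text):
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def loose_share(stats):
    """Fraction of object storage (in KiB) held by loose, unpacked objects."""
    loose = stats.get("size", 0)
    packed = stats.get("size-pack", 0)
    total = loose + packed
    return loose / total if total else 0.0


def repo_stats(repo_path):
    """Run `git count-objects -v` in repo_path and return the parsed fields."""
    out = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_objects(out)
```

A call such as `loose_share(repo_stats("/path/to/clone"))` (the path is a placeholder) gives a rough idea of how far the clone is from the fully packed state; a high value suggests that running `git gc` would reclaim space.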
72
73 How large the yearly summand is can currently only be guessed.
74 I think my assumption that after 1 year the number of new/modified files
75 is roughly the total amount of files in the tree is realistic, perhaps
76 even too low. (Not to forget that also every commit adds data by
77 itself.)
78
79 So in 2-3 years, the factor (compared to squashfs) might be roughly:
80 8*2.5 = 20 without recompressing .git
81 8 + 2.5 = 10.5 with fully compressed .git
82 (The latter factor is unrealistically low, because git's gzip compression
83 is less effective than lz4 and *much* less effective than xz).
84
85 And even if I should have overestimated the yearly summand by a
86 factor of 2, you only need to double the number of years which you
87 have to wait...
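
The estimate above can be written down as a small Python sketch. It merely reproduces the arithmetic of this mail under its own assumptions: a current factor of 8, and one tree's worth of new or modified data per year. The function and parameter names are illustrative, not part of any tool.

```python
def projected_factors(base_factor=8.0, years=2.5, yearly_summand=1.0):
    """Return (without recompression, fully compressed) factors vs. squashfs.

    base_factor and yearly_summand are guesses, as discussed above;
    yearly_summand = 1.0 means one tree's worth of changes per year.
    """
    growth = years * yearly_summand
    without_recompression = base_factor * growth
    fully_compressed = base_factor + growth
    return without_recompression, fully_compressed


# The numbers from the estimate above: 8*2.5 and 8 + 2.5.
print(projected_factors())

# Halving the yearly summand and doubling the years gives the same result.
print(projected_factors(years=5.0, yearly_summand=0.5))
```

Both calls print the same pair, which is the point of the last remark: a smaller summand only shifts the time scale.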
