Gentoo Archives: gentoo-dev

From:	Martin Vaeth <martin@×××××.de>
To:	gentoo-dev@l.g.o
Subject:	[gentoo-dev] Re: Bug #565566: Why is it still not fixed?
Date:	Fri, 26 Feb 2016 11:01:20
Message-Id:	`napb98$i61$1@ger.gmane.org`
In Reply to:	Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed? by Gordon Pettey

1	Gordon Pettey <petteyg359@×××××.com> wrote:
2	>>
3	>> Already now this means that you need 2 (or already 3?) times the
4	>> disk space as for an rsync mirror; multiply all numbers by 4
5	>> if you used squashfs to store the tree. [...]
6	>
7	> Or, in 2-3 years, maybe people will stop with the hyperbole
8
9	Hyperbole? Really?
10
11	Let's first look at the current data.
12	Instead of guessing I now fetched the git tree
13	to get the exact number:
14
15	git on ext2 (8K blocks): 704 M
16	squashfs with lz4: 120 M
17
18	lz4 is the fastest algorithm, but not the best concerning space.
19	More seriously: The git data is still missing metadata information
20	which will add some more.
21
22	It seems my estimate of the factor 2*4 = 8
23	for the current state was rather realistic.
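
As a rough check of that arithmetic, here is a minimal Python sketch that computes the factor from the two sizes measured above. The names are illustrative only, and the sizes are taken in whatever unit "M" denotes in the listing. The measured factor against lz4 comes out somewhat below 8; a stronger compressor would shrink the squashfs side and the missing metadata would grow the git side, so the gap to the estimate narrows.

```python
# Sizes as reported above (unit "M" as in the post; names are illustrative).
GIT_ON_EXT2 = 704
SQUASHFS_LZ4 = 120


def size_factor(git_size, squashfs_size):
    """Return how many times larger the git checkout is than the squashfs image."""
    return git_size / squashfs_size


measured = size_factor(GIT_ON_EXT2, SQUASHFS_LZ4)
estimated = 2 * 4  # the earlier estimate: 2x an rsync mirror, 4x for squashfs

print(f"measured factor vs. lz4 squashfs: {measured:.2f}")
print(f"earlier estimate: {estimated}")
```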
24
25	Not to forget that this was a fresh checkout where the .git
26	data itself is fully compressed in one file (which is by default
27	not the case when you update frequently - it depends on your
28	git configuration and perhaps whether you use a cron job for
29	recompression). So perhaps for some git users the bracket in
30	my estimate (3*4=12) is already correct.
31
32	Whether 1 GB of permanent disk space just for the
33	overhead of package management is appropriate, everybody
34	must decide for themselves. Compared to other distributions,
35	this is an awful lot.
36	Just for getting ChangeLogs it is IMHO way too much.
37
38	And currently the git history is still almost empty...
39
40	Before I turn to the future, some remarks:
41
42	> The tree is a bunch of text files, of which a whole lot of text is
43	> repeated
44
45	That's why squashfs is so effective already compared to plain rsync.
46	Of course, a lot of the current factor comes from this.
47
48	> which is great for compression, which git does.
49
50	You seem to pretend that I ignored this, but I did not:
51
52	>> (there is possibility of some
53	>> compression of history, but OTOH, many packages are added and
54	>> removed, eclasses keep changing, etc.)
55
56	Of course, concerning the future, one must make some assumptions.
57	Perhaps it is reasonable to assume that roughly a constant amount of
58	new data is added every year, i.e., the quotient (git data/squashfs)
59	increases every year by a constant summand.
60
61	Compression will not change this "constantness", but at most influence
62	the summand itself. Quite the opposite: once the history
63	exceeds a certain size (depending on the memory window size used by
64	the gzip implementation of git), compression will eventually
65	become much less effective. You can see the difference essentially
66	in the gzip vs. xz compression size, because the main difference
67	here is the size of the mentioned memory window.
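
The window effect can be illustrated with a small Python sketch. It is only a toy demonstration under stated assumptions: it uses Python's `zlib` (DEFLATE, whose window is at most 32 KiB) and `lzma` (xz, whose default preset uses a much larger dictionary) on synthetic data, and it says nothing about git's actual pack format. The data is one 64 KiB block of pseudo-random bytes repeated twice, so the only redundancy lies further back than the DEFLATE window can reach.

```python
import lzma
import random
import zlib

BLOCK_SIZE = 64 * 1024  # deliberately larger than DEFLATE's 32 KiB window


def repeated_block(block_size=BLOCK_SIZE, seed=0):
    """Return one pseudo-random block followed by an exact copy of itself."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block + block


data = repeated_block()

# DEFLATE cannot look back 64 KiB, so the second copy is not recognised.
deflate_size = len(zlib.compress(data, 9))

# xz's dictionary covers the whole input, so the second copy costs very little.
xz_size = len(lzma.compress(data))

print(f"input:   {len(data)} bytes")
print(f"deflate: {deflate_size} bytes")
print(f"xz:      {xz_size} bytes")
```

With this input, the DEFLATE output stays close to the full input size, while the xz output is close to the size of a single block.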
68
69	And as mentioned above, unless you are regularly recompressing
70	(by a cron job or by git configuration after updating) you hardly
71	profit from the git compression at all.
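
To see how much of a given checkout is still stored as loose (not yet repacked) objects, one can look at the output of `git count-objects -v`, which reports `size` (loose objects) and `size-pack` (packed objects) in KiB. The following Python sketch parses that output; the function names are hypothetical helpers of mine, not part of git.

```python
import subprocess


def parse_count_objects(output):
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not a key/value line
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def loose_and_packed_kib(repo_path):
    """Return (loose KiB, packed KiB) for the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    stats = parse_count_objects(result.stdout)
    return stats["size"], stats["size-pack"]
```

Calling `loose_and_packed_kib` on a checkout before and after running `git gc` in it shows how much a manual recompression gains on that particular repository.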
72
73	How large the yearly summand is can currently only be guessed.
74	I think my assumption that after 1 year the number of new/modified files
75	is roughly the total amount of files in the tree is realistic, perhaps
76	even too low. (Not to forget that also every commit adds data by
77	itself.)
78
79	So in 2-3 years, the factor (compared to squashfs) might be roughly:
80	8*2.5 = 20 without recompressing .git
81	8 + 2.5 = 10.5 with fully compressed .git
82	(The latter factor is unrealistically low, because git's gzip compression
83	is less effective than lz4 and much less effective than xz).
84
85	And even if I should have overestimated the yearly summand by the
86	factor 2, you only need to double the number of years which you
87	have to wait...
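
The back-of-the-envelope numbers above can be reproduced with a short Python sketch. It only mirrors the arithmetic as written in this mail (multiplying for an uncompressed history, adding for a fully compressed one); the function name and the `yearly_summand` parameter are my own labels, and the model is a guess, not a measurement.

```python
def projected_factor(current_factor, years, recompressed, yearly_summand=1.0):
    """Mirror the rough projection of the git/squashfs size factor.

    growth = yearly_summand * years
    fully compressed .git:  current_factor + growth
    without recompressing:  current_factor * growth
    """
    growth = yearly_summand * years
    if recompressed:
        return current_factor + growth
    return current_factor * growth


print(projected_factor(8, 2.5, recompressed=False))  # 8 * 2.5
print(projected_factor(8, 2.5, recompressed=True))   # 8 + 2.5

# Halving the yearly summand while doubling the years gives the same factors.
print(projected_factor(8, 5, recompressed=False, yearly_summand=0.5))
print(projected_factor(8, 5, recompressed=True, yearly_summand=0.5))
```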

Replies

Subject	Author
Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?	Rich Freeman <rich0@g.o>
