Gentoo Archives: gentoo-dev

From: Martin Vaeth <martin@×××××.de>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
Date: Sat, 27 Feb 2016 10:31:07
Message-Id: nartsl$p7f$1@ger.gmane.org
In Reply to: Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed? by Rich Freeman
Rich Freeman <rich0@g.o> wrote:
>
> Clearly it doesn't increase by a factor of 1 every year

The yearly increase of the factor is rather precisely 1:
According to current data, it is .95, see below.
With xz compression for squashfs, it is even 1.4!

(Note: increase _of_ the factor, not _by_ the factor, of course;
we are speaking about a linear increase, not an exponential one.)

More precisely: If in both cases you optimize extremely for space
(for details see below), then a change from rsync to git (non-shallow)
costs you

a) now: a factor of 2.6 in needed disk space

b) in future: for every year, this factor is increased
by the summand 1.4. For example, in 2.5 years you will need roughly
2.6 + (1.4 * 2.5) = 6.1 times the disk space needed for rsync.
After 2.5 more years, the factor will be 2.6 + (1.4 * 5) = 9.6,
that is, close to 10.

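The linear growth described in a) and b) can be sketched in a few lines of Python. The function name and constants are chosen here for illustration; the values 2.6 and 1.4 are simply the rounded numbers from the text, so this is a rough projection and not a measurement.

```python
# Rounded values from the text above (assumed constant over time).
INITIAL_FACTOR = 2.6   # disk space of git (non-shallow) relative to rsync, now
YEARLY_SUMMAND = 1.4   # amount added to that factor every year


def disk_factor(years: float) -> float:
    """Projected disk-space factor (git relative to rsync) after `years` years."""
    # Linear growth: the summand is added per year, not multiplied.
    return INITIAL_FACTOR + YEARLY_SUMMAND * years


for years in (0, 2.5, 5):
    print(f"after {years} years: factor {disk_factor(years):.1f}")
```

The projection assumes that the history keeps growing at the average rate of the past years and that the size of the rsync tree stays roughly constant.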
For a) I assumed that in both cases the current repository is kept
compressed with squashfs (xz). This first factor will be much
larger, of course, if you omit squashfs when you switch to git.
(You must take measures to keep the checked-out repository separate:
you cannot use standard emerge --sync to get this optimization.)

For both numbers, I even optimized the .git compression by
repeatedly executing
git prune; git repack -a -d; git gc --aggressive
which for the historical repository took several hours;
thus, unless you use a cron-job, this is not realistic.
Without this optimization, both numbers would be even larger.

Here are the plain data I used for the calculation:

1. RSYNC = 84,062,208
(rsync gentoo repository, compressed with squashfs (-comp xz).)

2. GIT = 136,322,616
(Current .git data, without checked-out tree;
compression optimized by the time-costly commands above.)

3. FULL = 1,923,685,435
(.git data as in 2, but with history added)

4. YEARS = 15
(length of the historical data: first checkin was June 2000;
change to git was IIRC somewhere in mid-2015).

So the number from a) is

  size with git        $GIT + $RSYNC
  ---------------  =  ---------------  ~ 2.6
  size with rsync         $RSYNC

The number from b) is

  size of history increase per year     ($FULL - $GIT) / $YEARS
  ---------------------------------  =  ------------------------  ~ 1.4
           size with rsync                       $RSYNC

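As a cross-check, the two fractions can be evaluated directly from the data listed under 1 to 4. The variable names below are chosen for illustration only; the numbers are the ones given in this mail.

```python
# Sizes as listed in the data above (xz-compressed squashfs for the rsync tree).
RSYNC = 84_062_208      # rsync repository, squashfs -comp xz
GIT = 136_322_616       # current .git data, without checked-out tree
FULL = 1_923_685_435    # .git data including the full history
YEARS = 15              # length of the recorded history in years

# a) git needs the .git data plus the (compressed) checked-out tree.
factor_now = (GIT + RSYNC) / RSYNC

# b) average yearly growth of the history, relative to the rsync size.
yearly_summand = (FULL - GIT) / YEARS / RSYNC

print(f"a) factor now:     {factor_now:.2f}")
print(f"b) yearly summand: {yearly_summand:.2f}")
```

Rounded to one decimal place, this gives the 2.6 and 1.4 used above.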
In the previous postings, I was assuming the much faster squashfs
compression -comp lz4 -Xhc instead of -comp xz. In this case,
the number from 1 changes to

RSYNC = 125,784,064

which leads to the factor .95 ~ 1 for b) which I mentioned in the
beginning.
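
The same yearly figure for the lz4 case can be checked with a short Python calculation. The variable names are illustrative; only the rsync size differs from the xz case.

```python
# Same git data as before; only the rsync tree is compressed differently.
GIT = 136_322_616         # current .git data, without checked-out tree
FULL = 1_923_685_435      # .git data including the full history
YEARS = 15                # length of the recorded history in years
RSYNC_LZ4 = 125_784_064   # rsync repository, squashfs -comp lz4 -Xhc

# Average yearly growth of the history, relative to the lz4-compressed tree.
yearly_summand_lz4 = (FULL - GIT) / YEARS / RSYNC_LZ4

print(f"yearly summand with lz4: {yearly_summand_lz4:.2f}")
```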