Gentoo Archives: gentoo-dev

From: Rich Freeman <rich0@g.o>
To: gentoo-dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
Date: Fri, 26 Feb 2016 13:37:06
Message-Id: CAGfcS_mxe=Jct2oMdCKTJPznKku5z-8Ts3XmX3GHzfdKGed7Eg@mail.gmail.com
In Reply to: [gentoo-dev] Re: Bug #565566: Why is it still not fixed? by Martin Vaeth
1 On Fri, Feb 26, 2016 at 7:59 AM, Martin Vaeth <martin@×××××.de> wrote:
2 > Rich Freeman <rich0@g.o> wrote:
3 >>>
4 >>> And currently the git history is still almost empty...
5 >>>
6 >>
7 >> If you want pre-migration history you need to fetch that separately.
8 >
9 > How? I found no obvious repository with this data on either
10 > gitweb.gentoo.org or github.
11
12 https://wiki.gentoo.org/wiki/Gentoo_git_workflow#Grafting_Gentoo_History_Onto_the_Active_Repo
13
14 If you're interested in history it is easy to do, and the repo on
15 github works fine for web access or the various github stats/etc.
16 Well, sort of - I get the impression that github doesn't host a lot of
17 repos with that much history: when you push that repo to github for
18 the first time, the push will time out and die, and the repo will appear
19 on the site 30-60 minutes later (I imagine subsequent pushes would be
20 fine). I think we actually have one of the largest git repos out there in terms
21 of number of objects. At least, when I was keeping tabs on other
22 migration efforts there weren't many that came close (including some
23 projects that you'd think of as having a lot of history). The fact
24 that every package revision+patch+etc is a file in Gentoo is a big
25 part of that.
26
27 >
28 >> It is about 1.7G.
29 >> Considering that this represents a LOT more than 2-3 years of history
30 >
31 > If the 1.7G are fully compressed history, this would confirm
32 > my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.
33
34 Perhaps I misread your post then. I saw lots of numbers but not many
35 units, and I probably didn't follow what you intended to say.
36
37 >
38 > Note that I compared squashfs with a git user who does not even
39 > care about git-internal recompression. Of course, you can decrease
40 > the factor somewhat if e.g. your checked-out tree is still stored
41 > on squashfs. This does not change the fact that the factor will
42 > increase every year by about 1 (or probably more, because git
43 > uses the ineffective gzip compression, only).
44 >
45
46 A checkout of gentoo-x86 is about 590M. If you use the repo that
47 includes cache/etc it expands to 1.2G. 13 years of history is 1.7G.
48 Clearly it doesn't increase by a factor of 1 every year, unless again
49 I'm misunderstanding what you're intending to communicate.
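
The figures quoted in this exchange can be checked with a few lines of arithmetic. This is only a sketch: the value 120 is taken from the quoted expression (1700/120 - 1), and reading it as the size in MB of a compressed snapshot without history is an assumption, since this message does not say what it measures. The variable names are illustrative.

```python
# Figures from the thread, in MB. 1.7G is rounded to 1700 as in the quoted formula.
history_mb = 1700      # full history, per the thread
years = 13             # span of that history, per the thread
worktree_mb = 440      # uncompressed working tree, per the thread

# ASSUMPTION: 120 comes from the quoted expression "(1700/120 - 1)";
# its meaning (compressed snapshot size in MB) is not stated in this message.
baseline_mb = 120

# The quoted estimate: how many "baseline-sized" years fit into the history.
estimated_years = history_mb / baseline_mb - 1

# Average growth of the history per year implied by 1.7G over 13 years.
growth_per_year_mb = history_mb / years

# Size of history plus checkout, relative to the checkout alone.
full_clone_ratio = (history_mb + worktree_mb) / worktree_mb

print(f"estimated years:      {estimated_years:.1f}")
print(f"growth per year (MB): {growth_per_year_mb:.1f}")
print(f"full clone / tree:    {full_clone_ratio:.2f}")
```

Whether the yearly growth counts as "a factor of 1" depends entirely on which baseline is meant (the assumed 120 MB figure or the 440M working tree), which may be the source of the misunderstanding here.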
50
51 A git checkout consists of two parts: the .git directory, which
52 contains all the history data, and the working tree. In the case of
53 gentoo-x86 the working tree is about 440M and the history (.git) is
54 about 150M, which together make up the 590M above.
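
To see this split on an actual checkout, something like the following could be used. It is a minimal sketch, not part of any Gentoo tooling: the function name tree_sizes is made up for illustration, and it sums apparent file sizes rather than disk blocks, so the numbers will be lower than what du reports on a tree with many small files.

```python
import os


def tree_sizes(repo_path):
    """Return (git_bytes, worktree_bytes) for the checkout at repo_path.

    Hypothetical helper: sums apparent file sizes, not allocated disk blocks.
    """
    git_dir = os.path.join(repo_path, ".git")
    git_bytes = 0
    worktree_bytes = 0
    for root, _dirs, files in os.walk(repo_path):
        # Everything at or below .git counts as history/object data.
        inside_git = root == git_dir or root.startswith(git_dir + os.sep)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symlinks so dangling links cannot raise
            size = os.path.getsize(path)
            if inside_git:
                git_bytes += size
            else:
                worktree_bytes += size
    return git_bytes, worktree_bytes
```

Calling tree_sizes("/usr/portage") on a git-based tree (the path is just an example) would return the two byte counts, which can be compared against the 150M and 440M figures mentioned here.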
55
56 The working tree doesn't change much in size - it just reflects
57 the size of the current revision of the tree. It is also not
58 compressed (unless you stick the whole thing in a squashfs, which you
59 could certainly do). It is the history which continuously grows.
60 However, the history IS compressed and the reality is that most new
61 ebuilds are similar to ebuilds that are already in the history, so it
62 compresses very well. Of course it would be nice if you could use
63 something other than gzip to compress it.
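
The effect of similar ebuilds compressing well can be illustrated roughly as follows. This is a sketch under stated assumptions: the ebuild text is invented, and compressing a concatenation with zlib is only a stand-in for what git does (git packs objects using delta compression plus zlib, which is not reproduced here).

```python
import zlib

# Hypothetical ebuild text, for illustration only; not a real package.
EBUILD_TEMPLATE = """# Example ebuild
EAPI=5
DESCRIPTION="Example package"
HOMEPAGE="https://example.org/"
SRC_URI="https://example.org/dist/example-{version}.tar.gz"
LICENSE="GPL-2"
SLOT="0"
KEYWORDS="~amd64 ~x86"
DEPEND="dev-libs/libfoo"
RDEPEND="${{DEPEND}}"
"""

# Twenty revisions that differ only in the version string.
revisions = [
    EBUILD_TEMPLATE.format(version=f"1.{i}").encode() for i in range(20)
]

# Compress each revision on its own, as if nothing were shared between them.
separately = sum(len(zlib.compress(text)) for text in revisions)

# Compress all revisions as one stream, so repeated content is stored once.
together = len(zlib.compress(b"".join(revisions)))

print(f"compressed separately: {separately} bytes")
print(f"compressed together:   {together} bytes")
```

The second number should come out much smaller than the first, which is the same reason a long history of near-identical ebuilds adds relatively little to the repository.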
64
65 There is no reason that somebody couldn't distribute squashfs versions
66 of a git /usr/portage, but if you want the full history it would still
67 be around 1.7G. It would still be smaller than a checked-out tree with
68 full history (the 1.7G figure is just history - it doesn't include the
69 extra 440M or so for the checkout).
70
71 My point wasn't so much that there aren't size benefits to squashfs
72 and no history. I'm just saying that git is pretty efficient for what
73 it does do.
74
75 --
76 Rich
