On Fri, Feb 26, 2016 at 7:59 AM, Martin Vaeth <martin@×××××.de> wrote:
> Rich Freeman <rich0@g.o> wrote:
>>>
>>> And currently the git history is still almost empty...
>>>
>>
>> If you want pre-migration history you need to fetch that separately.
>
> How? Neither on gitweb.gentoo.org nor on github I found an obvious
> repository with this data.
|
https://wiki.gentoo.org/wiki/Gentoo_git_workflow#Grafting_Gentoo_History_Onto_the_Active_Repo
|
If you're interested in history it is easy to do, and the repo on
github works fine for web access or the various github stats/etc.
Well, sort of - I get the impression that github doesn't host a lot of
repos with that much history, and when you push that repo to github for
the first time the push will time out and die, and the repo will appear
on the site 30-60 min later (I imagine subsequent pushes would be fine).
I think we actually have one of the largest git repos out there in terms
of number of objects. At least, when I was keeping tabs on other
migration efforts there weren't many that came close (including some
projects that you'd think of as having a lot of history). The fact
that every package revision, patch, etc. is a file in Gentoo is a big
part of that.
|
>
>> It is about 1.7G.
>> Considering that this represents a LOT more than 2-3 years of history
>
> If the 1.7G are fully compressed history, this would confirm
> my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.
|
Perhaps I misread your post then. I saw lots of numbers but not many
units, and I probably didn't follow what you intended to say.
|
>
> Note that I compared squashfs with a git user who does not even
> care about git-internal recompression. Of course, you can decrease
> the factor somewhat if e.g. your checked-out tree is still stored
> on squashfs. This does not change the fact that the factor will
> increase every year by about 1 (or probably more, because git
> uses the ineffective gzip compression, only).
>
|
A checkout of gentoo-x86 is about 590M. If you use the repo that
includes cache/etc it expands to 1.2G. 13 years of history is 1.7G.
Clearly it doesn't increase by a factor of 1 every year, unless again
I'm misunderstanding what you're intending to communicate.
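As a rough check on those figures, here is a minimal sketch. It assumes
M means MB, takes 1.7G as 1700 MB, and averages the growth evenly over
the 13 years; the names are only for illustration.

```python
# Figures quoted above; averaging evenly over 13 years is an assumption.
CHECKOUT_MB = 590    # checkout of gentoo-x86
HISTORY_MB = 1700    # full history, taking 1.7G as 1700 MB
YEARS = 13


def average_growth(history_mb, years):
    """Average history growth per year, in MB."""
    return history_mb / years


growth_mb = average_growth(HISTORY_MB, YEARS)
# Yearly growth expressed as a fraction of the checkout size.
growth_factor = growth_mb / CHECKOUT_MB

print(f"{growth_mb:.0f} MB/year, {growth_factor:.2f}x the checkout per year")
```

On these numbers the history grows by roughly 130 MB a year, which is
well under the size of one checkout.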
|
A git checkout consists of two parts: the .git directory, which
contains all the data, and the working tree. In the case of gentoo-x86
the working tree is about 440MB and the history is about 150MB.
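If you want to see that split on your own checkout, something like the
following would do. It is only a sketch: tree_sizes is a made-up helper
name, it sums apparent file sizes rather than on-disk usage, and it
assumes .git is a plain directory inside the checkout.

```python
import os


def tree_sizes(repo_path):
    """Return (git_dir_bytes, working_tree_bytes) for a checkout."""
    git_bytes = 0
    work_bytes = 0
    git_dir = os.path.join(repo_path, ".git")
    for root, dirs, files in os.walk(repo_path):
        # Everything at or below <repo>/.git counts as history data.
        in_git = root == git_dir or root.startswith(git_dir + os.sep)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symlinks, they may dangle
            size = os.path.getsize(path)
            if in_git:
                git_bytes += size
            else:
                work_bytes += size
    return git_bytes, work_bytes


# Example call (the path is just an illustration):
# git_b, work_b = tree_sizes("/usr/portage")
# print(f".git: {git_b / 1e6:.0f} MB, working tree: {work_b / 1e6:.0f} MB")
```

The numbers will not match du exactly, since du reports disk blocks
rather than file lengths.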
|
The working tree doesn't really change in size much - it just reflects
the size of the current revision of the tree. It is also not
compressed (unless you stick the whole thing in a squashfs, which you
could certainly do). It is the history which continuously grows.
However, the history IS compressed and the reality is that most new
ebuilds are similar to ebuilds that are already in the history, so it
compresses very well. Of course it would be nice if you could use
something other than gzip to compress it.
|
There is no reason that somebody couldn't distribute squashfs versions
of a git /usr/portage, but if you want the full history it would still
be around 1.7G. It would still be smaller than a checked-out tree
(the 1.7G figure is just history - it doesn't include the extra 440MB
or so for the checkout).
|
My point wasn't so much that there aren't size benefits to squashfs
and no history. I'm just saying that git is pretty efficient for what
it does.
|
--
Rich