Gentoo Archives: gentoo-dev

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: RFC: split up media-sound/ category
Date: Sun, 26 Jun 2011 14:11:52
Message-Id: pan.2011.06.26.14.10.49@cox.net
In Reply to: Re: [gentoo-dev] Re: RFC: split up media-sound/ category by Kent Fredric
1 Kent Fredric posted on Sun, 26 Jun 2011 17:43:27 +1200 as excerpted:
2
3 > On 26 June 2011 15:49, Wyatt Epp <wyatt.epp@×××××.com> wrote:
4 >> As for the latter part, the size of a git repo becoming unmanageable
5 >> over time had not occurred to me, I'm afraid-- would it work to use
6 >> shallow clones?  Otherwise, the herd-wise division is probably
7 >> acceptable.  Need to think about that one more.
8 >
9 >
10 > --depth <depth>
11 > Create a shallow clone with a history truncated to the
12 > specified number of revisions. A shallow repository has a
13 > number of limitations (you cannot clone or fetch from it, nor
14 > push from nor into it), but is adequate if you are only
15 > interested in the recent history of a large project with a
16 > long history, and would want to send in fixes as patches.
17 >
18 > It would be ok perhaps for non-contributing users to use shallow clones,
19 > but in my understanding, shallow clones limit you to doing what you
20 > could do with a tar file of the specified revision, which basically
21 > makes it impractical for people who are developing on it,
22 > and would mean every new developer would get a progressively longer time
23 > in order to do a complete check out.
24
25 Not substantially so, no.
26
27 FWIW, git scales VERY well in this regard, provided it's used for text-
28 based content (sources) as originally intended. (It's not so hot at
29 binary blob management, but it's not designed for that. Fortunately,
30 gentoo's usage would be nearly 100% text-based.)
31
32 What git does over time is delta-compress the history into a series of
33 packs (packfiles; I don't know the internals in detail), and text compresses
34 REALLY well. New clones then fetch the history already packed, with only
35 the most recent objects still loose. Existing users can run garbage
36 collection (git gc) periodically to fold their accumulated loose objects
37 into the packs as well.
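
To see how much of a local repository is already packed rather than still loose, `git count-objects -v` reports both sizes in KiB. A minimal sketch in Python that parses that output: the helper names are hypothetical, and the sample text is invented for illustration, not measured from any real tree.

```python
def parse_count_objects(text):
    """Parse `git count-objects -v` output ("key: value" lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def packed_fraction(stats):
    """Fraction of on-disk object storage (KiB) that already lives in packs."""
    loose = stats.get("size", 0)        # loose objects, KiB
    packed = stats.get("size-pack", 0)  # packs, KiB
    total = loose + packed
    return packed / total if total else 0.0


# Illustrative sample only; these numbers are invented, not measured.
SAMPLE = """\
count: 1200
size: 9600
in-pack: 250000
packs: 3
size-pack: 86400
prune-packable: 0
garbage: 0
size-garbage: 0
"""

stats = parse_count_objects(SAMPLE)
fraction = packed_fraction(stats)
print(f"{fraction:.0%} of object storage is packed")
```

On a real checkout the text would come from running `git count-objects -v` inside the repository. After a `git gc` the loose `count` and `size` figures should normally drop toward zero.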
38
39 So for example, du says my kernel git tree totals 1.6 GB, including the
40 active checkout and two separate (dirty) build trees. The bare git tree
41 (history repo without working tree) itself is 891 MB. So the bare repo
42 is only 54% of the total, and I've not actually garbage-collected in some
43 time. If I had, the ratio would be closer to 50%, meaning the entire
44 kernel git history repo compresses to roughly the size of the working
45 tree, and only roughly doubles the size of a single decompressed working
46 tarball.
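
The percentage is simple arithmetic. A quick check in Python, assuming du reported binary units (1 GB = 1024 MB); with decimal units the share would come out nearer 56%.

```python
# Figures quoted in the message; binary units are an assumption.
total_mib = 1.6 * 1024   # "1.6 GB": checkout, two build trees, and history
bare_mib = 891           # bare repository (history only)

ratio = bare_mib / total_mib       # share of the total taken by history
rest_mib = total_mib - bare_mib    # working checkout plus build trees

print(f"history share of total: {ratio:.0%}")
print(f"everything else: {rest_mib:.0f} MiB")
```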
47
48 Over time that'll certainly grow a bit, but it really does scale well.
49 The kernel has been in git long enough now that quite a bit of history
50 has built up, and that it only roughly doubles the size of a single
51 decompressed working tree snapshot, while putting the entire history
52 since the original checkin at my fingertips, is impressive
53 indeed.
54
55 It's all down to how well the sources and diffs compress. If there were
56 significant binary blobs in there (the kernel tree does have a few bits
57 of firmware, the tux logo, etc), it would compress far less effectively.
58 But gentoo's tree is pretty much all text as well, fortunately. =:^)
59
60 --
61 Duncan - List replies preferred. No HTML msgs.
62 "Every nonfree program has a lord, a master --
63 and if you use the program, he is your master." Richard Stallman