Gentoo Archives: gentoo-scm

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption
Date: Sat, 11 Apr 2009 16:38:12
Message-Id: 20090411T161427Z@curie.orbis-terrarum.net
In Reply to: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption by Maciej Mrozowski
1 On Sat, Apr 11, 2009 at 02:40:27PM +0200, Maciej Mrozowski wrote:
2 > As git is generally not best suited for large repositories, there are some
3 > ideas how to make it perform better with gentoo-x86.
4 > - horizontal partitioning - a'ka cutting history
5 > - vertical partitioning - splitting based on some categorization (not sure
6 > whether this was considered here, hence my mail).
7 Both of these come under the aegis of partial trees.
8 On the Git side, I'd like to point you to this GSoC
9 project:
10 http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905
11 It's been in the works for a couple of years in Git, and is well fleshed
12 out as a proposal for now. When it does come to fruition, it will make
13 the entire matter of having to split the repository irrelevant.
14
15 > As cutting history have already been proposed here and received with mixed
16 > opinions, I'd like to suggest category based partitioning (divide and conquer
17 > approach FTW).
18 > Well, not really category-based...
19 I take it from this that you've looked at the past discussions.
20 I'd also like to bring your attention to the thread I started on the Git
21 mailing list, about memory usage, but in which I also gave some of our
22 current numbers:
23 http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
24 http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636
25 The first discusses the general case of overhead per VCS (there's a nice
26 summary at the bottom).
27 The second gives overall numbers for Git with single-repo and
28 repo-per-package. The overhead imposed by repo-per-package makes it a
29 non-starter.
30
31 > Let's look at tree - one thing can be said about each package - it belongs to
32 > some herd or (doesn't, and it's with status maintainer wanted or maintained by
33 > individual developers).
34 > So creating separate repository for each herd is the most obvious (and naive)
35 > idea.
36 Packages do change herds, and there are certainly cross-dependencies
37 between herds. Some packages even belong to multiple herds.
38
39 > Pros are the following:
40 > - project members taking care of some herd (or belonging to herd?) receive
41 > (and have access) (only) to repository they are interested in, resulting in
42 > smaller pulls/pushes
43 No, the push/pull beyond the initial clone only contains changes anyway,
44 so there is NO change in size.
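To illustrate the point about incremental transfers, here is a toy sketch in Python. It models a pull as a set difference between the objects the remote has and the objects the local clone already has. The object names and counts are made up for illustration, and the model ignores Git's real pack negotiation and delta compression entirely.

```python
def objects_to_send(remote_objects, local_objects):
    """Toy model of a pull: transfer only what the local side lacks."""
    return remote_objects - local_objects


# Hypothetical object IDs; real Git objects are identified by SHA-1 hashes.
new_commits = {"new-1", "new-2", "new-3"}
big_history = {f"old-{i}" for i in range(100_000)}   # stand-in for a monolithic repo
small_history = {f"old-{i}" for i in range(100)}     # stand-in for a small split repo

# Both clones are up to date except for the same three new objects.
big_transfer = objects_to_send(big_history | new_commits, big_history)
small_transfer = objects_to_send(small_history | new_commits, small_history)

print(len(big_transfer), len(small_transfer))
```

In this model the amount transferred depends only on what is new, not on how much history the repository already holds.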
45
46 > - some level of isolation - gives possibility to restrict access (for example:
47 > "only toolchain and arch teams allowed here")
48 There is only a single category (sec-policy) in the ENTIRE tree that's
49 restricted right now, and that restriction is probably going to go away
50 in future once we have guaranteed signed commits.
51
52 > - some testing overlays could now just track their tree counterparts - merging
53 > stuff from testing to tree could be semi-automatic and trivial
54 Overlays are a lot more of a mish-mash than that. dberkholz for example
55 had his own git.eclass and dev-util/git until recently, mixed right in
56 with other packages.
57
58 > - alternative projects - like hardened - can just have separate branches when
59 > appropriate - for easy merges with "main tree"
60 > - profile can be (should be actually) separated in another repository and
61 > developed easier
62 I don't know how you consider separate to be better. One of the major
63 reasons for designing the multi-parent stackable profiles was for the
64 unusual mixed cases. Say you want to run hardened on a machine that
65 hardened doesn't presently support: all you have to do is list both
66 profiles in your own make.profile/parent file.
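As a rough sketch of how a multi-parent profile stacks, the Python below walks a hypothetical parent graph depth-first and returns the order in which profiles would be applied, with each parent's own ancestors first and the profile itself last. The profile names are invented, the de-duplication of a shared ancestor is a simplification of this sketch, and none of this is Portage's actual code.

```python
def resolve_profile_stack(profile, parents):
    """Return profiles in application order for a multi-parent stack.

    `parents` maps a profile name to the list of entries in its `parent`
    file. Each parent's stack is visited first, in listed order, and the
    profile itself comes last, so later entries can override earlier ones.
    """
    order = []

    def visit(name):
        for parent in parents.get(name, []):
            visit(parent)
        if name not in order:  # simplification: apply a shared ancestor once
            order.append(name)

    visit(profile)
    return order


# Hypothetical profile names, standing in for real paths under profiles/.
parents = {
    "my-machine": ["arch-profile", "hardened-profile"],
    "arch-profile": ["base"],
    "hardened-profile": ["base"],
}

stack = resolve_profile_stack("my-machine", parents)
print(stack)
```

Here "my-machine" plays the role of a local make.profile whose parent file lists both the architecture profile and the hardened profile.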
67
68 > Some cons:
69 > - projects are now more dependant on other projects and its responsiveness,
70 > unless access is granted to all repositories for every developer
71 As noted by rbu, we explicitly trust every developer to modify most of
72 the tree.
73
74 > - needs some basic tools to 'glue' final repository and ready it for rsync
75 > - to fully benefit from git - robbat2 would need to propose his slim manifest
76 > format as GLEP (or in case of lack of time - quite possible - get someone else
77 > to do it) and get it implemented by someone.
78 We need said tools for the thin Manifest stuff anyway.
79
80 > - possibly needs better multiple repositories support in Portage (not sure
81 > though)
82 > - profile no longer there
83 > - probably not easy way to migrate from monolithic gentoo-x86 to split sub-
84 > repositories retaining complete history
85 > - not settled yet what to do with orphaned/proxy maintained packages and herd-
86 > switching
87 More cons:
88 - If a package moves between repos, it consumes space in the history on
89 both sides, forever.
90 - Fewer opportunities for Git's amazing compression.
91 - If you want to track the entire tree, you need multiple repositories
92 now, vs. cloning just a single repo and maintaining your own
93 branch on top of it.
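The first of these cons can be sketched with a toy Python model. It assumes that anything ever added to a repository stays in its history after removal, and it treats each package as a blob of fixed size. The package name and size are invented, and compression is ignored.

```python
def history_size(events, sizes):
    """Total stored size: anything ever added stays in history after removal."""
    ever_stored = {pkg for action, pkg in events if action == "add"}
    return sum(sizes[pkg] for pkg in ever_stored)


# Hypothetical package and size (arbitrary units).
sizes = {"dev-util/foo": 40}

# Single repository: changing herds needs no add/remove of the files.
single_repo = [("add", "dev-util/foo")]

# Split repositories: the package leaves herd A's repo and enters herd B's.
herd_a_repo = [("add", "dev-util/foo"), ("remove", "dev-util/foo")]
herd_b_repo = [("add", "dev-util/foo")]

single_total = history_size(single_repo, sizes)
split_total = history_size(herd_a_repo, sizes) + history_size(herd_b_repo, sizes)

print(single_total, split_total)
```

Under these assumptions the moved package is stored once in the single repository but once in each of the two split repositories.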
94
95 > Zac, I'm CC-ing you here, I hope you don't mind. Sorry, but your input is too
96 > valuable here :)
97 He's on the list ;-).
98
99 --
100 Robin Hugh Johnson
101 Gentoo Linux Developer & Infra Guy
102 E-Mail : robbat2@g.o
103 GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
