Gentoo Archives: gentoo-scm

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption
Date: Sat, 11 Apr 2009 16:38:12
Message-Id: 20090411T161427Z@curie.orbis-terrarum.net
In Reply to: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption by Maciej Mrozowski
On Sat, Apr 11, 2009 at 02:40:27PM +0200, Maciej Mrozowski wrote:
> As git is generally not best suited for large repositories, there are some
> ideas how to make it perform better with gentoo-x86.
> - horizontal partitioning - a.k.a. cutting history
> - vertical partitioning - splitting based on some categorization (not sure
>   whether this was considered here, hence my mail).
Both of these come under the aegis of partial trees. On the Git side, I'd like to deliberately direct you to this GSoC project:
http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905

It's been in the works for a couple of years in Git, and is well fleshed out as a proposal for now. When it does come to fruition, it will make the entire matter of having to split the repository irrelevant.
> As cutting history has already been proposed here and received with mixed
> opinions, I'd like to suggest category-based partitioning (divide and conquer
> approach FTW).
> Well, not really category-based...
I take it by this that you've looked at the past discussions. I'd also like to bring your attention to the thread I started on the Git mailing list about memory usage, in which I also gave some of our current numbers:
http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636

The first discusses the general case of overhead per VCS (there's a nice summary at the bottom). The second gives overall numbers for Git with single-repo and repo-per-package. The overhead imposed by repo-per-package makes it a non-starter.
> Let's look at the tree - one thing can be said about each package - it belongs
> to some herd (or doesn't, and is then either maintainer-wanted or maintained
> by individual developers).
> So creating a separate repository for each herd is the most obvious (and naive)
> idea.
Packages do change herds, and there are certainly cross-dependencies between herds. Some packages even belong to multiple herds.
> Pros are the following:
> - project members taking care of some herd (or belonging to herd?) receive
>   (and have access) (only) to repository they are interested in, resulting in
>   smaller pulls/pushes
No, the push/pull beyond the initial clone only contains changes anyway, so there is NO change in size.
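As a rough illustration of the point above, here is a toy model in which a pull transfers only the objects the receiver lacks. It is a sketch under loud assumptions: the object IDs and the `objects_to_transfer` helper are hypothetical, and real Git negotiation, packfiles, and delta compression are ignored entirely.

```python
from typing import Set


def objects_to_transfer(remote: Set[str], local: Set[str]) -> Set[str]:
    """Objects the receiver still needs: everything remote has that local lacks."""
    return remote - local


# Hypothetical object IDs; real Git objects are identified by hashes.
big_history = {f"old-{i}" for i in range(100_000)}   # large, already-cloned repo
small_history = {f"old-{i}" for i in range(1_000)}   # small, already-cloned repo
new_commits = {"new-1", "new-2", "new-3"}            # changes since the last pull

# The same three new objects are transferred in both cases; the size of the
# history that is already present on both sides does not enter into it.
big_repo_pull = objects_to_transfer(big_history | new_commits, big_history)
small_repo_pull = objects_to_transfer(small_history | new_commits, small_history)

print(len(big_repo_pull), len(small_repo_pull))
```

In this simplified model the pull size depends only on what changed since the last sync, not on how large the repository already is.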
> - some level of isolation - gives possibility to restrict access (for example:
>   "only toolchain and arch teams allowed here")
There is only a single category (sec-policy) in the ENTIRE tree that's restricted right now, and that restriction is probably going to go away in future once we have guaranteed signed commits.
> - some testing overlays could now just track their tree counterparts - merging
>   stuff from testing to tree could be semi-automatic and trivial
Overlays are a lot more of a mish-mash than that. dberkholz for example had his own git.eclass and dev-util/git until recently, mixed right in with other packages.
> - alternative projects - like hardened - can just have separate branches when
>   appropriate - for easy merges with "main tree"
> - profile can be (should be actually) separated in another repository and
>   developed easier
I don't know how you consider separate to be better. One of the major reasons for designing the multi-parent stackable profiles was for the unusual mixed cases. Say you wanted to run hardened on a machine that hardened doesn't presently support: all you have to do is pick both in your own make.profile/parent file.
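To make the stacking idea concrete, here is a minimal sketch of how a multi-parent profile could be flattened into an application order: parents first, depth-first, the selected profile last. The profile names, the `PARENTS` table, and `resolve_stack` are hypothetical stand-ins for on-disk `parent` files, and the sketch assumes each profile is applied only once; Portage's actual handling may differ in such details.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for profile directories: each key is a
# profile name, each value is the list of entries in its `parent` file.
# These names are illustrative, not real paths in the tree.
PARENTS: Dict[str, List[str]] = {
    "base": [],
    "arch/example": ["base"],
    "hardened": ["base"],
    "local-mix": ["arch/example", "hardened"],  # the user's own make.profile/parent
}


def resolve_stack(profile: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return profiles in application order: parents first (depth-first),
    each profile listed once, the requested profile last."""
    ordered: List[str] = []
    in_progress = set()

    def visit(name: str) -> None:
        if name in ordered:
            return  # already placed earlier in the stack (assumed dedup)
        if name in in_progress:
            raise ValueError(f"cycle in parent files at {name!r}")
        in_progress.add(name)
        for parent in parents[name]:
            visit(parent)
        in_progress.discard(name)
        ordered.append(name)

    visit(profile)
    return ordered


stack = resolve_stack("local-mix", PARENTS)
print(stack)  # ['base', 'arch/example', 'hardened', 'local-mix']
```

The mixed case is expressed purely by listing two parents in one file, which is hard to keep as simple once profiles live in a separate repository from the tree they configure.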
> Some cons:
> - projects are now more dependent on other projects and their responsiveness,
>   unless access is granted to all repositories for every developer
As noted by rbu, we explicitly trust every developer to modify most of the tree.
> - needs some basic tools to 'glue' final repository and ready it for rsync
> - to fully benefit from git - robbat2 would need to propose his slim manifest
>   format as GLEP (or in case of lack of time - quite possible - get someone else
>   to do it) and get it implemented by someone.
We need said tools for the thin Manifest stuff anyway.
> - possibly needs better multiple repositories support in Portage (not sure
>   though)
> - profile no longer there
> - probably not easy way to migrate from monolithic gentoo-x86 to split
>   sub-repositories retaining complete history
> - not settled yet what to do with orphaned/proxy maintained packages and
>   herd-switching
More cons:
- If a package moves between repos, it consumes space in the history on both sides, forever.
- Fewer opportunities for Git's amazing compression.
- If you wanted to track the entire tree, you'd need multiple repositories now, vs. being able to clone just a single repo and maintain your own branch on top of it.
> Zac, I'm CC-ing you here, I hope you don't mind. Sorry, but your input is too
> valuable here :)
He's on the list ;-).

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail   : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
