On Sat, Apr 11, 2009 at 02:40:27PM +0200, Maciej Mrozowski wrote:
> As git is generally not best suited for large repositories, there are some
> ideas how to make it perform better with gentoo-x86.
> - horizontal partitioning - a'ka cutting history
> - vertical partitioning - splitting based on some categorization (not sure
> whether this was considered here, hence my mail).
Both of these come under the aegis of partial trees.
On the Git side, I'd like to deliberately direct you to this GSoC
It's been in the works for a couple of years in Git, and is well fleshed
out as a proposal for now. When it does come to fruition, it will make
the entire matter of having to split the repository irrelevant.
> As cutting history have already been proposed here and received with mixed
> opinions, I'd like to suggest category based partitioning (divide and conquer
> approach FTW).
> Well, not really category-based...
I take it from this that you've looked at the past discussions.
I'd also like to bring your attention to the thread I started on the Git
mailing list, about memory usage, but in which I also gave some of our
The first discusses the general case of overhead per VCS (there's a nice
summary at the bottom).
The second gives overall numbers for Git with single-repo and
repo-per-package. The overhead imposed by repo-per-package makes it a
> Let's look at tree - one thing can be said about each package - it belongs to
> some herd or (doesn't, and it's with status maintainer wanted or maintained by
> individual developers).
> So creating separate repository for each herd is the most obvious (and naive)
Packages do change herds, and there are certainly cross-dependencies
between herds. Some packages even belong to multiple herds.
> Pros are the following:
> - project members taking care of some herd (or belonging to herd?) receive
> (and have access) (only) to repository they are interested in, resulting in
> smaller pulls/pushes
No. Beyond the initial clone, a push or pull only transfers the changes
anyway, so there is NO change in size.
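To make the point about transfer size concrete, here is a toy model in Python. It is only a sketch and not Git's real wire protocol: the object IDs, the repository sizes and the objects_to_send helper are all made up for illustration. It assumes a fetch sends exactly the objects the client is missing.

```python
# Toy model, NOT Git's real protocol: a fetch sends only the objects the
# client is missing, so its size tracks the new work, not the repo size.

def objects_to_send(server_objects, client_objects):
    """Return the objects the server has that the client lacks."""
    return server_objects - client_objects

# Hypothetical object IDs: one large monolithic repo, one small per-herd repo.
big_repo = {f"obj{i}" for i in range(100_000)}
small_repo = {f"obj{i}" for i in range(1_000)}

# The same three new objects land in either layout.
new_commit = {"new-commit", "new-tree", "new-blob"}

big_fetch = objects_to_send(big_repo | new_commit, big_repo)
small_fetch = objects_to_send(small_repo | new_commit, small_repo)

print(len(big_fetch), len(small_fetch))
```

In this model both fetches carry the same three objects, regardless of how large the repository already is.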
> - some level of isolation - gives possibility to restrict access (for example:
> "only toolchain and arch teams allowed here")
There is only a single category (sec-policy) in the ENTIRE tree that's
restricted right now, and that restriction is probably going to go away
in future once we have guaranteed signed commits.
> - some testing overlays could now just track their tree counterparts - merging
> stuff from testing to tree could be semi-automatic and trivial
Overlays are a lot more of a mish-mash than that. dberkholz for example
had his own git.eclass and dev-util/git until recently, mixed right in
with other packages.
> - alternative projects - like hardened - can just have separate branches when
> appropriate - for easy merges with "main tree"
> - profile can be (should be actually) separated in another repository and
> developed easier
I don't know how you consider separate to be better. One of the major
reasons for designing the multi-parent stackable profiles was for the
unusual mixed cases. Say you wanted to run hardened on a machine that
hardened doesn't presently support: all you have to do is pick both in
your own make.profile/parent file.
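As a rough illustration of how a multi-parent profile stacks, the Python sketch below walks "parent" files depth-first, parents before the profile itself. It is a simplification and not Portage's actual code: the profile names (base, some-arch, hardened) are hypothetical, the helper names are invented here, and skipping an already-visited profile is a choice of this sketch.

```python
import os
import tempfile

def resolve_profile_stack(profile_dir, seen=None):
    """Return profile dirs in stacking order: parents first, this profile last."""
    if seen is None:
        seen = set()
    profile_dir = os.path.realpath(profile_dir)
    if profile_dir in seen:  # sketch-only choice: visit each profile once
        return []
    seen.add(profile_dir)
    stack = []
    parent_file = os.path.join(profile_dir, "parent")
    if os.path.isfile(parent_file):
        with open(parent_file) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # ignore blank lines and comments
                # each line is a path relative to the current profile
                parent = os.path.join(profile_dir, line)
                stack.extend(resolve_profile_stack(parent, seen))
    stack.append(profile_dir)
    return stack

def write_profile(root, name, parents):
    """Create a hypothetical profile directory with an optional parent file."""
    path = os.path.join(root, name)
    os.makedirs(path)
    if parents:
        with open(os.path.join(path, "parent"), "w") as fh:
            fh.write("\n".join(parents) + "\n")
    return path

with tempfile.TemporaryDirectory() as root:
    write_profile(root, "base", [])
    write_profile(root, "some-arch", ["../base"])
    write_profile(root, "hardened", ["../base"])
    # A local profile that picks both the arch and hardened as parents.
    mine = write_profile(root, "make.profile", ["../some-arch", "../hardened"])
    stack_names = [os.path.basename(p) for p in resolve_profile_stack(mine)]

print(stack_names)
```

The local profile comes last in the stack, after both of the parents it picked.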
> Some cons:
> - projects are now more dependant on other projects and its responsiveness,
> unless access is granted to all repositories for every developer
As noted by rbu, we explicitly trust every developer to modify most of
> - needs some basic tools to 'glue' final repository and ready it for rsync
> - to fully benefit from git - robbat2 would need to propose his slim manifest
> format as GLEP (or in case of lack of time - quite possible - get someone else
> to do it) and get it implemented by someone.
We need said tools for the thin Manifest stuff anyway.
> - possibly needs better multiple repositories support in Portage (not sure
> - profile no longer there
> - probably not easy way to migrate from monolithic gentoo-x86 to split sub-
> repositories retaining complete history
> - not settled yet what to do with orphaned/proxy maintained packages and herd-
- If a package moves between repos, it consumes space in the history on
both sides, forever.
- Fewer opportunities for Git's amazing compression.
- If you wanted to track the entire tree, you need multiple repositories
now, vs. being able to clone just a single repo, and maintain your own
branch on top of it.
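The compression point can be illustrated loosely in Python. This is only an analogy: zlib stands in for Git's pack compression, which works differently, and the ebuild text below is invented. The sketch assumes that two near-identical files stored together can share redundancy, while the same files stored apart cannot.

```python
import zlib

# Invented ebuild-like text; any two near-identical files would do.
ebuild_v1 = (
    'DESCRIPTION="Hypothetical example package"\n'
    'HOMEPAGE="http://example.org/"\n'
    'SRC_URI="http://example.org/dist/example-1.0.tar.gz"\n'
    'LICENSE="GPL-2"\n'
    'SLOT="0"\n'
    'KEYWORDS="~amd64 ~x86"\n'
    'IUSE="doc examples"\n'
    'DEPEND="dev-libs/example-lib"\n'
    'RDEPEND="${DEPEND}"\n'
).encode()

# A version bump changes only a few bytes.
ebuild_v2 = ebuild_v1.replace(b"1.0", b"1.1")

# Stored apart (think: two repositories), each file is compressed alone.
separate = len(zlib.compress(ebuild_v1)) + len(zlib.compress(ebuild_v2))

# Stored together (think: one repository), the second file can reuse the first.
together = len(zlib.compress(ebuild_v1 + ebuild_v2))

print(separate, together)
```

The combined size comes out smaller than the sum of the separate sizes, which is the kind of saving a split into many repositories gives up.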
> Zac, I'm CC-ing you here, I hope you don't mind. Sorry, but your input is too
> valuable here :)
He's on the list ;-).
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : firstname.lastname@example.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85