Gentoo Archives: gentoo-scm

From: Maciej Mrozowski <reavertm@××××××.fm>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption
Date: Mon, 13 Apr 2009 22:54:27
In Reply to: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption by "Robin H. Johnson"
On Saturday 11 of April 2009 18:38:05 Robin H. Johnson wrote:

>> - horizontal partitioning - a'ka cutting history
>> - vertical partitioning - splitting based on some categorization (not
> Both of these come under the aegis of partial trees.
> On the Git side, I'd like to deliberately direct you to this GSoC
> project:
>
> >2aaa4905
> It's been in the works for a couple of years in Git, and is well
> fleshed out as a proposal for now. When it does come to fruition, it will
> make the entire matter of having to split the repository irrelevant.
I somewhat agree. At least in Gentoo's case, splitting the repository would be just a workaround for git's issue with a full repack on initial clone (it seems that the existing, carefully created packs on the server are for some reason discarded and regenerated from scratch). Thanks for the links to your git ML thread, btw. While there are some noticeable improvements (one memory leak fixed, and some objects now kept only temporarily thanks to Linus' patches), the whole issue with memory and pack regeneration seems far from really solved yet.

About git bundle: in any case it seems to suit Gentoo's needs well, since at least the memory and CPU burning caused by "cloning the repo" would no longer be triggered from outside (thus minimizing potential service abuse leading to denial of service). Still, disabling the git clone method may need patches anyway.
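To illustrate the bundle idea, here is a minimal sketch. The paths and the URL are hypothetical and nothing like this exists on Gentoo infrastructure as far as this thread says. The server writes one bundle file ahead of time, users fetch it as a plain file and clone from it locally, so no pack generation is triggered on the server side.

```python
import subprocess


def bundle_create_cmd(repo_dir, bundle_path):
    """argv that writes all refs of repo_dir into a single bundle file."""
    return ["git", "-C", str(repo_dir), "bundle", "create", str(bundle_path), "--all"]


def clone_from_bundle_cmd(bundle_path, dest_dir):
    """argv that clones from a downloaded bundle file instead of the server."""
    return ["git", "clone", str(bundle_path), str(dest_dir)]


def repoint_origin_cmd(dest_dir, upstream_url):
    """argv that points 'origin' at the real repository for later fetches."""
    return ["git", "-C", str(dest_dir), "remote", "set-url", "origin", str(upstream_url)]


def run(cmd):
    """Run one of the commands above, raising if git exits with an error."""
    subprocess.run(cmd, check=True)


# Hypothetical usage (paths and URL are assumptions, not real locations):
#   server: run(bundle_create_cmd("/srv/git/gentoo-x86.git", "/srv/www/gentoo-x86.bundle"))
#   client: run(clone_from_bundle_cmd("gentoo-x86.bundle", "gentoo-x86"))
#           run(repoint_origin_cmd("gentoo-x86", "git://example.org/gentoo-x86.git"))
```

After the clone, later incremental fetches from the real repository only transfer what changed since the bundle was made.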
>> As cutting history has already been proposed here and received with
>> mixed opinions, I'd like to suggest category based partitioning (divide
>> and conquer approach FTW).
>> Well, not really category-based...
> I take it by this that you've looked at the past discussions.
> I'd also like to bring your attention to the thread I started on the Git
> mailing list, about memory usage, but in which I also gave some of our
> current numbers:
>
>
> The first discusses the general case of overhead per VCS (there's a nice
> summary at the bottom).
> The second gives overall numbers for Git with single-repo and
> repo-per-package. The overhead imposed by repo-per-package makes it a
> non-starter.
Package-per-repo is way too fine-grained even if those overheads were not present. What really is a pain is the current category-based partitioning of the package space. It provides neither completeness nor sufficient separation, and because of that it is unfortunately often subject to change (judging from profiles/updates). From a software engineering point of view that only means it's just bad design; unfortunately, it's probably too convenient to be dropped right away.

But if, for example, packages were given unique names or IDs (app-emacs is causing most name collisions), then they could be stored in some predictable and, most importantly, invariant manner. That would make it possible to exploit this property in data partitioning, storage optimization, etc. (for example, grouping by first letter). With unique names, categories could be replaced by tags, provided only for searching and not for dependency resolving (this is not the thread for such a discussion anyway).

In most cases, having one monolithic repository suits best. The problem is just that atomicity of huge objects can't be provided in an efficient manner, and unfortunately the whole Portage tree is atomic by definition.
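A minimal sketch of the first-letter grouping mentioned above, assuming package names are already unique across the tree (which is not the case today). The function names and the sample package names are only for illustration. The point is that the shard depends on nothing but the name, so a package never moves between shards when categories or tags change.

```python
from collections import defaultdict


def shard_key(package_name):
    """Return the shard for a package: its lowercased first character.

    Names that start with a non-alphanumeric character go to a "_" shard.
    """
    if not package_name:
        raise ValueError("package name must not be empty")
    first = package_name[0].lower()
    return first if first.isalnum() else "_"


def partition(package_names):
    """Group unique package names into shards keyed by first letter."""
    shards = defaultdict(list)
    for name in sorted(package_names):
        shards[shard_key(name)].append(name)
    return dict(shards)


# Illustrative input only; real names would need to be unique tree-wide.
example = partition(["git", "emacs", "eix", "portage"])
```

Whether first-letter shards would be balanced enough in practice is a separate question; any other function of the invariant name would work the same way.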
>> - alternative projects - like hardened - can just have separate branches
>> when appropriate - for easy merges with "main tree"
>> - profile can be (should be actually) separated in another repository and
>> developed easier
> I don't know how you consider separate to be better.
Well, I guess first I'd like it to be possible. For now, hardened is scattered in many places: it is usually developed outside the tree (like hardened sources) and then merged manually, or just patched in the tree (which doesn't make it easy to track hardened patches separately, for example to periodically see diffs against "vanilla" if needed). But yeah, at least in this case a monolithic hardened seems to suit better.
> One of the major
> reasons for designing the multi-parent stackable profiles was for the
> unusual mixed cases. Say you wanted to do hardened on a machine that
> hardened doesn't presently support, all you have to do is pick both in
> your own make.profile/parent file.
And I'm not after dropping this. Btw, judging from the responses, I definitely haven't made myself clear about the profile - what I meant was to have one repository exclusively for developing the main tree profile, not to split the profile across repositories.
>> - needs some basic tools to 'glue' final repository and ready it for
>> rsync - to fully benefit from git - robbat2 would need to propose his
>> slim manifest format as GLEP (or in case of lack of time - quite possible
>> - get someone else to do it) and get it implemented by someone.
> More cons:
> - If a package moves between repos, it consumes space in the history on
> both sides, forever.
Agreed. A decision to move something between repositories needs to be well justified. If repo-per-herd would suffer from many justified refactorings, then of course it cannot be used.
> - Less opportunities for Git's amazing compression.
> - If you wanted to track the entire tree, you need multiple repositories
> now, vs. being able to clone just a single repo, and maintain your own
> branch on top of it.
This of course can be scripted (git, I believe, doesn't have anything similar to svn externals - it being what CVS/SVN is not, after all). If git clone is the blocker here, maybe some workaround could be implemented, like distributing the mentioned git bundle (it's something like an SVN snapshot, after all) and forbidding the git clone method.

--
regards
MM

