List Archive: gentoo-scm
To: gentoo-scm@g.o
From: Maciej Mrozowski <reavertm@...>
Subject: Re: Splitting gentoo-x86 repository for easier consumption
Date: Tue, 14 Apr 2009 00:54:18 +0200
On Saturday 11 of April 2009 18:38:05 Robin H. Johnson wrote:

>> - horizontal partitioning - a.k.a. cutting history
>> - vertical partitioning - splitting based on some categorization (not

> Both of these come under the aegis of partial trees.
> On the Git side, I'd like to deliberately direct you to this GSoC
> project:
> It's been in the works for a couple of years in Git, and is well
> fleshed out as a proposal for now. When it does come to fruition, it will
> make the entire matter of having to split the repository irrelevant.

I somewhat agree; at least in Gentoo's case, splitting the repository would
be just a workaround for Git's issues with a full repack upon initial clone
(it seems that the existing, carefully created packs on the server are, for
some reason, discarded and regenerated from scratch).
Thanks for the links to your Git ML thread, btw.
While there are some noticeable improvements (one memory leak fixed, and some
objects now kept only temporarily thanks to Linus's patches), the whole issue
with memory usage and pack regeneration seems far from really solved yet.
As for git bundle, it seems to suit Gentoo's needs well in any case: at least
the memory and CPU burned by cloning the repo would no longer be triggered
from outside (minimizing potential abuse that could cause a denial of
service). Still, disabling the git clone method may need patches anyway.
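
A minimal sketch of the bundle idea, assuming the server periodically publishes one bundle file and clients only fetch incremental updates afterwards. The file name and remote URL are hypothetical placeholders; `git bundle create`, `git clone` from a bundle file, `git remote set-url` and `git fetch` are standard Git commands. By default nothing is executed (dry run), only printed.

```python
import subprocess
from typing import List, Sequence

def server_publish_bundle(bundle_path: str) -> List[List[str]]:
    """Command a server could run on its own schedule (e.g. from cron)."""
    # --all puts every ref into one file, so the expensive packing happens once,
    # on the server's terms, instead of on every incoming clone request
    return [["git", "bundle", "create", bundle_path, "--all"]]

def client_bootstrap(bundle_path: str, workdir: str, remote_url: str) -> List[List[str]]:
    """Commands a client could run after downloading the bundle (e.g. over HTTP or rsync)."""
    return [
        # cloning from a local bundle file triggers no repack on the server
        ["git", "clone", bundle_path, workdir],
        # re-point 'origin' at the live repository so later fetches are incremental
        ["git", "-C", workdir, "remote", "set-url", "origin", remote_url],
        ["git", "-C", workdir, "fetch", "origin"],
    ]

def run(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)

# Hypothetical names, for illustration only.
run(server_publish_bundle("gentoo-x86.bundle"))
run(client_bootstrap("gentoo-x86.bundle", "gentoo-x86", "git://example.org/gentoo-x86.git"))
```

Only the small `git fetch` step ever talks to the live server, which is what would make forbidding full clones acceptable.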

>> As cutting history has already been proposed here and received with
>> mixed opinions, I'd like to suggest category based partitioning (divide
>> and conquer approach FTW).
>> Well, not really category-based...

> I take it by this that you've looked at the past discussions.
> I'd also like to bring your attention to the thread I started on the Git
> mailing list, about memory usage, but in which I also gave some of our
> current numbers:
> The first discusses the general case of overhead per VCS (there's a nice
> summary at the bottom).
> The second gives overall numbers for Git with single-repo and
> repo-per-package. The overhead imposed by repo-per-package makes it a
> non-starter.

Package-per-repo would be far too fine-grained even if those overheads were
not present. What is really a pain is the current category-based partitioning
of the package space. It provides neither completeness nor sufficient
separation, and because of that it is unfortunately often subject to change
(judging from profiles/updates). From a software engineering point of view,
that just means it's bad design; unfortunately, it's probably too convenient
to be dropped right away.
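
To illustrate the profiles/updates point: package moves are recorded there as lines of the form `move <old-category>/<name> <new-category>/<name>`. A minimal sketch, assuming only `move` lines matter (other directives are skipped), that counts how many moves changed the category rather than just the name. The sample lines are hypothetical.

```python
from typing import Iterable, List, Tuple

def parse_moves(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Extract (old, new) pairs from 'move old new' lines; skip everything else."""
    moves = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "move":
            moves.append((fields[1], fields[2]))
    return moves

def category_changes(moves: Iterable[Tuple[str, str]]) -> int:
    """Count moves whose category part (before the '/') differs."""
    return sum(1 for old, new in moves if old.split("/", 1)[0] != new.split("/", 1)[0])

sample = [
    "move dev-util/foo dev-vcs/foo",   # hypothetical: category changed
    "move app-misc/bar app-misc/baz",  # hypothetical: renamed within a category
    "",                                # blank line, skipped
]
print(category_changes(parse_moves(sample)))  # 1
```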

But if, for example, packages were given unique names or IDs (app-emacs causes
most of the name collisions; with unique names, categories could be replaced
by tags used only for searching, not for dependency resolution, though this is
not the thread for that discussion), then they could be stored in a
predictable and, most importantly, invariant manner. That property could then
be exploited for data partitioning, storage optimization, etc. (for example,
grouping by first letter).
In most cases, one monolithic repository suits best; the problem is just that
atomicity of huge objects can't be provided efficiently, and unfortunately the
whole Portage tree is atomic by definition.
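
The unique-name idea can be sketched as follows. This is a hypothetical model, not an existing Portage layout: every package has a unique name, categories become free-form tags used only for searching, and the storage location depends solely on the name (here its first letter), so re-tagging a package never moves it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def partition_key(name: str) -> str:
    """Invariant shard for a package: the first letter of its unique name."""
    first = name[0].lower()
    return first if "a" <= first <= "z" else "_"  # digits and symbols share one shard

def storage_path(name: str) -> str:
    # depends only on the name, so changing tags never changes the path
    return f"{partition_key(name)}/{name}"

class TagIndex:
    """Tags replace categories for searching only; dependencies use unique names."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, name: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._by_tag[tag].add(name)

    def search(self, tag: str) -> List[str]:
        return sorted(self._by_tag.get(tag, set()))

index = TagIndex()
index.add("emacs-foo", ["app-emacs", "editors"])  # hypothetical package
index.add("vim", ["editors"])
print(storage_path("emacs-foo"))  # e/emacs-foo
print(index.search("editors"))    # ['emacs-foo', 'vim']
```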

>> - alternative projects - like hardened - can just have separate branches
>> when appropriate - for easy merges with "main tree"
>> - profile can be (should be actually) separated in another repository and
>> developed easier

> I don't know how you consider separate to be better.

Well, first I'd like it to be possible.
For now, hardened is scattered in many places: it is usually developed outside
the tree (like hardened sources) and then merged manually, or just patched
in-tree (which, for example, makes it hard to track hardened patches
separately, e.g. to periodically diff against "vanilla" when needed).
But yeah, at least in this case a monolithic hardened seems to suit better.

> One of the major
> reasons for designing the multi-parent stackable profiles was for the
> unusual mixed cases. Say you wanted to do hardened on a machine that
> hardened doesn't presently support, all you have to do is pick both in
> your own make.profile/parent file.

And I'm not proposing to drop this.
Btw, judging from the responses, I definitely didn't make myself clear about
the profile: what I meant was to have one repository exclusively for
developing the main tree profile, not to split the profile across
repositories.
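
For readers unfamiliar with multi-parent stackable profiles: each profile may list its parents in a `parent` file, and the effective settings come from walking that graph. A simplified sketch, assuming an in-memory mapping instead of real directories, hypothetical profile names, and a plain key/value settings model (real Portage treats incremental variables such as USE differently): ancestors are applied depth-first in listed order, shared ancestors only once, and each profile overrides what it inherits.

```python
from typing import Dict, List, Optional, Set

def resolve_stack(profile: str, parents: Dict[str, List[str]],
                  _visiting: Optional[Set[str]] = None) -> List[str]:
    """Return profiles in application order: ancestors first, `profile` last."""
    visiting = set() if _visiting is None else _visiting
    if profile in visiting:
        raise ValueError(f"cycle in parent files at {profile!r}")
    visiting.add(profile)
    order: List[str] = []
    for parent in parents.get(profile, []):
        for p in resolve_stack(parent, parents, visiting):
            if p not in order:  # a shared ancestor (e.g. base) is applied once
                order.append(p)
    visiting.discard(profile)
    order.append(profile)
    return order

def effective_settings(profile: str, parents: Dict[str, List[str]],
                       settings: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    merged: Dict[str, str] = {}
    for p in resolve_stack(profile, parents):
        merged.update(settings.get(p, {}))  # later profiles override earlier ones
    return merged

# Hypothetical: a user's own profile stacking an arch profile and hardened.
parents = {
    "my-profile": ["default/linux/arch-x", "hardened/linux"],
    "default/linux/arch-x": ["base"],
    "hardened/linux": ["base"],
}
settings = {
    "base": {"TOOLCHAIN": "vanilla"},
    "default/linux/arch-x": {"ARCH": "arch-x"},
    "hardened/linux": {"TOOLCHAIN": "hardened"},
}
print(resolve_stack("my-profile", parents))
# ['base', 'default/linux/arch-x', 'hardened/linux', 'my-profile']
print(effective_settings("my-profile", parents, settings))
```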

>> - needs some basic tools to 'glue' final repository and ready it for
>> rsync - to fully benefit from git - robbat2 would need to propose his
>> slim manifest format as GLEP (or in case of lack of time - quite possible
>> - get someone else to do it) and get it implemented by someone.

> More cons:
> - If a package moves between repos, it consumes space in the history on
>   both sides, forever.

Agreed. A decision to move something between repositories needs to be well
justified. If repo-per-herd would require a lot of justified refactoring, then
of course it cannot be used.

> - Less opportunities for Git's amazing compression.
> - If you wanted to track the entire tree, you need multiple repositories
>   now, vs. being able to clone just a single repo, and maintain your own
>   branch on top of it.

This can of course be scripted (as I believe Git doesn't have anything similar
to svn:externals; it's not CVS/SVN, after all).
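
Tracking a split tree through several repositories could be scripted roughly like this. A minimal sketch; the repository names and URLs are hypothetical, and by default it only prints the planned commands: missing checkouts are cloned, existing ones are updated with a fast-forward-only pull.

```python
import os
import subprocess
from typing import Dict, List

def plan_sync(repos: Dict[str, str], base_dir: str) -> List[List[str]]:
    """Map each repository name to a clone or update command, in name order."""
    commands = []
    for name, url in sorted(repos.items()):
        path = os.path.join(base_dir, name)
        if os.path.isdir(os.path.join(path, ".git")):
            # existing checkout: fast-forward only, never create merge commits silently
            commands.append(["git", "-C", path, "pull", "--ff-only"])
        else:
            commands.append(["git", "clone", url, path])
    return commands

def sync(repos: Dict[str, str], base_dir: str, dry_run: bool = True) -> None:
    for cmd in plan_sync(repos, base_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# Hypothetical split of the tree, for illustration only.
sync({"main": "git://example.org/gentoo-main.git",
      "profiles": "git://example.org/gentoo-profiles.git"}, "gentoo-tree")
```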

If git clone is the blocker here, maybe a working workaround could be
implemented, like distributing the git bundle mentioned earlier (it's
something like an SVN snapshot, after all) and forbidding the git clone method.

Updated Jun 17, 2009

Summary: Archive of the gentoo-scm mailing list.

Copyright 2001-2013 Gentoo Foundation, Inc. Questions, Comments? Contact us.