Gentoo Archives: gentoo-scm

From: Donnie Berkholz <dberkholz@g.o>
To: "Robin H. Johnson" <robbat2@g.o>
Cc: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Welcome to Gentoo-SCM discussion, for figuring out Gentoo beyond CVS
Date: Tue, 09 Sep 2008 17:45:10
Message-Id: 20080909174506.GA32182@comet
In Reply to: [gentoo-scm] Welcome to Gentoo-SCM discussion, for figuring out Gentoo beyond CVS by "Robin H. Johnson"
On 23:43 Mon 08 Sep, Robin H. Johnson wrote:
> 1.
> Other large projects that have either had or are conducting the
> discussions about switching SCMs
> - FreeBSD
> - KDE
> - Ruby on Rails
> - GHC (Haskell)
> Know of any more?

- X.Org
  <http://keithp.com/blog/Repository_Formats_Matter/>
  <http://keithp.com/blogs/Tyrannical_SCM_selection/>
- OpenSolaris
  <http://opensolaris.org/os/community/tools/scm/history/>
- GNOME
  <http://live.gnome.org/DistributedSCM>
- Linux kernel
- Gentoo
  <http://www.gentoo.org/proj/en/infrastructure/cvs-migration.xml>
  (Needs an update)

> 3.
> Migration tools:
> - cvs2svn looks best as it's easy to customize into doing what we want
> it to (Python).

Other existing ones, for completeness:
- fromcvs
  <http://www.selenic.com/mercurial/wiki/index.cgi/fromcvs>
- parsecvs
  <http://cgit.freedesktop.org/~keithp/parsecvs/>
- git cvsimport

In case we want to split the tree into multiple modules:
- git-split
  <people.freedesktop.org/~jamey/git-split>
  (This dies on our tree at the moment, in part because of recursion
  limits.)

We should figure out our requirements for a migration tool to make sure
we're picking the best one. Here's what I can think of:

Important
- Handling huge numbers of changesets (See package.mask)
- Not mixing unrelated changesets with similar commit messages
  ("Version bump.")
- Incremental imports (Just what's changed since last import. Useful for
  initial migration)

Unimportant
- Handling branches correctly (we essentially never used them)
- Hardware requirements for migration tools

What else?
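
To make the "not mixing unrelated changesets" requirement concrete, here
is a minimal sketch of one possible grouping heuristic. It is not how
cvs2svn, parsecvs or any other tool listed above actually works. It
assumes CVS history has already been parsed into per-file revision
records, and every name in it (FileRevision, Changeset, package_of,
group_changesets, the 300-second fuzz window) is made up for
illustration. Revisions are merged into one changeset only if they share
author, message and package directory, are close in time, and do not
touch a file the changeset already contains.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRevision:
    """One per-file CVS revision (hypothetical, already-parsed record)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


@dataclass
class Changeset:
    author: str
    message: str
    revisions: list = field(default_factory=list)

    def accepts(self, rev, fuzz):
        """True if rev is close in time and touches a file not yet included."""
        close_in_time = rev.timestamp - self.revisions[-1].timestamp <= fuzz
        new_file = all(r.path != rev.path for r in self.revisions)
        return close_in_time and new_file


def package_of(path):
    # Assumption: paths look like "category/package/file".
    return "/".join(path.split("/")[:2])


def group_changesets(revisions, fuzz=300):
    """Group per-file revisions into changesets, oldest first."""
    open_sets = {}  # (author, message, package) -> most recent Changeset
    result = []
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message, package_of(rev.path))
        current = open_sets.get(key)
        if current is None or not current.accepts(rev, fuzz):
            # Start a new changeset instead of merging look-alike commits.
            current = Changeset(rev.author, rev.message)
            open_sets[key] = current
            result.append(current)
        current.revisions.append(rev)
    return result
```

With this heuristic, two "Version bump." commits by the same developer
to different packages stay separate even if they are seconds apart. The
trade-off is that a genuine commit spanning several packages would also
be split, so a real tool would need something smarter.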
> 4.
> Doing more test migrations, and having a test-plan for comparing them
> directly, as well as against other SCMs.

The OpenSolaris link above is quite useful for comparisons, and the
"Repository Formats Matter" post from Keith Packard is helpful for
understanding one good reason why git might be the best choice.

Same as above, what are our requirements and what doesn't matter? Here's
the OpenSolaris list:
http://opensolaris.org/os/community/tools/scm/dscmreqdoc/

Important
- Fast branching (This will make new styles of development possible in
  Gentoo.)
- Fast committing (This will encourage more atomic commits from a
  functional POV.)
- Reliable (Repository format & committing process guarantee no
  corruption.)
- Usability (Either discoverable, or achieved through good
  documentation, found elsewhere or produced by us.)
- Modifiable (Written in a reasonably common language. Read: Python, C
  or shell. git and bzr qualify, darcs doesn't.)
- Active upstream (Getting modifications into upstream code, requesting
  features)
- Hooks (Implement custom checks upon commit to your own or the main
  repository.)

Optional
- Partial checkouts. They aren't useful enough to be a requirement, in
  my view, because I have yet to hear a good reason they're needed. A
  gig or two of disk space is cheap.
- Integration into popular text editors
- CVS gateway (people can still commit using CVS)
- Shallow checkouts (Only getting partial history to reduce size. git
  supports grafting two repositories together, not sure about other
  SCMs. Not sure how to do the initial splice. Explore
  'git-filter-branch'?)

Unimportant
- ???

What else?

Another point I'd like to get into is how we should architect this.
Should we stick with the single repository for the whole thing, or
should we break it down so that each package has its own repository? If
we go with the latter, we need to figure out a way to easily check out &
update the whole repo. We also encounter issues with atomic commits
across multiple packages. git has submodule support to partially address
this, although it may require slight enhancements so that it keeps all
of the submodules at HEAD instead of at arbitrary commits. This
additionally runs into some potential issues with duplication of history
if packages move, etc. I don't remember the details, but Robin knows
about them.

One interesting possibility with packages as separate repositories is
that we could have a flat structure of repositories and somehow
structure it into categories for rsync using some type of map. This
opens the door to using tags instead of categories. More thought needed.

--
Thanks,
Donnie

Donnie Berkholz
Developer, Gentoo Linux
Blog: http://dberkholz.wordpress.com