On 23:43 Mon 08 Sep, Robin H. Johnson wrote:
> Other large projects that have either had or are conducting the
> discussions about switching SCMs
> - FreeBSD
> - KDE
> - Ruby on Rails
> - GHC (Haskell)
> Know of any more?
- X.Org <http://keithp.com/blog/Repository_Formats_Matter/>
- OpenSolaris <http://opensolaris.org/os/community/tools/scm/history/>
- GNOME <http://live.gnome.org/DistributedSCM>
- Linux kernel
- Gentoo <http://www.gentoo.org/proj/en/infrastructure/cvs-migration.xml>
(Needs an update)
> Migration tools:
> - cvs2svn looks best as it's easy to customize into doing what we want
> it to (Python).
Other existing ones, for completeness:
- fromcvs <http://www.selenic.com/mercurial/wiki/index.cgi/fromcvs>
- parsecvs <http://cgit.freedesktop.org/~keithp/parsecvs/>
- git cvsimport
In case we want to split the tree into multiple modules:
- git-split <people.freedesktop.org/~jamey/git-split> (This dies on
our tree at the moment, in part because of recursion limits.)
We should figure out our requirements for a migration tool to make sure
we're picking the best one. Here's what I can think of:
- Handling huge numbers of changesets (See package.mask)
- Not mixing unrelated changesets with similar commit messages
- Incremental imports (Just what's changed since last import. Useful
for initial migration)
- Handling branches correctly (we essentially never used them)
- Hardware requirements for migration tools
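To make the "not mixing unrelated changesets" requirement concrete, here is a minimal sketch of how an importer might group per-file CVS revisions into changesets. It is not how cvs2svn or any other tool listed above actually works; the FileRev record, its field names, and the 300-second window are all assumptions made for illustration. Revisions are merged only when author and message match, they are close together in time, and the changeset doesn't already touch the same file.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision; the field names are
# illustrative and not taken from any real migration tool.
FileRev = namedtuple("FileRev", "path author message timestamp")


def group_changesets(revs, window=300):
    """Group per-file revisions into changesets.

    A revision joins an existing changeset only if it has the same author
    and commit message, falls within `window` seconds of the last revision
    in that changeset, and the changeset does not already touch its path.
    """
    open_sets = {}   # (author, message) -> most recent changeset for that key
    changesets = []
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = open_sets.get(key)
        if (current is not None
                and rev.timestamp - current[-1].timestamp <= window
                and all(r.path != rev.path for r in current)):
            current.append(rev)
        else:
            # Same message but too far apart (or same file twice):
            # treat it as an unrelated commit.
            current = [rev]
            changesets.append(current)
            open_sets[key] = current
    return changesets
```

With this approach, two "version bump" commits by the same developer made hours apart end up as separate changesets, while the files committed together within a few seconds stay grouped.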
> Doing more test migrations, and having a test-plan for comparing them
> directly, as well as against other SCMs.
The OpenSolaris link above is quite useful for comparisons, and the
"Repository Formats Matter" post from Keith Packard is helpful for
understanding one good reason why git might be the best choice.
As with the migration tools: what are our requirements, and what doesn't
matter? Here's the OpenSolaris list:
- Fast branching (This will make it possible for new styles of
development in Gentoo.)
- Fast committing (This will encourage more atomic commits from a
- Reliable (Repository format & committing process guarantee no
- Usability (This can come either from discoverability or from good
documentation, found elsewhere or produced by us.)
- Modifiable (Written in a reasonably common language. Read: Python, C
or shell. git and bzr qualify, darcs doesn't.)
- Active upstream (Getting modifications into upstream code,
- Hooks (Implement custom checks upon commit to your or main
- Partial checkouts. They aren't useful enough to be a requirement, in
my view, because I have yet to hear a good reason they're needed. A
gig or two of disk space is cheap.
- Integration into popular text editors
- CVS gateway (people can still commit using CVS)
- Shallow checkouts (Only getting partial history to reduce size. git
supports grafting two repositories together, not sure about other
SCMs. Not sure how to do the initial splice. Explore
Another point I'd like to get into is how we should architect this.
Should we stick with the single repository for the whole thing, or
should we break it down so that each package has its own repository? If
we go with the latter, we need to figure out a way to easily check out &
update the whole repo.
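As a rough illustration of what "check out & update the whole repo" could look like with per-package repositories, here is a minimal sketch. The base URL, the one-repository-per-package naming scheme, and the function names are all hypothetical; nothing like this exists yet. It clones packages that are missing locally and fast-forwards the ones already present.

```python
import os
import subprocess


def plan_sync(packages, base_url, dest, exists=os.path.isdir):
    """Return one git command (as an argv list) per package.

    `base_url` and the "<package>.git" naming scheme are assumptions of
    this sketch. `exists` is injectable so the plan can be checked
    without touching the filesystem.
    """
    commands = []
    for pkg in packages:
        target = os.path.join(dest, pkg)
        if exists(os.path.join(target, ".git")):
            # Already cloned: update in place, refusing non-fast-forwards.
            commands.append(["git", "-C", target, "pull", "--ff-only"])
        else:
            commands.append(["git", "clone", f"{base_url}/{pkg}.git", target])
    return commands


def sync_all(packages, base_url, dest):
    """Run the planned commands, stopping at the first failure."""
    for cmd in plan_sync(packages, base_url, dest):
        subprocess.check_call(cmd)
```

Note that this does nothing about consistency between repositories: each package is updated independently, so a sync can catch a multi-package change halfway through.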
Per-package repositories would also cause problems with atomic commits
across multiple packages. git has submodule support to partially address
this, although it may require slight enhancements so that it keeps all of
the submodules at
HEAD instead of at arbitrary commits. This additionally runs into some
potential issues with duplication of history if packages move, etc. I
don't remember the details, but Robin knows about them.
One interesting possibility with per-package repositories is that we
could keep the repositories in a flat structure and use some type of map
to arrange them into categories for rsync. This opens the door to using
tags instead of categories. More thought needed.
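To sketch what such a map might look like: the structure below is purely hypothetical, including the convention that the first tag doubles as the category used in the rsync tree. Each flat repository name maps to a list of tags, and the rsync layout is derived from it.

```python
def rsync_layout(tag_map):
    """Map flat repository names to category/package paths for rsync.

    tag_map: dict of repository name -> list of tags. Treating the first
    tag as the on-disk category is an assumption of this sketch.
    """
    layout = {}
    for repo, tags in sorted(tag_map.items()):
        if not tags:
            raise ValueError(f"{repo} has no tags, so it has no category")
        layout[repo] = f"{tags[0]}/{repo}"
    return layout


def repos_with_tag(tag_map, tag):
    """List every repository carrying `tag`, whether primary or not."""
    return sorted(repo for repo, tags in tag_map.items() if tag in tags)
```

For example, a map of {"vim": ["app-editors", "gui"], "portage": ["sys-apps"]} would place vim at app-editors/vim in the rsync tree while still letting it be found under the "gui" tag.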
Developer, Gentoo Linux