From: Donnie Berkholz <dberkholz@g.o>
To: gentoo-scm@l.g.o
Subject: [gentoo-scm] Testing git conversion
Date: Tue, 28 Oct 2008 06:02:00

After talking with some git developers at the Summer of Code mentor 
summit, I resumed work on getting gentoo-x86 into git. Today I got the 
first conversion to git from cvs working, using cvs2svn's cvs2git 
backend. I attached the config files I used. Here was the process:

- Start with an rsync of the gentoo-x86 cvs repo
- `mkdir CVSROOT` at the same level as gentoo-x86
- cvs2svn --options=cvs2svn-git.options | tee cvs2svn.log
- mkdir gentoo-x86-cvs2git
- cd gentoo-x86-cvs2git/
- git init
- cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat \
  | git fast-import | tee git-fast-import.log
- time git repack -a -d -f -l --window=50 | tee git-repack.log
  (took ~25 minutes)
- rsync -avzP -e ssh .git/
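
For anyone who wants to repeat the steps above, here is a rough sketch of them as shell functions. The function names are hypothetical and only there for illustration; the file and directory names come from the list. The final rsync is left out because its destination depends on where the result should live.

```shell
# Sketch of the conversion steps as shell functions (nothing runs on load).
# Function names are made up for this example; paths are from the list above.

# Succeeds only if DIR holds both the rsynced repo and the CVSROOT directory.
check_layout() {
    dir=${1:-.}
    [ -d "$dir/gentoo-x86" ] && [ -d "$dir/CVSROOT" ]
}

# Create a fresh repository in TARGET and feed it the two cvs2git dump files.
# Run this from the directory that contains cvs2svn-tmp/.
import_dumps() {
    target=${1:-gentoo-x86-cvs2git}
    tmp=$PWD/cvs2svn-tmp
    mkdir "$target" || return 1
    (
        cd "$target" || exit 1
        git init || exit 1
        cat "$tmp/git-blob.dat" "$tmp/git-dump.dat" \
            | git fast-import | tee git-fast-import.log
    )
}

# Repack with the same options as in the list, keeping a log.
repack_repo() {
    target=${1:-gentoo-x86-cvs2git}
    ( cd "$target" && git repack -a -d -f -l --window=50 | tee git-repack.log )
}

# Possible usage, in order:
#   check_layout . || echo "missing gentoo-x86 or CVSROOT"
#   cvs2svn --options=cvs2svn-git.options | tee cvs2svn.log
#   import_dumps gentoo-x86-cvs2git
#   repack_repo gentoo-x86-cvs2git
```

Keeping each step in its own function makes it easy to rerun only the import or only the repack after a failure, without repeating the long cvs2svn run.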

Here are some stats on the conversion:

Total CVS Files:            341121
Total CVS Revisions:       2035092
Total CVS Branches:              0
Total CVS Tags:                  0
Total Unique Tags:               0
Total Unique Branches:           0
CVS Repos Size in KB:      1412266
Total SVN Commits:          591101
First Revision Date:    Thu Jul 27 17:35:42 2000
Last Revision Date:     Sun Oct 26 15:43:43 2008
Timings (seconds):
22743   pass1    CollectRevsPass
   38   pass2    CleanMetadataPass
    0   pass3    CollateSymbolsPass
  184   pass4    FilterSymbolsPass
    3   pass5    SortRevisionSummaryPass
    0   pass6    SortSymbolSummaryPass
  222   pass7    InitializeChangesetsPass
 2535   pass8    BreakRevisionChangesetCyclesPass
13238   pass9    RevisionTopologicalSortPass
   60   pass10   BreakSymbolChangesetCyclesPass
  218   pass11   BreakAllChangesetCyclesPass
  187   pass12   TopologicalSortPass
  457   pass13   CreateRevsPass
    0   pass14   SortSymbolsPass
    0   pass15   IndexSymbolsPass
  176   pass16   OutputPass
40062   total

It took about 12 hours in total, running on a brand-new consumer-level 
box (no RAID, etc). It might run as much as 3x faster if I put the whole 
repo into a ramdisk beforehand, because that ought to remove a lot of 
the pass1 time (reading ,v files). The other place that took a lot of 
time, pass9, seemed mostly CPU-bound.
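
As a quick check on the ramdisk estimate, the arithmetic can be done directly from the timing table. This is only a sketch: it uses the reported total (40062 s) and the pass1 time (22743 s), and it treats the best case as pass1 dropping to zero, which is an assumption and not a measurement. The function name is hypothetical.

```shell
# Summarize the cvs2svn timings: total hours, pass1 share, and the upper
# bound on speedup if pass1 cost nothing at all (best-case assumption).
timing_summary() {
    awk -v total="$1" -v pass1="$2" 'BEGIN {
        printf "total hours: %.1f\n", total / 3600
        printf "pass1 share: %.1f%%\n", 100 * pass1 / total
        printf "max speedup without pass1: %.2fx\n", total / (total - pass1)
    }'
}

timing_summary 40062 22743
# total hours: 11.1
# pass1 share: 56.8%
# max speedup without pass1: 2.31x
```

So removing pass1 entirely would cap the gain for the cvs2svn run at roughly 2.3x; getting to 3x would also need some of the other passes to speed up.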

If you want to grab a copy of the tree and check it out, you can. If 
you're a dev, don't clone over ssh because it hits the server really 
hard. (This remains to be solved.) Here's the place:

  git clone

The repo is around 900M, including all history, thanks to git's pack 
compression. That's roughly equivalent to the size of a cvs checkout 
with no history.

One obvious thing to fix for a final version is adding the author map 
into the config files so we get real names for people. That won't be 
very hard -- we should be able to just pull everything from ldap.
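
One possible way to build the author map, sketched under a few assumptions: the usernames, real names and email addresses have already been exported from ldap into a tab-separated file (however that export is done), and the options file takes an `author_transforms` dictionary in the style of cvs2git's example options file. The function name and the sample entry are hypothetical.

```shell
# Read "username<TAB>Full Name<TAB>email" lines on stdin and print one
# Python dictionary entry per line, suitable for pasting into an
# author_transforms={ ... } block. Lines without three fields are skipped.
# Names containing a double quote would need escaping by hand.
make_author_map() {
    awk -F '\t' 'NF == 3 {
        printf "    \"%s\" : (\"%s\", \"%s\"),\n", $1, $2, $3
    }'
}

# Example with a made-up developer:
printf 'jrandom\tJ. Random Developer\tjrandom@example.org\n' | make_author_map
```

The output is meant to be reviewed by hand before it goes into the config, since accounts without a real name or address in ldap would otherwise slip through unnoticed.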

A clear next step is to compare a native cvs checkout with a cvs 
checkout of the git repo through `git cvsserver`. Does anyone want to 
help out and do this?
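
A minimal sketch of how that comparison might be run, assuming both checkouts already exist side by side. The function name and the directory names in the usage line are hypothetical. The `CVS/` administrative directories are excluded because they record the server each checkout came from and so differ by design.

```shell
# Recursively compare two checkouts, ignoring the CVS/ admin directories.
# Prints one line per differing or missing file; exit status is 0 only
# when the two trees match.
compare_checkouts() {
    diff -r -q -x CVS "$1" "$2"
}

# Possible usage:
#   compare_checkouts gentoo-x86-native gentoo-x86-cvsserver > checkout-diff.log
```

With `-q` the output stays short enough to scan for patterns; dropping it shows the actual differences once a suspicious file is found.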


Donnie Berkholz
Developer, Gentoo Linux


