Gentoo Archives: gentoo-scm

From: Rich Freeman <rich0@g.o>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Git Validation - Update
Date: Tue, 16 Oct 2012 19:12:36
Message-Id: CAGfcS_ktMrmkw9E_nDsY-S7Z7-wpZqA+eDKx1svMb9JydhcH1Q@mail.gmail.com
In Reply to: Re: [gentoo-scm] Git Validation - Update by Michael Haggerty
On Tue, Oct 16, 2012 at 2:42 PM, Michael Haggerty <mhagger@××××××××.edu> wrote:
> First I'd like to make two comments about your general approach:
>
> 1. It seems like you are doing an awful lot of work and using a lot of
> movable parts to reduce the switchover time.
>...
> 2. It seems to me to be overkill to insist on a repository validation
> between the conversion and switching the repository live.

Tend to agree on both, but it seems to be important to some. In any
case, if it turns out we can have our cake and eat it too, so much the
better. We'll just see how it turns out. If everything is working
great and the only debate is over downtime, I think sane minds will
prevail, but we'll see.

> Depending on your level of paranoia, you might not trust cvs2git to
> extract information from RCS files that you want to use to validate
> cvs2git. OTOH the parsing of the CVS files and the recreation of the
> fulltext is not the complicated part of cvs2git. The complicated part
> is inferring the original project-wide commits from the individual file
> histories recorded by CVS. But given that your project didn't do any
> branching, even that should be pretty straightforward.

Tend to agree, but it's still safer to avoid the same code. If the CVS
side goes like the git side did, it will probably only take me a week
or two to have it working. If it can run on Hadoop then CPU power
isn't likely to be a problem. Getting the full dump of the cvs log
only took 15 min; I can't think that parsing it will take that long.
99% of the battle will be calculating hashes, and that is completely
parallel.
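
To illustrate why the hashing step parallelizes so easily, here is a
minimal sketch. It is not the actual validation code: the function
names, the (path, revision, content) tuple layout, and the choice of
SHA-1 are all assumptions made for the example, and a local thread
pool stands in for whatever a Hadoop job would do.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_hash(data: bytes) -> str:
    """Return the SHA-1 hex digest of one revision's full text."""
    return hashlib.sha1(data).hexdigest()


def hash_revisions(revisions):
    """Hash every (path, revision_id, content) triple independently.

    No revision's hash depends on any other revision, so the input can
    be split across any number of workers. The layout of the triples is
    hypothetical; a real job would read them from the CVS dump.
    """
    def hash_one(rev):
        path, rev_id, data = rev
        return (path, rev_id), content_hash(data)

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(hash_one, revisions))
```

Since each result is keyed by (path, revision), the outputs of separate
workers can simply be merged in any order.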

> What kind of data transformations are you doing during the migration?

I don't have the details on that yet, but it will be something I look
at as soon as I have the CVS side ready. I think they're mapping
usernames to email addresses, doing some commit message footer
tweaking when combining Manifest/file commits, and so on. My approach
doesn't care about the association of files into CVS commits, since
that wasn't there to begin with. It just cares that every change to
every file maps back to the original author, date/time, message, and
of course file contents.
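
Roughly, the comparison could look like the sketch below. This is only
an illustration under assumptions: the Change record, its field names,
and the compare() helper are hypothetical, and any intentional
transformations (such as username to email mapping) would have to be
applied to one side before the two sets are compared.

```python
import hashlib
from typing import NamedTuple


class Change(NamedTuple):
    """One change to one file, independent of any commit grouping."""
    path: str
    author: str
    timestamp: int      # seconds since the epoch, UTC
    message: str
    content_sha1: str   # hash of the file contents after the change


def make_change(path, author, timestamp, message, content: bytes) -> Change:
    """Build a comparable record from raw per-file revision data."""
    digest = hashlib.sha1(content).hexdigest()
    return Change(path, author, timestamp, message.strip(), digest)


def compare(cvs_changes, git_changes):
    """Return (only_in_cvs, only_in_git); both empty means a match."""
    cvs_set, git_set = set(cvs_changes), set(git_changes)
    return cvs_set - git_set, git_set - cvs_set
```

Because the records are compared as sets, it does not matter how the
conversion grouped per-file changes into git commits.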

Rich