Gentoo Archives: gentoo-scm

From: Michael Haggerty <mhagger@××××××××.edu>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Git Validation - Update
Date: Tue, 16 Oct 2012 18:42:52
Message-Id: 507DAAA0.7060107@alum.mit.edu
In Reply to: [gentoo-scm] Git Validation - Update by Rich Freeman

First I'd like to make two comments about your general approach:

1. It seems like you are doing an awful lot of work and using a lot of
moving parts to reduce the switchover time. Just how much traffic does
your repo get? Is it really worth the effort to reduce the downtime by
a couple of hours?

2. It seems to me to be overkill to insist on a repository validation
between the conversion and switching the repository live. I would suggest:

    do {
        Adjust conversion scripts
        Copy CVS repo
        Do test conversion
        Validate converted repository
    } until (validation perfect);

    Switch CVS repo to read-only
    Do final conversion
    Switch git repo live
    Validate converted repository at leisure for your peace of mind

Because realistically, you are not going to get a perfect conversion
during your test followed by a corrupt one a few days later when there
are only a few more commits in the CVS repository. The point of testing
is to perfect the migration scripts and establish confidence in the
tools, at which point it would be extraordinary for the final conversion
to have invisible problems that would be detected by the validation but
which weren't there during the test conversions.

But, if you really want to optimize the testing...

On 10/16/2012 07:59 PM, Rich Freeman wrote:
> [...]
> Next step is to get cvs into a similar format. My initial thoughts:
>
> 1. Just run cvs log in the root, chop it up at file boundaries,
> base64 encode each blob, and dump that into a text file one file per
> line.
> 2. Distribute processing of each file, turning it into one line per
> commit with all the info my git program dumps, save the file hash.
> 3. Distribute reading those lines, checking out that one version of
> one file, calculating the hash, and outputting the full info.
>
> Steps 1/2 might be cheaper to just combine since you have to scan the
> whole thing to chop it up and the parsing can't be THAT expensive.
>
> If there are libs to make any of this easier I'm all ears, but it
> seems like there isn't much out there - nothing like pygit2.
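
A rough sketch of step 1 in Python, for illustration only. It assumes that `cvs log` ends each file's section with the rlog-style terminator line of 77 `=` characters, and it works on raw bytes because old log messages may not be valid UTF-8. The function names are hypothetical. A log message that itself contains such a line would be mis-split, and the sketch does not guard against that.

```python
import base64

# Assumption: like rlog, `cvs log` ends each file's section with a line made
# of 77 '=' characters.
FILE_SEPARATOR = b"=" * 77


def split_cvs_log(data):
    """Split raw `cvs log` output (bytes) into one chunk per RCS file."""
    chunks = []
    current = []
    for line in data.splitlines(keepends=True):
        if line.rstrip(b"\r\n") == FILE_SEPARATOR:
            # End of one file's section: close the current chunk.
            chunks.append(b"".join(current))
            current = []
        else:
            current.append(line)
    # Keep trailing text that was not followed by a separator, if any.
    tail = b"".join(current)
    if tail.strip():
        chunks.append(tail)
    return chunks


def encode_chunks(chunks):
    """Base64-encode each chunk so that every file fits on exactly one line."""
    return [base64.b64encode(chunk).decode("ascii") for chunk in chunks]
```

Given the bytes of a `cvs log` run (for example from `sys.stdin.buffer.read()`), `encode_chunks(split_cvs_log(data))` yields one base64 line per RCS file, which can then be handed out to workers for step 2.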

Obviously there's a lot of code in cvs2git for parsing CVS files and
extracting metadata and/or the revision fulltext from them. It's pretty
straightforward, really; the only slightly subtle thing is the expansion
of RCS keywords (like $Revision$, etc.). The code in cvs2git has the
advantages that it is pure Python and that it can generate all of the
CVS revision fulltexts via one parse of the RCS file, so it is vastly
faster than running CVS once per revision.
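
To illustrate why one parse is enough: an RCS file stores the full text of the head trunk revision plus, for each older trunk revision, a reverse delta made of `dL N` (delete N lines starting at line L) and `aL N` (add the N following lines after line L) commands. The Python sketch below applies such deltas in turn to recover every trunk fulltext. It is not cvs2git's code. It assumes the deltas have already been extracted and unescaped from the RCS file's `@...@` strings, covers trunk only (branch revisions use forward deltas, which a branchless history does not need), and does not expand keywords.

```python
def apply_rcs_delta(old_lines, delta_lines):
    """Apply one RCS-format delta to a list of lines and return the new list.

    Commands are "dL N" (delete N lines starting at line L) and "aL N"
    (add the N lines that follow the command after line L). Line numbers are
    1-based, refer to old_lines, and appear in ascending order.
    """
    result = []
    consumed = 0  # how many lines of old_lines have been copied or deleted
    i = 0
    while i < len(delta_lines):
        command = delta_lines[i].strip()
        if not command:
            i += 1
            continue
        op = command[0]
        line_no, count = (int(x) for x in command[1:].split())
        if op == "d":
            # Copy untouched lines before the deletion, then skip `count` lines.
            result.extend(old_lines[consumed:line_no - 1])
            consumed = line_no - 1 + count
            i += 1
        elif op == "a":
            # Copy untouched lines up to and including line `line_no`,
            # then insert the `count` text lines that follow the command.
            result.extend(old_lines[consumed:line_no])
            consumed = line_no
            result.extend(delta_lines[i + 1:i + 1 + count])
            i += 1 + count
        else:
            raise ValueError("unknown RCS delta command: %r" % command)
    result.extend(old_lines[consumed:])
    return result


def reconstruct_trunk(head_text, deltas):
    """Yield the head fulltext, then each older trunk revision in turn."""
    lines = head_text.splitlines(keepends=True)
    yield "".join(lines)
    for delta in deltas:
        lines = apply_rcs_delta(lines, delta.splitlines(keepends=True))
        yield "".join(lines)
```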

Depending on your level of paranoia, you might not trust cvs2git to
extract information from RCS files that you want to use to validate
cvs2git. OTOH the parsing of the CVS files and the recreation of the
fulltext is not the complicated part of cvs2git. The complicated part
is inferring the original project-wide commits from the individual file
histories recorded by CVS. But given that your project didn't do any
branching, even that should be pretty straightforward.
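
A deliberately naive Python sketch of commit inference for a branchless history. It sorts file revisions by time and groups those sharing an author and log message that fall within a time window. It starts a new commit when the window is exceeded or the same file shows up twice. cvs2git's actual algorithm is more careful than this. The `FileRevision` record and the 300-second `COMMIT_WINDOW` are hypothetical choices for illustration.

```python
from collections import namedtuple

# Hypothetical record for one CVS file revision; timestamp is in seconds.
FileRevision = namedtuple("FileRevision", "path rev author timestamp message")

# Hypothetical threshold: revisions with the same author and message that are
# at most this many seconds apart are treated as one commit.
COMMIT_WINDOW = 300


def infer_commits(revisions, window=COMMIT_WINDOW):
    """Group trunk file revisions into project-wide commits (lists of revisions)."""
    commits = []
    open_commits = {}  # (author, message) -> commit currently being built
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        commit = open_commits.get(key)
        if (commit is None
                or rev.timestamp - commit[-1].timestamp > window
                or any(r.path == rev.path for r in commit)):
            # Start a new commit: nothing open, gap too large, or file repeated.
            commit = []
            commits.append(commit)
            open_commits[key] = commit
        commit.append(rev)
    return commits
```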

> Once I have both I can start working on validation rules and perhaps
> get feedback to the conversion team. We'll need to work out what does
> and doesn't count as OK. We're doing transformation of data during
> migration, so I need to take that into account. Either the logic goes
> into the compare function, or the logic goes into the dump side so
> that the compares work out the same. Timestamps might force us to do
> logic during compare anyway.
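
One way to put the logic on the dump side is to normalize each fulltext before hashing, so that both sides produce comparable hashes. The Python sketch below makes two hypothetical assumptions: the converted repository stores RCS keywords unexpanded, and the git-side dump records blob object ids. It collapses keywords the way `cvs -kk` displays them and then hashes the result as git does for a blob (SHA-1 over `blob <size>\0` followed by the content). `$Log$` is deliberately left out, because the log text that accompanies it is not confined to the keyword string.

```python
import hashlib
import re

# Common CVS/RCS keywords in expanded ("$Revision: 1.5 $") or unexpanded
# ("$Revision$") form. $Log$ is intentionally not included.
KEYWORD_RE = re.compile(
    rb"\$(Author|Date|Header|Id|Locker|Name|RCSfile|Revision|Source|State)"
    rb"(?::[^$\n]*)?\$"
)


def collapse_keywords(data):
    """Rewrite every matched keyword to its bare form, e.g. b"$Revision$"."""
    return KEYWORD_RE.sub(rb"$\1$", data)


def git_blob_sha1(data):
    """Compute the object id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def normalized_hash(fulltext):
    """Dump-side hash: collapse keywords first, then hash like a git blob."""
    return git_blob_sha1(collapse_keywords(fulltext))
```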

What kind of data transformations are you doing during the migration?

Michael

--
Michael Haggerty
mhagger@××××××××.edu
http://softwareswirl.blogspot.com/