Gentoo Archives: gentoo-scm

From:	Michael Haggerty <mhagger@××××××××.edu>
To:	gentoo-scm@l.g.o
Subject:	Re: [gentoo-scm] Git Validation - Update
Date:	Tue, 16 Oct 2012 18:42:52
Message-Id:	`507DAAA0.7060107@alum.mit.edu`
In Reply to:	[gentoo-scm] Git Validation - Update by Rich Freeman

First I'd like to make two comments about your general approach:

1. It seems like you are doing an awful lot of work and using a lot of
moving parts to reduce the switchover time. Just how much traffic does
your repo get? Is it really worth the effort to reduce the downtime by
a couple of hours?

2. It seems to me to be overkill to insist on a repository validation
between the conversion and switching the repository live. I would suggest

do {
    Adjust conversion scripts
    Copy CVS repo
    Do test conversion
    Validate converted repository
} until (validation perfect);

Switch CVS repo to read-only
Do final conversion
Switch git repo live
Validate converted repository at leisure for your peace of mind
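The rehearsal loop above could be driven by something like the following minimal sketch. The three callables (`convert`, `validate`, `adjust`) and the `max_rounds` limit are hypothetical placeholders for site-specific scripts, not part of cvs2git or any existing tool.

```python
from typing import Callable, List


def rehearse_until_clean(
    convert: Callable[[], str],
    validate: Callable[[str], List[str]],
    adjust: Callable[[List[str]], None],
    max_rounds: int = 10,
) -> int:
    """Repeat test conversions until validation reports no problems.

    Returns the number of rounds needed. All three callables are
    hypothetical stand-ins for site-specific scripts.
    """
    for round_no in range(1, max_rounds + 1):
        repo = convert()           # copy the CVS repo and run a test conversion
        problems = validate(repo)  # list of discrepancies, empty if perfect
        if not problems:
            return round_no
        adjust(problems)           # tweak the conversion scripts, then retry
    raise RuntimeError("validation still failing after %d rounds" % max_rounds)
```

Only once this loop terminates cleanly would the CVS repo be switched to read-only for the final conversion; the final validation then runs after the git repo is live.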

Because realistically, you are not going to get a perfect conversion
during your test followed by a corrupt one a few days later when there
are only a few more commits in the CVS repository. The point of testing
is to perfect the migration scripts and establish confidence in the
tools, at which point it would be extraordinary for the final conversion
to have invisible problems that would be detected by the validation but
which weren't there during the test conversions.

But, if you really want to optimize the testing...

On 10/16/2012 07:59 PM, Rich Freeman wrote:
> [...]
> Next step is to get cvs into a similar format. My initial thoughts:
>
> 1. Just run cvs log in the root, chop it up at file boundaries,
> base64 encode each blob, and dump that into a text file one file per
> line.
> 2. Distribute processing of each file, turning it into one line per
> commit with all the info my git program dumps, save the file hash.
> 3. Distribute reading those lines, checking out that one version of
> one file, calculating the hash, and outputting the full info.
>
> Steps 1/2 might be cheaper to just combine since you have to scan the
> whole thing to chop it up and the parsing can't be THAT expensive.
>
> If there are libs to make any of this easier I'm all ears, but it
> seems like there isn't much out there - nothing like pygit2.

Obviously there's a lot of code in cvs2git for parsing CVS files and
extracting metadata and/or the revision fulltext from them. It's pretty
straightforward really; the only slightly subtle thing is the expansion
of RCS keywords (like $Revision$ etc). The code in cvs2git has the
advantages that it is pure Python and that it can generate all of the
CVS revision fulltexts via one parse of the RCS file, so it is vastly
faster than running CVS once per revision.
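To illustrate why keyword expansion matters for hash-based validation: a file checked out of CVS may contain `$Revision: 1.5 $` where the converted copy contains `$Revision$`, so the raw hashes differ even though the content is equivalent. One possible workaround, sketched here as an assumption and not as how cvs2git does it, is to collapse expanded keywords on both sides before hashing. The helper names are hypothetical, and `$Log$` is deliberately left out because its expansion spans multiple lines.

```python
import hashlib
import re

# Standard single-line RCS keywords; $Log$ is omitted on purpose because
# its expansion inserts extra lines and needs separate handling.
_KEYWORDS = rb"Author|Date|Header|Id|Locker|Name|RCSfile|Revision|Source|State"
_EXPANDED = re.compile(rb"\$(" + _KEYWORDS + rb")(?::[^$\n]*)?\$")


def collapse_keywords(data: bytes) -> bytes:
    """Rewrite e.g. b'$Revision: 1.5 $' to b'$Revision$'."""
    return _EXPANDED.sub(rb"$\1$", data)


def content_hash(data: bytes) -> str:
    """SHA-1 of the content with RCS keywords collapsed."""
    return hashlib.sha1(collapse_keywords(data)).hexdigest()
```

With this, `content_hash()` of an expanded checkout and of the unexpanded fulltext agree as long as they differ only in keyword values.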

Depending on your level of paranoia, you might not trust cvs2git to
extract information from RCS files that you want to use to validate
cvs2git. OTOH the parsing of the CVS files and the recreation of the
fulltext is not the complicated part of cvs2git. The complicated part
is inferring the original project-wide commits from the individual file
histories recorded by CVS. But given that your project didn't do any
branching, even that should be pretty straightforward.
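For a repository without branches, the inference step can be pictured roughly as follows. This is a deliberately simplified sketch and not cvs2git's actual algorithm: it groups per-file revisions that share an author and log message and lie close together in time. The `FileRev` type and the 300-second `fuzz` window are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FileRev(NamedTuple):
    path: str
    rev: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_into_commits(revs: Iterable[FileRev], fuzz: int = 300) -> List[List[FileRev]]:
    """Group per-file revisions into guessed project-wide commits."""
    commits: List[List[FileRev]] = []
    # Most recent commit seen for each (author, log message) pair.
    latest: Dict[Tuple[str, str], List[FileRev]] = {}
    for r in sorted(revs, key=lambda r: r.timestamp):
        key = (r.author, r.log)
        current = latest.get(key)
        if (
            current is not None
            and r.timestamp - current[-1].timestamp <= fuzz
            # One commit cannot hold two revisions of the same file.
            and all(c.path != r.path for c in current)
        ):
            current.append(r)
        else:
            current = [r]
            commits.append(current)
            latest[key] = current
    return commits
```

A validator could apply the same grouping to the `cvs log` output and compare the resulting file sets against the commits in the converted git repository.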

> Once I have both I can start working on validation rules and perhaps
> get feedback to the conversion team. We'll need to work out what does
> and doesn't count as OK. We're doing transformation of data during
> migration, so I need to take that into account. Either the logic goes
> into the compare function, or the logic goes into the dump side so
> that the compares work out the same. Timestamps might force us to do
> logic during compare anyway.

What kind of data transformations are you doing during the migration?

Michael

--
Michael Haggerty
mhagger@××××××××.edu
http://softwareswirl.blogspot.com/

Replies

Subject	Author
Re: [gentoo-scm] Git Validation - Update	Rich Freeman <rich0@g.o>
Re: [gentoo-scm] Git Validation - Update	"Robin H. Johnson" <robbat2@g.o>
