First I'd like to make two comments about your general approach:

1. It seems like you are doing an awful lot of work and using a lot of
moving parts to reduce the switchover time. Just how much traffic does
your repo get? Is it really worth the effort to reduce the downtime by
a couple of hours?

2. It seems to me to be overkill to insist on a repository validation
between the conversion and switching the repository live. I would suggest

    do {
        Adjust conversion scripts
        Copy CVS repo
        Do test conversion
        Validate converted repository
    } until (validation perfect);

    Switch CVS repo to read-only
    Do final conversion
    Switch git repo live
    Validate converted repository at leisure for your peace of mind

Because realistically, you are not going to get a perfect conversion
during your test followed by a corrupt one a few days later when there
are only a few more commits in the CVS repository. The point of testing
is to perfect the migration scripts and establish confidence in the
tools, at which point it would be extraordinary for the final conversion
to have invisible problems that would be detected by the validation but
which weren't there during the test conversions.

But, if you really want to optimize the testing...

On 10/16/2012 07:59 PM, Rich Freeman wrote:
> [...]
> Next step is to get cvs into a similar format. My initial thoughts:
>
> 1. Just run cvs log in the root, chop it up at file boundaries,
> base64 encode each blob, and dump that into a text file one file per
> line.
> 2. Distribute processing of each file, turning it into one line per
> commit with all the info my git program dumps, save the file hash.
> 3. Distribute reading those lines, checking out that one version of
> one file, calculating the hash, and outputting the full info.
>
> Steps 1/2 might be cheaper to just combine since you have to scan the
> whole thing to chop it up and the parsing can't be THAT expensive.
>
> If there are libs to make any of this easier I'm all ears, but it
> seems like there isn't much out there - nothing like pygit2.
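
Step 1 of the quoted plan could be sketched roughly as below. This is
only an illustration, not code from cvs2git or from anyone's actual
tooling. It assumes that "cvs log" closes each file's section with a
long line consisting only of '=' characters, and that the log text
decodes as UTF-8. The function names are made up for this example.

```python
import base64


def is_file_separator(line):
    """Assumption: a long line made only of '=' ends one file's section."""
    stripped = line.rstrip("\r\n")
    return len(stripped) >= 70 and set(stripped) == {"="}


def split_log_by_file(lines):
    """Yield one text chunk per file section of `cvs log` output."""
    chunk = []
    for line in lines:
        if is_file_separator(line):
            # Skip sections that contain nothing but whitespace.
            if any(l.strip() for l in chunk):
                yield "".join(chunk)
            chunk = []
        else:
            chunk.append(line)
    if any(l.strip() for l in chunk):
        yield "".join(chunk)


def encode_chunks(lines):
    """Return one base64 string per file section (no embedded newlines)."""
    return [
        base64.b64encode(chunk.encode("utf-8")).decode("ascii")
        for chunk in split_log_by_file(lines)
    ]
```

Writing each returned string on its own line gives the "one file per
line" format. A commit message that itself contains a long line of '='
characters would be split in the wrong place, so a real tool would
need a stricter boundary test.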

Obviously there's a lot of code in cvs2git for parsing CVS files and
extracting metadata and/or the revision fulltext from them. It's pretty
straightforward really; the only slightly subtle thing is the expansion
of RCS keywords (like $Revision$ etc). The code in cvs2git has the
advantages that it is pure Python and that it can generate all of the
CVS revision fulltexts via one parse of the RCS file, so it is vastly
faster than running CVS once per revision.
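
To illustrate why keyword expansion matters for validation: if one
side of a comparison has expanded keywords and the other does not, the
file hashes will differ even though the content is otherwise
identical. One possible workaround, sketched below, is to collapse
expanded keywords before hashing. This is not how cvs2git handles
keywords; the keyword list, the function names, and the choice of SHA-1
are all assumptions made for the example.

```python
import hashlib
import re

# Assumption: only the standard RCS keywords are in use. A multi-line
# $Log$ expansion is not undone here; only the keyword line is collapsed.
RCS_KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Log", "Name",
    "RCSfile", "Revision", "Source", "State",
)

_EXPANDED = re.compile(
    rb"\$(" + "|".join(RCS_KEYWORDS).encode("ascii") + rb"):[^$\n]*\$"
)


def collapse_keywords(data):
    """Turn b'$Revision: 1.5 $' back into b'$Revision$'."""
    return _EXPANDED.sub(rb"$\1$", data)


def content_hash(data):
    """Hash file content with expanded keywords collapsed first."""
    return hashlib.sha1(collapse_keywords(data)).hexdigest()
```

With this, a checkout containing "$Revision: 1.5 $" and a converted
blob containing "$Revision$" hash to the same value, while ordinary
dollar signs in the text are left alone.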

Depending on your level of paranoia, you might not trust cvs2git to
extract information from RCS files that you want to use to validate
cvs2git. OTOH the parsing of the CVS files and the recreation of the
fulltext is not the complicated part of cvs2git. The complicated part
is inferring the original project-wide commits from the individual file
histories recorded by CVS. But given that your project didn't do any
branching, even that should be pretty straightforward.

> Once I have both I can start working on validation rules and perhaps
> get feedback to the conversion team. We'll need to work out what does
> and doesn't count as OK. We're doing transformation of data during
> migration, so I need to take that into account. Either the logic goes
> into the compare function, or the logic goes into the dump side so
> that the compares work out the same. Timestamps might force us to do
> logic during compare anyway.

What kind of data transformations are you doing during the migration?

Michael

--
Michael Haggerty
mhagger@××××××××.edu
http://softwareswirl.blogspot.com/