On Tue, Oct 16, 2012 at 2:42 PM, Michael Haggerty <mhagger@××××××××.edu> wrote:
> First I'd like to make two comments about your general approach:
>
> 1. It seems like you are doing an awful lot of work and using a lot of
> movable parts to reduce the switchover time.
>...
> 2. It seems to me to be overkill to insist on a repository validation
> between the conversion and switching the repository live.
|
I tend to agree on both, but it seems to be important to some. In any
case, if it turns out we can have our cake and eat it too, so much the
better. If everything is working great and the only debate left is
over downtime, I think sane minds will prevail, but we'll see how it
turns out.
|
> Depending on your level of paranoia, you might not trust cvs2git to
> extract information from RCS files that you want to use to validate
> cvs2git. OTOH the parsing of the CVS files and the recreation of the
> fulltext is not the complicated part of cvs2git. The complicated part
> is inferring the original project-wide commits from the individual file
> histories recorded by CVS. But given that your project didn't do any
> branching, even that should be pretty straightforward.
|
Tend to agree, but it is still safer to avoid reusing the same code
that is being validated. If the cvs side goes like the git side did, it
will probably only take me a week or two to have it working. If it can
run on hadoop then CPU power isn't likely to be a problem. Getting the
full dump of the cvs log only took 15 min, and I can't think that
parsing it will take that long. 99% of the battle will be calculating
hashes, and that is completely parallel.
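
To illustrate why the hashing step parallelizes so easily, here is a
minimal sketch in Python. It is not the actual tooling: a local thread
pool stands in for a hadoop map step, SHA-1 is an assumed choice of
hash, and the names content_hash and hash_revisions are made up for
this example. The point is only that each revision's hash depends on
nothing but that revision's bytes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_hash(data: bytes) -> str:
    """Hash one revision's full text; depends only on its own input."""
    return hashlib.sha1(data).hexdigest()


def hash_revisions(revisions, workers=4):
    """Return {(path, rev): hex digest}, hashing revisions concurrently.

    `revisions` is a hypothetical dict {(path, rev): bytes}. Because no
    hash depends on any other, the work can be split in any way.
    """
    keys = list(revisions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(content_hash, (revisions[k] for k in keys)))
    return dict(zip(keys, digests))
```

The same per-revision function could be handed to any map-style
framework, since there is no shared state between calls.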
|
> What kind of data transformations are you doing during the migration?
|
I don't have the details on that yet, but it will be something I look
at as soon as I have the cvs side ready. I think it includes things
like mapping usernames to email addresses, some commit message footer
tweaking when combining Manifest/file commits, and so on. My approach
doesn't care about how per-file cvs commits are associated with each
other, since that association wasn't there to begin with. It just
cares that every change to every file maps back to the original
author, date/time, message, and of course file contents.
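
A rough sketch of that per-file check, in Python. Everything here is
hypothetical (the Change record, make_change, diff_changes, and the
use of SHA-1 for contents); it also assumes both sides have already
been normalized for the migration's transformations, such as the
username-to-email mapping and footer tweaks, which are not modeled.

```python
import hashlib
from typing import NamedTuple


class Change(NamedTuple):
    """One change to one file; no grouping into project-wide commits."""
    path: str
    author: str
    timestamp: int      # seconds since the epoch, UTC
    message: str
    content_sha1: str


def make_change(path, author, timestamp, message, content: bytes) -> Change:
    # Strip surrounding whitespace so trailing newlines don't cause
    # spurious mismatches (an assumed normalization).
    digest = hashlib.sha1(content).hexdigest()
    return Change(path, author, timestamp, message.strip(), digest)


def diff_changes(cvs_changes, git_changes):
    """Return (missing_from_git, unexpected_in_git) as sets of Change."""
    cvs_set, git_set = set(cvs_changes), set(git_changes)
    return cvs_set - git_set, git_set - cvs_set
```

If both returned sets are empty, every per-file change on the cvs side
has a matching change on the git side with the same author, time,
message, and contents.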
|
Rich