On Tue, Oct 2, 2012 at 4:20 PM, Gregory M. Turner <gmt@×××××.us> wrote:
> Brian Harring wrote:
>>
>> replay it into git via tailor;
>>
>
> Never knew about that tool... not sure about the wisdom of adding an extra
> moving part just to keep the lights on for those few hours... Given the "2G
> of history" issue Diego mentioned, which if I understand correctly,
> effectively means that the future gentoo git can never rebase its commit
> history, why chance it?

I think that the reality is that we're going to have a million dress
rehearsals before we do the real thing. Apparently right now the
conversion isn't quite right, and we can't validate that it is right
either. I don't see any harm in having people look into being able to
keep the downtime low while others figure out how the migration works
in the first place.

Dress rehearsals don't even need to be announced. You just grab a
snapshot of cvs at some random time and convert it and test it. Then
you grab another snapshot at a later moment in time and try to use it
to catch up the converted repository. Then you test it all again. If
you can do that on demand without issue then I'd say we're ready to
go.
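
The rehearsal loop described above can be sketched roughly as follows. This is only an illustration: `take_snapshot`, `convert`, `catch_up` and `validate` are hypothetical callables standing in for whatever snapshot, conversion and validation tooling ends up being used, and none of them names a real tool.

```python
from typing import Any, Callable


def dress_rehearsal(
    take_snapshot: Callable[[], Any],
    convert: Callable[[Any], Any],
    catch_up: Callable[[Any, Any], Any],
    validate: Callable[[Any, Any], bool],
) -> bool:
    """Run one unannounced rehearsal; return True only if both checks pass.

    All four callables are placeholders (assumptions), not real tools.
    """
    # Step 1: snapshot cvs at some arbitrary time, convert it, test it.
    first = take_snapshot()
    repo = convert(first)
    if not validate(repo, first):
        return False

    # Step 2: take a later snapshot and use it to catch up the
    # already-converted repository instead of converting from scratch.
    later = take_snapshot()
    repo = catch_up(repo, later)

    # Step 3: test it all again against the later snapshot.
    return validate(repo, later)
```

If a loop like this passes whenever it is run, that is the "on demand without issue" condition mentioned above.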

I do plan to mess around with validation as I posted yesterday.
Rather than dump a lot of time into a "clever" solution like MapReduce
where I have no experience I'll probably just start with a
single-threaded proof of concept and see just how long it takes. I have
thought of ways to optimize things - you can descend the tree of all
the commits iteratively side-by-side, and at each step prune every
sub-tree that is a duplicate (with a little care to catch situations
where the tree might have been reverted). That means that instead of
descending the entire tree for every commit you only actually descend
the branches that have changes on each commit, which in most cases
will just be a single branch anyway. If the records don't proliferate
at each step then you're talking about on the order of a few million
records to check each pass, with only a few passes - that might be
reasonable without much heavy equipment. However, the job could
still be run in parallel as long as you still run it in stages.
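
A toy sketch of the pruning idea, to make it concrete. It is not the pseudocode mentioned in this mail. It assumes each commit's tree can be modelled as nested dicts (directories) with bytes leaves (file contents), and the names `oid`, `changed_blobs` and `records_to_check` are made up for the example. Each commit is compared against the one before it rather than against a global "already seen" set, which is one possible reading of the "little care" needed for reverts: a reverted tree matches an older commit but still differs from its immediate predecessor, so it is not pruned.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple, Union

# A node is either file contents (bytes) or a directory (dict of name -> node).
Node = Union[bytes, Dict[str, "Node"]]


def oid(node: Node) -> str:
    """Content hash of a blob or tree, loosely imitating git object IDs.

    A real repository already stores these IDs; they are recomputed here
    only so the toy example is self-contained.
    """
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    h = hashlib.sha1(b"tree\0")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + oid(node[name]).encode())
    return h.hexdigest()


def changed_blobs(
    old: Optional[Node], new: Node, prefix: str = ""
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, contents) for blobs in `new` that are added or modified.

    Deletions are not reported in this sketch.
    """
    if old is not None and oid(old) == oid(new):
        return  # identical sub-tree: prune, nothing below needs checking
    if isinstance(new, bytes):
        yield prefix, new
        return
    old_entries = old if isinstance(old, dict) else {}
    for name in sorted(new):
        path = f"{prefix}/{name}" if prefix else name
        yield from changed_blobs(old_entries.get(name), new[name], path)


def records_to_check(commits: List[Node]) -> Iterator[Tuple[int, str, bytes]]:
    """Walk commits in order, descending only into sub-trees that changed."""
    prev: Optional[Node] = None
    for index, root in enumerate(commits):
        for path, data in changed_blobs(prev, root):
            yield index, path, data
        prev = root
```

Feeding `records_to_check` a list of root trees in commit order yields one record per added or modified file per commit, which is the set of records a validation pass would then have to compare against cvs.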

I've got pseudocode for the git side - so I'll see what I can do with it.

Rich