Gentoo Archives: gentoo-scm

From: Rich Freeman <rich0@g.o>
To: gentoo-scm@l.g.o
Cc: ferringb@×××××.com
Subject: [gentoo-scm] Re: [gentoo-dev] CVS -> git, list of where non-infra folk can contribute
Date: Tue, 02 Oct 2012 21:21:34
Message-Id: CAGfcS_m4FGBy2mkQwmz+QWyN4F=PBYEVX7JvDYhaArXq71y+TA@mail.gmail.com
On Tue, Oct 2, 2012 at 4:20 PM, Gregory M. Turner <gmt@×××××.us> wrote:
> Brian Harring wrote:
>>
>> replay it into git via tailor;
>>
>
> Never knew about that tool... not sure about the wisdom of adding an extra
> moving part just to keep the lights on for those few hours... Given the "2G
> of history" issue Diego mentioned, which if I understand correctly,
> effectively means that the future gentoo git can never rebase its commit
> history, why chance it?

I think the reality is that we're going to have a million dress
rehearsals before we do the real thing. Apparently the conversion
isn't quite right yet, and we can't validate that it is right
either. I don't see any harm in having people look into keeping the
downtime low while others figure out how the migration works in the
first place.

Dress rehearsals don't even need to be announced. You just grab a
snapshot of CVS at some random time, convert it, and test it. Then
you grab another snapshot at a later point in time and try to use it
to catch up the converted repository. Then you test it all again. If
you can do that on demand without issues, then I'd say we're ready to
go.

I do plan to mess around with validation, as I posted yesterday.
Rather than sink a lot of time into a "clever" solution like MapReduce,
where I have no experience, I'll probably just start with a
single-threaded proof of concept and see how long it takes. I have
thought of ways to optimize things: you can descend the trees of all
the commits iteratively, side by side, and at each step prune every
subtree that is a duplicate (with a little care to catch situations
where the tree might have been reverted). That means that instead of
descending the entire tree for every commit, you only descend the
branches that changed in that commit, which in most cases will be
just a single branch anyway. If the records don't proliferate at each
step, then you're talking about something on the order of a few
million records to check per pass, with only a few passes; that might
be feasible without much heavy equipment. That said, the job should
still be parallelizable, as long as it is run in stages.
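
A minimal Python sketch of the pruned, pass-by-pass descent described
above, under several assumptions: each commit's tree is already loaded
in memory as nested dicts (a real run would read git tree objects), the
content hash is a simplified Merkle hash rather than git's object
format, and the sketch only finds which files changed in each commit;
comparing those files against CVS is left out. The names annotate,
expand and changed_files are hypothetical. "A little care" for reverts
is interpreted here as comparing each commit with its immediate
predecessor instead of a global cache of seen hashes. Each pass handles
one tree depth for all commits at once, and records within a pass are
independent, which is what makes the staged parallel run possible.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical in-memory model: a tree maps names to file contents (bytes)
# or to nested subtrees. A real run would read git tree objects instead.
Tree = Dict[str, Union[bytes, "Tree"]]
# Annotated node: (content hash, children); children is None for a file.
Node = Tuple[str, Optional[Dict[str, "Node"]]]
# Record: (commit index, path, node in previous commit, node in this commit).
Record = Tuple[int, str, Optional[Node], Optional[Node]]


def annotate(tree: Union[bytes, Tree]) -> Node:
    """Hash every subtree bottom-up (simplified, not git's object format)."""
    if isinstance(tree, bytes):
        return hashlib.sha1(b"blob\0" + tree).hexdigest(), None
    kids = {name: annotate(child) for name, child in tree.items()}
    h = hashlib.sha1(b"tree\0")
    for name in sorted(kids):
        h.update(name.encode() + b"\0" + kids[name][0].encode() + b"\n")
    return h.hexdigest(), kids


def expand(rec: Record) -> Tuple[List[Tuple[int, str]], List[Record]]:
    """One step for one record: prune it, report a file, or emit children."""
    i, path, old, new = rec
    if old is not None and new is not None and old[0] == new[0]:
        return [], []  # identical subtree: unchanged since last commit, prune
    old_kids = old[1] if old is not None else {}
    new_kids = new[1] if new is not None else {}
    if old_kids is None or new_kids is None:
        return [(i, path)], []  # a file was added, removed or modified here
    children = [
        (i, f"{path}/{name}" if path else name,
         old_kids.get(name), new_kids.get(name))
        for name in sorted(old_kids.keys() | new_kids.keys())
    ]
    return [], children


def changed_files(history: List[Tree], workers: int = 4) -> List[Tuple[int, str]]:
    """Descend all commits side by side, one tree depth per pass.

    Each commit is compared with the commit right before it, so a revert
    to an older tree is still reported as a change needing validation.
    """
    nodes = [annotate(t) for t in history]
    records: List[Record] = [
        (i, "", nodes[i - 1] if i else None, nodes[i]) for i in range(len(nodes))
    ]
    found: List[Tuple[int, str]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while records:  # one stage per tree depth
            next_records: List[Record] = []
            # Records in a pass are independent, so they can be mapped in parallel.
            for results, children in pool.map(expand, records):
                found.extend(results)
                next_records.extend(children)
            records = next_records
    return sorted(found)


if __name__ == "__main__":
    history = [
        {"app-misc": {"foo": {"foo-1.ebuild": b"v1"}}, "profiles": {"use.desc": b"x"}},
        {"app-misc": {"foo": {"foo-1.ebuild": b"v2"}}, "profiles": {"use.desc": b"x"}},
        {"app-misc": {"foo": {"foo-1.ebuild": b"v1"}}, "profiles": {"use.desc": b"x"}},
    ]
    # Commit 0 reports both files; commits 1 and 2 (the revert) report only
    # foo-1.ebuild, because the unchanged profiles/ subtree is pruned.
    print(changed_files(history))
```

The thread pool only illustrates that each pass is an independent map;
a real parallel run would more likely spread the records of each pass
across processes or machines.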

I've got pseudocode for the git side, so I'll see what I can do with it.

Rich
