Gentoo Archives: gentoo-scm

From: Michael Haggerty <mhagger@××××××××.edu>
To: Brian Harring <ferringb@×××××.com>
Cc: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] CVS -> git, list of where non-infra folk can contribute
Date: Thu, 04 Oct 2012 21:34:14
Message-Id: 506E00C8.4070700@alum.mit.edu
In Reply to: Re: [gentoo-scm] CVS -> git, list of where non-infra folk can contribute by Brian Harring
1 On 10/04/2012 09:38 PM, Brian Harring wrote:
2 > On Thu, Oct 04, 2012 at 05:45:40PM +0200, Michael Haggerty wrote:
3 >> Another big boost for cvs2git is to use the ExternalBlobGenerator, which
4 >> extracts revision contents to a "blob" file in CollectRevsPass using an
5 >> external script, thereby making OutputPass much faster.
6 >
7 > Is this still sequential, or can it effectively run in parallel? I'm
8 > just wondering if the 'faster' is via parallelism, or bypassing a
9 > slower native python implementation.
10
11 It currently does the blob generation in one separate process in
12 parallel with the activities that are happening in the main cvs2svn
13 process. (I remembered incorrectly; the blobs are collected during
14 FilterSymbolsPass, not CollectRevsPass.)
15
16 I think it would be relatively straightforward to generalize this to N
17 blob-generation processes, but I think that blob generation had already
18 ceased to be the bottleneck with only 1 process. However, your
19 conversion is unusual (e.g., linear history, lots of RAM) so it could be
20 that you will hit different limits than my tests did.
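
A rough sketch of how the work might be split across N blob-generation
processes. This is not existing cvs2git code: shard_by_size and the
(path, size) input are hypothetical, and the sketch only covers dividing
the RCS files among workers, not the blob mark numbering that would
also have to be coordinated.

```python
import heapq

def shard_by_size(files, n):
    """Assign (path, size) pairs to n shards, largest file first,
    always giving the next file to the currently lightest shard.

    Hypothetical helper; assumes file size is a fair proxy for the
    cost of extracting all revisions from an RCS file.
    """
    if n < 1:
        raise ValueError("need at least one shard")
    shards = [[] for _ in range(n)]
    # Heap entries are (accumulated_size, shard_index).
    heap = [(0, i) for i in range(n)]
    heapq.heapify(heap)
    for path, size in sorted(files, key=lambda item: -item[1]):
        load, index = heapq.heappop(heap)
        shards[index].append(path)
        heapq.heappush(heap, (load + size, index))
    return shards
```

Each shard could then be handed to its own external process, assuming
the per-process outputs can be merged afterwards.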
21
22 Robin definitely had some version of the ExternalBlobGenerator code back
23 in 2009.
24
25 > If memory serves, where we hit major issues was in dealing w/
26 > encoding- crap like names, some bad latin-1 sort of data. Will have
27 > to re-run this and check.
28
29 This could be fixed in cvs2git with a reasonable amount of effort. I
30 already did a lot of restructuring to allow EOL handling and keyword
31 expansion options to be specified via properties (see
32 doc/properties.txt) and to make the properties available earlier in the
33 conversion. I think the main work that remains is to pass the
34 properties to the generate_blobs.py script and to teach the script to
35 massage the revision contents before writing them to the blobfile.
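
To illustrate the kind of massaging meant here, a minimal sketch follows.
It is not the actual generate_blobs.py code: the function name massage
and its eol / collapse_keywords parameters are assumptions standing in
for whatever the properties would eventually convey.

```python
import re

# RCS/CVS keywords in expanded form, e.g. "$Id: foo.c,v 1.2 ... $".
_KEYWORD_RE = re.compile(
    rb"\$(Author|Date|Header|Id|Locker|Log|Name|RCSfile|Revision|Source|State)"
    rb":[^$\n]*\$"
)

def massage(content: bytes, eol: str = "LF", collapse_keywords: bool = True) -> bytes:
    """Return revision contents with keywords collapsed and EOLs normalized.

    Hypothetical helper: 'eol' is "LF" or "CRLF"; real property names
    and values may differ.
    """
    if collapse_keywords:
        # "$Id: ... $" becomes "$Id$"; text that $Log$ expands onto
        # following lines is left alone in this sketch.
        content = _KEYWORD_RE.sub(rb"$\1$", content)
    # First bring every line ending to LF, then convert if requested.
    content = content.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if eol == "CRLF":
        content = content.replace(b"\n", b"\r\n")
    return content
```

Binary files would have to bypass both steps, which is one reason the
per-file properties need to reach the script.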
36
37 > Re: pgsql, good to hear- they don't have our whacked tree structure,
38 > but I'm sure the encoding sort of issues we had, they probably had a
39 > couple of.
40 >
41 > As for .cvsignore, at least for gentoo-x86 (repo in discussion), we've
42 > got none of that in use- so non issue, but for our other repos, yeah,
43 > it'll come up. Manually translating and stacking a commit on top of
44 > the results is perfectly acceptable (your tools already cover 99%
45 > after all).
46
47 I think .gitignore files would be pretty easy, too. It's another one of
48 those things that I never got around to before being distracted by other
49 projects (I hack mostly on git itself now).
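
For the manual route, the translation could look roughly like the sketch
below. It is not part of cvs2git; cvsignore_to_gitignore is a
hypothetical helper, and it assumes the usual .cvsignore conventions
(whitespace-separated patterns that apply only to the containing
directory, with a lone "!" clearing the list).

```python
def cvsignore_to_gitignore(text: str) -> str:
    """Translate the contents of one .cvsignore file into the contents
    of a .gitignore file for the same directory (hypothetical helper).
    """
    patterns = []
    for token in text.split():
        if token == "!":
            # A lone "!" resets the ignore list accumulated so far.
            patterns.clear()
        else:
            # A leading "/" anchors the pattern to this directory, so
            # it does not also match in subdirectories as a bare
            # .gitignore pattern would.
            patterns.append("/" + token)
    return "".join(pattern + "\n" for pattern in patterns)
```

For example, a .cvsignore containing "*.o core" would become a
.gitignore with the two lines "/*.o" and "/core".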
50
51 >> More details and/or bug reports would be much appreciated if the
52 >> problems still exist in the 2.4.0 release (which is just out, though it
53 >> is well-used code with hardly any recent changes).
54 >
55 > Honestly, I couldn't give you the details; that code I started from in
56 > '10 was inherited; it was already a year old branching/release, w/
57 > local modifications before I got started (I do recall verifying that
58 > the modifications were necessary, else things failed miserably).
59
60 Is the code that you were using visible somewhere?
61
62 Michael
63
64 --
65 Michael Haggerty
66 mhagger@××××××××.edu
67 http://softwareswirl.blogspot.com/
