Gentoo Archives: gentoo-scm

From: Brian Harring <ferringb@×××××.com>
To: Michael Haggerty <mhagger@××××××××.edu>
Cc: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] CVS -> git, list of where non-infra folk can contribute
Date: Fri, 05 Oct 2012 00:05:15
Message-Id: 20121005000521.GB5024@localhost.corp.google.com
In Reply to: Re: [gentoo-scm] CVS -> git, list of where non-infra folk can contribute by Michael Haggerty
1 On Thu, Oct 04, 2012 at 11:34:00PM +0200, Michael Haggerty wrote:
2 > On 10/04/2012 09:38 PM, Brian Harring wrote:
3 > > On Thu, Oct 04, 2012 at 05:45:40PM +0200, Michael Haggerty wrote:
4 > >> Another big boost for cvs2git is to use the ExternalBlobGenerator, which
5 > >> extracts revision contents to a "blob" file in CollectRevsPass using an
6 > >> external script, thereby making OutputPass much faster.
7 > >
8 > > Is this still sequential, or can it effectively run in parallel? I'm
9 > > just wondering if the 'faster' is via parallelism, or bypassing a
10 > > slower native python implementation.
11 >
12 > It currently does the blob generation in one separate process in
13 > parallel with the activities that are happening in the main cvs2svn
14 > process. (I remembered incorrectly; the blobs are collected during
15 > FilterSymbolsPass, not CollectRevsPass.)
16 >
17 > I think it would be relatively straightforward to generalize this to N
18 > blob-generation processes, but I think that blob generation had already
19 > ceased to be the bottleneck with only 1 process. However, your
20 > conversion is unusual (e.g., linear history, lots of RAM) so it could be
21 > that you will hit different limits than my tests did.
22 >
23 > Robin definitely had some version of the ExternalBlobGenerator code back
24 > in 2009.
25
26 Yeah, I think that patch is what I'm remembering for divergence.
27
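For reference, a minimal sketch of what "N blob-generation workers" could look like. It is not cvs2git's ExternalBlobGenerator code. generate_blob(), the revision tuples and the example path are all made up for illustration, and threads stand in for the worker processes (ProcessPoolExecutor offers the same map() interface). The one point it shows is that results can be collected in input order however the workers are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def git_blob_id(content):
    """Return the id git assigns to a blob: sha1 of 'blob <len>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def generate_blob(revision):
    """Hypothetical stand-in for extracting one file revision's contents."""
    path, rev = revision
    # Assumption: a real worker would read the revision out of the RCS file.
    content = ("%s@%s\n" % (path, rev)).encode("utf-8")
    return git_blob_id(content), content


def generate_blobs(revisions, workers=4):
    """Generate blobs with a pool of workers, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever worker finishes first.
        return list(pool.map(generate_blob, revisions))


# Made-up revision list, purely for illustration.
revisions = [("sys-apps/portage/ChangeLog", "1.%d" % n) for n in range(1, 9)]
blobs = generate_blobs(revisions, workers=4)
```

Because the order is preserved, the worker count can change without changing the blob stream that the later passes consume.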
28
29 > > If memory serves, where we hit major issues was in dealing w/
30 > > encoding: things like names, some bad latin-1 sort of data. Will have
31 > > to re-run this and check.
32 >
33 > This could be fixed in cvs2git with a reasonable amount of effort. I
34 > already did a lot of restructuring to allow EOL handling and keyword
35 > expansion options to be specified via properties (see
36 > doc/properties.txt) and to make the properties available earlier in the
37 > conversion. I think the main work that remains is to pass the
38 > properties to the generate_blobs.py script and to teach the script to
39 > massage the revision contents before writing them to the blobfile.
40 >
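As an illustration of the kind of massaging meant here, a minimal sketch of one common heuristic: keep bytes that are already valid UTF-8 and reinterpret everything else as latin-1. This is an assumption about what such a step might do, not code from cvs2git or generate_blobs.py, and to_utf8() is a hypothetical helper name.

```python
def to_utf8(raw):
    """Return raw as UTF-8 bytes, treating non-UTF-8 input as latin-1."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1").encode("utf-8")


fixed = to_utf8(b"Ren\xe9")  # latin-1 encoded name
```

The heuristic is not airtight: a latin-1 byte sequence that happens to be valid UTF-8 passes through unchanged, so the output is worth spot-checking after a run.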
41 > > Re: pgsql, good to hear- they don't have our whacked tree structure,
42 > > but I'm sure they probably had a couple of the same sort of encoding
43 > > issues we had.
44 > >
45 > > As for .cvsignore, at least for gentoo-x86 (repo in discussion), we've
46 > > got none of that in use- so non issue, but for our other repos, yeah,
47 > > it'll come up. Manually translating and stacking a commit on top of
48 > > the results is perfectly acceptable (your tools already cover 99%
49 > > after all).
50 >
51 > I think .gitignore files would be pretty easy, too. It's another one of
52 > those things that I never got around to before being distracted by other
53 > projects (I hack mostly on git itself now).
54 >
55 > >> More details and/or bug reports would be much appreciated if the
56 > >> problems still exist in the 2.4.0 release (which is just out, though it
57 > >> is well-used code with hardly any recent changes).
58 > >
59 > > Honestly, I couldn't give you the details; that code I started from in
60 > > '10 was inherited; it was already a year old branching/release, w/
61 > > local modifications before I got started (I do recall verifying that
62 > > the modifications were necessary, else things failed miserably).
63 >
64 > Is the code that you were using visible somewhere?
65
66 Robin would know, but my suspicion is 'no'; your description of an
67 early blob generator sounds right.
68
69 Offhand, I can tell you that the optimizations done beyond that, I
70 upstreamed (or you and I agreed to leave them out); the sole exception
71 was some snakeoil inlining/usage, but that wasn't a massive gain (5% or
72 so).
73
74 I'm offline till sunday/monday, but I'll fire off a run in the meantime;
75 barring it failing, I'll report stats when I get back from vacation.
76
77 ~brian