On Thu, Oct 04, 2012 at 11:34:00PM +0200, Michael Haggerty wrote:
> On 10/04/2012 09:38 PM, Brian Harring wrote:
> > On Thu, Oct 04, 2012 at 05:45:40PM +0200, Michael Haggerty wrote:
> >> Another big boost for cvs2git is to use the ExternalBlobGenerator, which
> >> extracts revision contents to a "blob" file in CollectRevsPass using an
> >> external script, thereby making OutputPass much faster.
> >
> > Is this still sequential, or can it effectively run in parallel? I'm
> > just wondering if the 'faster' is via parallelism, or bypassing a
> > slower native python implementation.
>
> It currently does the blob generation in one separate process in
> parallel with the activities that are happening in the main cvs2svn
> process. (I remembered incorrectly; the blobs are collected during
> FilterSymbolsPass, not CollectRevsPass.)
>
> I think it would be relatively straightforward to generalize this to N
> blob-generation processes, but I think that blob generation had already
> ceased to be the bottleneck with only 1 process. However, your
> conversion is unusual (e.g., linear history, lots of RAM) so it could be
> that you will hit different limits than my tests did.
>
> Robin definitely had some version of the ExternalBlobGenerator code back
> in 2009.
|
Yeah, I think that patch is what I'm remembering for divergence.
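
For illustration only, here is a rough sketch of what "N blob-generation
processes" could look like. It is not the cvs2git ExternalBlobGenerator or
generate_blobs.py; the worker script, the "<mark> <text>" input format, and
the generate_blobs() helper are all made up for this example. The only real
format used is the git fast-import "blob" command. Each child writes to its
own temporary blob file while the parent stays free to do other work.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for an external blob script (NOT generate_blobs.py).
# It reads "<mark> <text>" lines on stdin and writes git fast-import "blob"
# commands to stdout. Assumption: payloads contain no newlines.
WORKER_SOURCE = r'''
import sys
out = sys.stdout.buffer
for line in sys.stdin.buffer:
    mark, _, data = line.rstrip(b"\n").partition(b" ")
    out.write(b"blob\nmark :" + mark + b"\ndata " + str(len(data)).encode() + b"\n")
    out.write(data + b"\n")
'''


def generate_blobs(revisions, workers=2):
    """Shard (mark, bytes) pairs over `workers` child processes.

    Returns one bytes object per worker holding that worker's blob stream.
    """
    procs = []
    for i in range(workers):
        blobfile = tempfile.TemporaryFile()
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=blobfile,
        )
        # Round-robin sharding: worker i gets revisions i, i+workers, ...
        procs.append((proc, blobfile, revisions[i::workers]))

    # All children are running now; feed each one its shard and close stdin
    # so the child sees end-of-input.
    for proc, _, shard in procs:
        for mark, data in shard:
            proc.stdin.write(b"%d %s\n" % (mark, data))
        proc.stdin.close()

    # The parent could do unrelated conversion work here before collecting.
    outputs = []
    for proc, blobfile, _ in procs:
        if proc.wait() != 0:
            raise RuntimeError("blob worker failed")
        blobfile.seek(0)
        outputs.append(blobfile.read())
        blobfile.close()
    return outputs
```

Whether this buys anything depends on where the bottleneck actually is; as
noted above, blob generation may already be fast enough with one process.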
|
> > If memory serves, where we hit major issues was in dealing w/
> > encoding: crap like names, some bad latin-1 sort of data. Will have
> > to re-run this and check.
>
> This could be fixed in cvs2git with a reasonable amount of effort. I
> already did a lot of restructuring to allow EOL handling and keyword
> expansion options to be specified via properties (see
> doc/properties.txt) and to make the properties available earlier in the
> conversion. I think the main work that remains is to pass the
> properties to the generate_blobs.py script and to teach the script to
> massage the revision contents before writing them to the blobfile.
>
> > Re: pgsql, good to hear; they don't have our whacked tree structure,
> > but I'm sure they had a couple of the encoding sort of issues we
> > had.
> >
> > As for .cvsignore, at least for gentoo-x86 (repo in discussion), we've
> > got none of that in use, so non-issue, but for our other repos, yeah,
> > it'll come up. Manually translating and stacking a commit on top of
> > the results is perfectly acceptable (your tools already cover 99%
> > after all).
>
> I think .gitignore files would be pretty easy, too. It's another one of
> those things that I never got around to before being distracted by other
> projects (I hack mostly on git itself now).
>
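
As a rough idea of what the manual .cvsignore translation could involve,
here is a minimal sketch. It is not part of cvs2git, and the function name
is made up. It assumes the usual CVS semantics: patterns are separated by
whitespace, they apply only to the directory holding the file, and a lone
"!" clears the list built so far. A leading "/" in .gitignore anchors each
pattern to that directory so it does not match recursively.

```python
def cvsignore_to_gitignore(text):
    """Translate .cvsignore contents into .gitignore text (hypothetical helper)."""
    patterns = []
    for token in text.split():
        if token == "!":
            # Assumed CVS behaviour: a lone "!" clears the ignore list so far.
            # Git has no direct equivalent, so just drop what was collected.
            patterns = []
            continue
        # .cvsignore is per-directory; anchor the pattern with a leading "/"
        # so git does not apply it to subdirectories as well.
        patterns.append("/" + token)
    # Git expects one pattern per line.
    return "".join(pattern + "\n" for pattern in patterns)
```

For example, a .cvsignore containing "*.o core" on one line would become two
.gitignore lines, "/*.o" and "/core", written next to it in the same
directory.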
> >> More details and/or bug reports would be much appreciated if the
> >> problems still exist in the 2.4.0 release (which is just out, though it
> >> is well-used code with hardly any recent changes).
> >
> > Honestly, I couldn't give you the details; that code I started from in
> > '10 was inherited; it was already a year old branching/release, w/
> > local modifications before I got started (I do recall verifying that
> > the modifications were necessary, else things failed miserably).
>
> Is the code that you were using visible somewhere?
|
Robin would know, but my suspicion is 'no'; your description of an
early blob generator sounds right.
|
Offhand, I can tell you that the optimizations done beyond that, I
upstreamed (or you/I agreed to leave them out); the sole exception was
some snakeoil inlining/usage, but that wasn't a massive gain (5% or
so).
|
I'm offline till Sunday/Monday, but I'll fire off a run in the meantime;
barring it failing, I'll report stats when I get back from vacation.
|
~brian