On 10/04/2012 09:38 PM, Brian Harring wrote:
> On Thu, Oct 04, 2012 at 05:45:40PM +0200, Michael Haggerty wrote:
>> Another big boost for cvs2git is to use the ExternalBlobGenerator, which
>> extracts revision contents to a "blob" file in CollectRevsPass using an
>> external script, thereby making OutputPass much faster.
>
> Is this still sequential, or can it effectively run in parallel? I'm
> just wondering if the 'faster' is via parallelism, or bypassing a
> slower native Python implementation.
|
It currently does the blob generation in one separate process in
parallel with the activities that are happening in the main cvs2svn
process. (I remembered incorrectly; the blobs are collected during
FilterSymbolsPass, not CollectRevsPass.)
|
I think it would be relatively straightforward to generalize this to N
blob-generation processes, but I think that blob generation had already
ceased to be the bottleneck with only 1 process. However, your
conversion is unusual (e.g., linear history, lots of RAM) so it could be
that you will hit different limits than my tests did.
|
Robin definitely had some version of the ExternalBlobGenerator code back
in 2009.
|
> If memory serves, where we hit major issues was in dealing w/
> encoding- crap like names, some bad latin-1 sort of data. Will have
> to re-run this and check.
|
This could be fixed in cvs2git with a reasonable amount of effort. I
already did a lot of restructuring to allow EOL handling and keyword
expansion options to be specified via properties (see
doc/properties.txt) and to make the properties available earlier in the
conversion. I think the main work that remains is to pass the
properties to the generate_blobs.py script and to teach the script to
massage the revision contents before writing them to the blobfile.
|
> Re: pgsql, good to hear- they don't have our whacked tree structure,
> but I'm sure the encoding sort of issues we had, they probably had a
> couple of.
>
> As for .cvsignore, at least for gentoo-x86 (repo in discussion), we've
> got none of that in use- so non issue, but for our other repos, yeah,
> it'll come up. Manually translating and stacking a commit on top of
> the results is perfectly acceptable (your tools already cover 99%
> after all).
|
I think .gitignore files would be pretty easy, too. It's another one of
those things that I never got around to before being distracted by other
projects (I hack mostly on git itself now).
|
>> More details and/or bug reports would be much appreciated if the
>> problems still exist in the 2.4.0 release (which is just out, though it
>> is well-used code with hardly any recent changes).
>
> Honestly, I couldn't give you the details; that code I started from in
> '10 was inherited; it was already a year old branching/release, w/
> local modifications before I got started (I do recall verifying that
> the modifications were necessary, else things failed miserably).
|
Is the code that you were using visible somewhere?

Michael

--
Michael Haggerty
mhagger@××××××××.edu
http://softwareswirl.blogspot.com/