On Sat, Oct 13, 2012 at 06:12:14PM -0400, Rich Freeman wrote:
> In my validation script I'm running into a bizarre issue.
>
> This script fails if run under dumbo:
> https://github.com/rich0/gitvalidate/raw/master/gitdump/iteratetree.py
>
> The dumbo command line was:
> dumbo start iteratetree.py -input /outfile.csv -output /out
>
> Outfile.csv just had 3 entries in it for testing (generated by
> parsetrees.py in that directory).
>
> It failed with a KeyError on the line:
> tree=repository[unicode(filehash)];
>
> The KeyError error message contained the correct hash.
>
> To eliminate variables I confirmed that replacing that line with this
> one generates the same error message:
> tree=Repository("/data/git/gentoo-x86")[unicode('5789de7076bfc534787fb3b072652810ca1ad197')]
>
> (all I did was hard-code everything for one entry to cut out all the
> other parsing logic).
>
> However, the following script works just fine:
> #!/usr/bin/python
>
> from pygit2 import Repository,GIT_OBJ_TREE;
> tree=Repository("/data/git/gentoo-x86")[unicode('5789de7076bfc534787fb3b072652810ca1ad197')]
> print tree
>
> Those are the exact same statements from the failing script, with the
> same hard-coded values. Running those commands interactively works
> fine also, as does stepping through the script logic.
>
> The main difference is how the execution flow reaches that point. In
> the simple script it is just outright run. In the case of the failing
> one it is in an iterator that is invoked by dumbo.
>
> Has anybody seen this kind of behavior in python - where an iterator
> fails if it is called as part of a map statement but the same
> statements otherwise work normally? Perhaps I'm committing some
> obscure (to me) python sin.

Probably not the answer you want... but I suggest you valgrind it. If
the behaviour is differing for Repository()[some-hash], that makes me
think there is a ref counting bug afoot.
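To make the ref counting hunch a bit more concrete: in Repository(path)[hash] the repository object is a temporary, so CPython releases it as soon as the lookup returns. A pure-Python sketch cannot reproduce a C-level refcount bug, but it can show that lifetime difference. FakeRepo below is a hypothetical stand-in, not the pygit2 class, and the immediate release is a CPython implementation detail.

```python
import weakref


class FakeRepo(object):
    """Hypothetical stand-in for a repository object (not pygit2.Repository)."""

    live = []  # weak references to every instance ever created

    def __init__(self, path):
        self.path = path
        FakeRepo.live.append(weakref.ref(self))

    def __getitem__(self, key):
        return "tree:" + key


def temporary_lookup(key):
    """Index a throwaway repository, as in Repository(path)[key]."""
    tree = FakeRepo("/data/git/gentoo-x86")[key]
    # The weak reference goes dead once the temporary has been released.
    owner_alive = FakeRepo.live[-1]() is not None
    return tree, owner_alive


def held_lookup(key):
    """Same lookup, but the repository stays bound to a local name."""
    repo = FakeRepo("/data/git/gentoo-x86")
    tree = repo[key]
    owner_alive = FakeRepo.live[-1]() is not None
    return tree, owner_alive
```

If an extension type handed back an object that borrowed from its owner without taking a reference, the first pattern is where it would be most exposed. That is only a hint about where to look; valgrind is still what would actually confirm it.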

Also, wtf is dumbo? And can you clarify how this is doing its
validation, rough runtime, etc? Minimally, it's going to need to go
parallel- I rebuilt the conversion bits (parallelized along category
lines, then reintegrating after the fact), and it's around 50m per
run- validation being equivalent would definitely be preferable.
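For anyone trying to picture "parallelized along category lines, then reintegrating", here is a rough sketch, not the actual conversion code. The record layout (timestamp, path, message), the convert_category placeholder and the use of a thread pool are all assumptions made for illustration; work that is CPU bound would want processes rather than threads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import heapq


def category_of(path):
    """Return the top-level directory of a repository path."""
    return path.split("/", 1)[0]


def partition_by_category(commits):
    """Group (timestamp, path, message) records by category."""
    groups = defaultdict(list)
    for commit in commits:
        groups[category_of(commit[1])].append(commit)
    return groups


def convert_category(commits):
    """Placeholder for the per-category work; here it only sorts by timestamp."""
    return sorted(commits)


def convert_in_parallel(commits, workers=4):
    """Process each category independently, then merge into one timeline."""
    groups = partition_by_category(commits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        histories = list(pool.map(convert_category, groups.values()))
    # Reintegrate: each per-category history is already sorted, so a
    # k-way merge restores a single ordering by timestamp.
    return list(heapq.merge(*histories))
```

The merge step only works because each category's history is independent, which is the property the per-category split relies on.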

@mhagger; details of that I'll share in a bit- roughly, it's
exploiting the fact gentoo-x86 /never/ has cross category commits (no
one does repo wide commits, although detection/merging of that I'm
checking for).
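A minimal sketch of what a cross-category check could look like, assuming a commit is represented simply as the list of paths it touches and that the first path component is the category. Both function names are hypothetical and this is not the detection code referred to above.

```python
def categories_touched(paths):
    """Return the set of top-level directories a commit's paths fall under."""
    categories = set()
    for path in paths:
        head, sep, _rest = path.partition("/")
        # Files sitting directly in the repository root have no category,
        # so they are grouped under a placeholder label.
        categories.add(head if sep else "(root)")
    return categories


def is_cross_category(paths):
    """True if a single commit touches more than one category."""
    return len(categories_touched(paths)) > 1
```

Any commit flagged this way would need to be split or merged by hand before the per-category histories can be treated as independent.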

Cheers-
~harring