Gentoo Archives: gentoo-scm

From: Brian Harring <ferringb@×××××.com>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Bizarre Python Issue with Validation
Date: Mon, 15 Oct 2012 00:07:58
Message-Id: 20121015000803.GC4437@localhost.corp.google.com
In Reply to: [gentoo-scm] Bizarre Python Issue with Validation by Rich Freeman
On Sat, Oct 13, 2012 at 06:12:14PM -0400, Rich Freeman wrote:
> In my validation script I'm running into a bizarre issue.
>
> This script fails if run under dumbo:
> https://github.com/rich0/gitvalidate/raw/master/gitdump/iteratetree.py
>
> The dumbo command line was:
> dumbo start iteratetree.py -input /outfile.csv -output /out
>
> Outfile.csv just had 3 entries in it for testing (generated by
> parsetrees.py in that directory).
>
> It failed with a KeyError on the line:
> tree=repository[unicode(filehash)];
>
> The KeyError error message contained the correct hash.
>
> To eliminate variables I confirmed that replacing that line with this
> one generates the same error message:
> tree=Repository("/data/git/gentoo-x86")[unicode('5789de7076bfc534787fb3b072652810ca1ad197')]
>
> (all I did was hard-code everything for one entry to cut out all the
> other parsing logic).
>
> However, the following script works just fine:
> #!/usr/bin/python
>
> from pygit2 import Repository,GIT_OBJ_TREE;
> tree=Repository("/data/git/gentoo-x86")[unicode('5789de7076bfc534787fb3b072652810ca1ad197')]
> print tree
>
> Those are the exact same statements from the failing script, with the
> same hard-coded values. Running those commands interactively works
> fine also, as does stepping through the script logic.
>
> The main difference is how the execution flow reaches that point. In
> the simple script it is just outright run. In the case of the failing
> one it is in an iterator that is invoked by dumbo.
>
> Has anybody seen this kind of behavior in python - where an iterator
> fails if it is called as part of a map statement but the same
> statements otherwise work normally? Perhaps I'm committing some
> obscure (to me) python sin.

Probably not the answer you want... but I suggest you run it under
valgrind. If the behaviour differs for Repository()[some-hash], that
makes me think there is a ref counting bug afoot.
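
A minimal sketch (Python 3, not taken from the script in question) of
one way to narrow this down before reaching for valgrind: run the same
lookup once directly and once from inside a generator, and compare the
outcomes. The helper names are hypothetical, and a plain dict stands in
for the repository object; the real test would pass in the pygit2
Repository instead.

```python
def lookup_direct(container, key):
    """Index the container from an ordinary function call."""
    return container[key]


def lookup_in_generator(container, keys):
    """Index the container lazily, from inside a generator."""
    for key in keys:
        yield key, container[key]


def compare_call_paths(container, key):
    """Return 'ok' or 'KeyError' for each way of reaching the lookup."""
    attempts = {
        "direct": lambda: lookup_direct(container, key),
        "generator": lambda: list(lookup_in_generator(container, [key])),
    }
    results = {}
    for name, attempt in attempts.items():
        try:
            attempt()
            results[name] = "ok"
        except KeyError:
            results[name] = "KeyError"
    return results


if __name__ == "__main__":
    # Stand-in data: a dict keyed by the hash quoted in the report.
    stand_in = {"5789de7076bfc534787fb3b072652810ca1ad197": "tree"}
    print(compare_call_paths(
        stand_in, "5789de7076bfc534787fb3b072652810ca1ad197"))
```

With a dict both paths agree by construction. If they disagreed for the
real repository object, that would point at the binding rather than at
the data.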

Also, wtf is dumbo? And can you clarify how this is doing its
validation, rough runtime, etc? Minimally, it's going to need to go
parallel: I rebuilt the conversion bits (parallelized along category
lines, then reintegrated after the fact), and it's around 50m per
run. Validation taking an equivalent amount of time would definitely
be preferable.
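
To illustrate what "parallelized along category lines" could look like
on the validation side, here is a rough Python 3 sketch. It is not the
conversion code; validate_category is a hypothetical placeholder, the
example paths are made up, and it assumes the category is simply the
first path component.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def category_of(path):
    """Assumption: the category is the first path component."""
    return path.split("/", 1)[0]


def group_by_category(paths):
    """Bucket paths so each category can be handled independently."""
    groups = defaultdict(list)
    for path in paths:
        groups[category_of(path)].append(path)
    return dict(groups)


def validate_category(category, paths):
    """Hypothetical placeholder: real validation would go here.

    It just reports how many paths the category contained.
    """
    return category, len(paths)


def validate_in_parallel(paths, max_workers=4):
    """Run one validation task per category, then merge the results."""
    groups = group_by_category(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(validate_category, category, members)
                   for category, members in groups.items()]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    # Made-up paths, for illustration only.
    print(validate_in_parallel([
        "app-editors/vim/ChangeLog",
        "app-editors/nano/ChangeLog",
        "dev-lang/python/ChangeLog",
    ]))
```

A thread pool keeps the sketch simple; for CPU-bound validation a
ProcessPoolExecutor from the same module would be the more likely
choice.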

@mhagger: I'll share the details of that in a bit. Roughly, it
exploits the fact that gentoo-x86 /never/ has cross-category commits
(no one does repo-wide commits, although I am checking for them so
they can be detected and merged).
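
The cross-category check could be sketched roughly as below (Python 3).
This is an illustration only, not the actual detection code: the
function names and the (commit id, paths) input shape are assumptions,
and files at the repository root count as their own "category" here.

```python
def categories_touched(paths):
    """Set of first path components touched by one commit.

    A root-level file (no '/') is treated as its own category.
    """
    return {path.split("/", 1)[0] for path in paths}


def find_cross_category_commits(commits):
    """Return (commit_id, sorted categories) for commits spanning >1 category.

    `commits` is an iterable of (commit_id, list_of_paths) pairs.
    """
    offenders = []
    for commit_id, paths in commits:
        touched = categories_touched(paths)
        if len(touched) > 1:
            offenders.append((commit_id, sorted(touched)))
    return offenders


if __name__ == "__main__":
    # Made-up commit ids and paths, for illustration only.
    history = [
        ("c1", ["app-editors/vim/ChangeLog", "app-editors/vim/Manifest"]),
        ("c2", ["app-editors/vim/ChangeLog", "dev-lang/python/ChangeLog"]),
    ]
    print(find_cross_category_commits(history))
```

Anything this returns would need to be merged back into a single commit
after the per-category runs are reintegrated.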

Cheers-
~harring

Replies

Subject Author
Re: [gentoo-scm] Bizarre Python Issue with Validation Rich Freeman <rich0@g.o>