On Mon, Feb 22, 2016 at 4:49 PM, James <wireless@×××××××××××.com> wrote:
> Rich Freeman <rich0 <at> gentoo.org> writes:
>
>> If I were doing anything too
>> crazy with all this I'd probably use the python git module.
>
> dev-python/git-python ??? Any others or related docs/howtos/examples?
>

I used pygit2, but there are a few different implementations and
plenty of docs online in general.

Here is an example program that runs through a history and dumps a
list of commits and their metadata in csv format:
https://github.com/rich0/gitvalidate/blob/master/gitdump/parsetrees.py
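
The walk-and-dump idea can be sketched roughly as follows. This is not the linked script. It starts at a head commit, follows first parents back to the root, and writes one csv row per commit. The attribute names (id, parents, author.name, commit_time, message) are chosen on the assumption that they mirror pygit2's Commit object. The hypothetical FakeCommit and Signature classes stand in for real commits so the example runs without a repository.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Signature:
    name: str


@dataclass
class FakeCommit:
    # Hypothetical stand-in; attribute names are assumed to mirror pygit2.Commit.
    id: str
    author: Signature
    commit_time: int
    message: str
    parents: List["FakeCommit"] = field(default_factory=list)


def walk_first_parent(head: FakeCommit) -> Iterator[FakeCommit]:
    """Yield commits from head back to the root, following only the first parent.

    Only correct for linear history (no merge commits).
    """
    commit: Optional[FakeCommit] = head
    while commit is not None:
        yield commit
        commit = commit.parents[0] if commit.parents else None


def dump_csv(head: FakeCommit) -> str:
    """Return csv text with one row per commit: id, author, time, first message line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "author", "commit_time", "summary"])
    for c in walk_first_parent(head):
        summary = c.message.splitlines()[0] if c.message else ""
        writer.writerow([str(c.id), c.author.name, c.commit_time, summary])
    return buf.getvalue()


if __name__ == "__main__":
    root = FakeCommit("a1", Signature("alice"), 100, "initial import\n")
    second = FakeCommit("b2", Signature("bob"), 200, "fix typo\n", [root])
    print(dump_csv(second), end="")
```

With pygit2, the head commit would typically come from something like repo[repo.head.target] on a pygit2.Repository. Check that call against the pygit2 version you have installed before relying on it.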

There are some other scripts in the same directory that retrieve blobs
and manipulate them. This was part of the validation of the git
migration. The validation uses a map-reduce algorithm to diff every
single commit in a git history and identify all file revisions. That
creates a cvs-like per-file history, which can then be compared with
the results obtained from parsing a cvs repository for the same
information. The only single-threaded step in the process is walking
the list of commits; all the diffs can be highly parallelized.
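
The shape of that map-reduce can be shown with a toy sketch. This is not the gitvalidate code. Each commit is modeled as a hypothetical flattened tree (a dict of path to blob id), and the history is assumed to be linear and oldest first. The sequential walk pairs each commit with its parent's tree. The map step diffs every pair independently, and the reduce step groups the changed paths into a per-file history. A thread pool is used only to keep the example self-contained. For CPU-heavy diffs on a real repository, processes or a cluster would be the more likely choice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

# Hypothetical flattened tree: path -> blob id (a real one would come from git).
Tree = Dict[str, str]
# (path, commit id, new blob id, or None if the file was deleted)
Revision = Tuple[str, str, Optional[str]]


def diff_commit(job: Tuple[str, Tree, Tree]) -> List[Revision]:
    """Map step: list every path whose blob differs from the parent's tree."""
    commit_id, parent_tree, tree = job
    revisions: List[Revision] = []
    for path in sorted(set(parent_tree) | set(tree)):
        if parent_tree.get(path) != tree.get(path):
            revisions.append((path, commit_id, tree.get(path)))
    return revisions


def per_file_history(
    commits: List[Tuple[str, Tree]],
) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Build path -> [(commit id, blob id)] from a linear, oldest-first history."""
    # Single-threaded step: walk the commit list, pairing each with its parent tree.
    jobs = []
    parent_tree: Tree = {}
    for commit_id, tree in commits:
        jobs.append((commit_id, parent_tree, tree))
        parent_tree = tree
    # Map: the diffs are independent, so they can run in parallel.
    # pool.map returns results in input order.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(diff_commit, jobs))
    # Reduce: group revisions by path, preserving history order.
    history: Dict[str, List[Tuple[str, Optional[str]]]] = defaultdict(list)
    for revisions in mapped:
        for path, commit_id, blob in revisions:
            history[path].append((commit_id, blob))
    return dict(history)
```

The result is the kind of cvs-like per-file revision list that could then be compared against one parsed from a cvs repository.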

I doubt you need anything quite so fancy. As you can see from the
script, pulling metadata out of commits and walking through parents is
pretty easy.

My example doesn't account for merge commits. There weren't any in
the cvs->git migration. Obviously walking commits with merges will
get a lot messier.
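
Here is one way to see why merges complicate the walk. The history is modeled as a hypothetical dict of commit id to parent ids. A first-parent walk silently skips commits that arrived through a merged branch. A merge-aware walk has to follow every parent and still visit each commit only once. The sketch below does this with Kahn's topological sort, so every commit appears after all of its children.

```python
from collections import deque
from typing import Dict, List, Optional


def first_parent_walk(parents: Dict[str, List[str]], head: str) -> List[str]:
    """Linear walk: fine without merges, but skips commits on merged-in branches."""
    order: List[str] = []
    commit: Optional[str] = head
    while commit is not None:
        order.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None
    return order


def topo_walk(parents: Dict[str, List[str]], head: str) -> List[str]:
    """Visit every commit reachable from head exactly once, children before parents."""
    # Find all reachable commits (a commit can be reached via several merge paths).
    reachable, stack = set(), [head]
    while stack:
        c = stack.pop()
        if c not in reachable:
            reachable.add(c)
            stack.extend(parents[c])
    # Count how many reachable children point at each commit.
    pending_children = {c: 0 for c in reachable}
    for c in reachable:
        for p in parents[c]:
            pending_children[p] += 1
    # Kahn's algorithm: emit a commit only once all of its children are emitted.
    ready, order = deque([head]), []
    while ready:
        c = ready.popleft()
        order.append(c)
        for p in parents[c]:
            pending_children[p] -= 1
            if pending_children[p] == 0:
                ready.append(p)
    return order


if __name__ == "__main__":
    # a - b - d - e, where d merges side branch c (also based on a).
    history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
    print(first_parent_walk(history, "e"))  # c is missing
    print(topo_walk(history, "e"))
```

pygit2's Repository.walk() accepts a sort mode, including a topological one, which may save writing this by hand. The exact constant name differs between pygit2 versions.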

--
Rich