FYI - I started a repository of my git validation work at:

git://github.com/rich0/gitvalidate.git

I'm starting on the git side first. I'm taking all my data directly
from the git executables and plan to do the same for cvs - if they
output the same content we should be OK. I did some testing and I
think that my code should handle unicode output if git generates it.

The git repository has 1259922 commits, and it takes 50.5 seconds to
walk the list of commits to produce a list of trees and their commit
info.
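
For illustration, a minimal sketch of walking every commit with a single
git process. The message does not say which git command is used, so the
choice of `git log`, the format string, and the helper names
(`parse_commit_lines`, `walk_commits`) are assumptions, not the code in
the repository. Fields are decoded as UTF-8 with replacement so that
non-ASCII output does not raise.

```python
import subprocess

# Assumed field list: commit hash, tree hash, committer timestamp, author email.
# git expands %x00 to a NUL byte, which cannot occur inside any of these fields.
LOG_FORMAT = "%H%x00%T%x00%ct%x00%ae"


def parse_commit_lines(raw: bytes):
    """Turn raw `git log` output into (commit, tree, timestamp, author) tuples."""
    records = []
    for line in raw.split(b"\n"):
        if not line:
            continue
        # Decode explicitly so non-ASCII author data does not raise.
        text = line.decode("utf-8", errors="replace")
        commit, tree, ctime, author = text.split("\x00")
        records.append((commit, tree, int(ctime), author))
    return records


def walk_commits(repo_path: str):
    """Run one `git log` over all refs and parse its output."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=" + LOG_FORMAT],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return parse_commit_lines(raw)
```

Keeping the parsing separate from the subprocess call means it can be
checked against canned output without a repository at hand.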

The next step is to iteratively perform the map / reduce algorithm I
outlined earlier to get a per-file history similar to what cvs
captures.
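
The algorithm itself was outlined in an earlier message and is not
reproduced here, so the following is only one possible reading of it, not
the author's method. It assumes the map step emits a (path, blob) key for
every file in every commit's tree and the reduce step keeps the earliest
commit per key. All function names are hypothetical. It runs in a single
pass rather than iteratively.

```python
from collections import defaultdict


def map_commit(commit, timestamp, files):
    """Map step: emit ((path, blob), (timestamp, commit)) for each file in a tree."""
    for path, blob in files:
        yield (path, blob), (timestamp, commit)


def reduce_earliest(pairs):
    """Reduce step: for each (path, blob) key keep the earliest (timestamp, commit)."""
    earliest = {}
    for key, value in pairs:
        if key not in earliest or value < earliest[key]:
            earliest[key] = value
    return earliest


def per_file_history(earliest):
    """Group the reduced records by path, oldest revision first."""
    history = defaultdict(list)
    for (path, blob), (timestamp, commit) in earliest.items():
        history[path].append((timestamp, commit, blob))
    for revisions in history.values():
        revisions.sort()
    return dict(history)
```

One limitation of this reading: a file that is changed and later reverted
to identical earlier content produces no new revision, because the
(path, blob) key has already been seen.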

Contributions welcome. I'm finding the main issue is cutting down the
overhead of spawning git processes to do the work. While it will make
for more work in theory, I might just have git-ls-tree recurse the
trees to reduce the subprocess overhead and then do the extra
sorting/de-duplication in python. I'm trying to avoid using git
implementations in python since that might expose us to bugs in those
implementations.
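
A minimal sketch of that trade-off: one `git ls-tree -r -z` process per
tree, with the sorting and de-duplication done afterwards in python. The
helper names (`parse_ls_tree`, `list_tree`, `unique_sorted`) and the tuple
layout are assumptions for illustration only. The `-z` flag is used so
that paths arrive NUL-terminated and unquoted.

```python
import subprocess


def parse_ls_tree(raw: bytes):
    """Parse `git ls-tree -r -z` output into (path, mode, type, sha) tuples."""
    entries = []
    for record in raw.split(b"\x00"):
        if not record:
            continue
        # Each record is "<mode> <type> <sha>\t<path>"; split on the first tab only,
        # since the path itself may contain tabs.
        meta, path = record.split(b"\t", 1)
        mode, obj_type, sha = meta.decode("ascii").split(" ")
        entries.append((path.decode("utf-8", errors="replace"), mode, obj_type, sha))
    return entries


def list_tree(repo_path: str, tree_sha: str):
    """Spawn a single git process that recurses the whole tree."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "ls-tree", "-r", "-z", tree_sha],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return parse_ls_tree(raw)


def unique_sorted(entries):
    """Drop entries that appear in several trees and sort the rest by path."""
    return sorted(set(entries))
```

Recursing in git returns every file of every tree, including unchanged
ones, which is the extra work mentioned above; the set-based
de-duplication is what absorbs it on the python side.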

Rich