Gentoo Archives: gentoo-scm

From: Rich Freeman <rich0@g.o>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Re: Git Validation - Update
Date: Wed, 17 Oct 2012 13:29:25
Message-Id: CAGfcS_=8AV_vg2p-z8b2E+z3ExvpwnvtZwoYQ=dEDaOj-nh-cQ@mail.gmail.com
In Reply to: Re: [gentoo-scm] Re: Git Validation - Update by Peter Stuge
1 On Wed, Oct 17, 2012 at 8:49 AM, Peter Stuge <peter@×××××.se> wrote:
2 > Rich Freeman wrote:
3 >> I'm storing a history of when files change, starting by looking at
4 >> commits in complete isolation.
5 >
6 > The more gitty way would be to use the trees.
7
8 I was being a bit informal in my description. The actual map/reduce
9 steps don't look at commits at all - only trees/blobs. The initial
10 parse of the commits extracts all the trees and the commit info we
11 care about. There is no way to get to blobs from commits except
12 through trees.
13
14 >
15 > It would be two passes. First pass tree[n%2=0] with tree[n+1] and on
16 > second pass tree[n%2=1] with tree[n+1].
17
18 Sure, and you can break that down much further. If you write each
19 commit with the one preceding it on the same line you can even do it
20 with map. It is a big change in the algorithm. Actually, by doing it
21 that way you could just do a complete pairwise compare of the whole
22 tree and glean everything in a single iteration. I think. To do it
23 I'd just write each line of my csv with twice as many fields - the
24 second half of each line being the first half of the next, or the one
25 before, or however it works. First or last would be a special case.
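
A rough sketch of that pairing idea, in Python. The record shape (a commit id plus a dict mapping path to blob hash) and the names pair_with_previous and diff_pair are made up for illustration; the real csv layout is not shown here. Each record is joined with its predecessor so that the comparison becomes a pure map step, and the first record is the special case with no predecessor.

```python
def pair_with_previous(records):
    """Yield (previous, current) per record; previous is None for the first one."""
    previous = None
    for current in records:
        yield previous, current
        previous = current


def diff_pair(pair):
    """Map step: compare two adjacent snapshots and list per-path changes."""
    previous, current = pair
    commit_id, files = current
    # Special case: the first record has nothing before it.
    old_files = previous[1] if previous is not None else {}
    changes = []
    for path in sorted(set(old_files) | set(files)):
        old, new = old_files.get(path), files.get(path)
        if old == new:
            continue
        if old is None:
            kind = "added"
        elif new is None:
            kind = "deleted"
        else:
            kind = "modified"
        changes.append((commit_id, path, kind))
    return changes


# Hypothetical input: (commit id, {path: blob hash}) in commit order.
records = [
    ("c1", {"a.ebuild": "h1"}),
    ("c2", {"a.ebuild": "h2", "b.ebuild": "h3"}),
    ("c3", {"b.ebuild": "h3"}),
]

# Every pair is independent, so diff_pair could be farmed out to parallel mappers.
history = [change
           for pair in pair_with_previous(records)
           for change in diff_pair(pair)]
```

Because each pair carries both sides of the comparison, deletions fall out of the same pass as additions and modifications.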
26
27 >> One way that comes to mind is to do a second pass that just looks
28 >> for deletions, using the cvs data to cheat - each deletion could be
29 >> checked in parallel doing a pairwise compare on a single file.
30 >
31 > It would compare trees, not files, but sure, that works too. It will
32 > not notice if something extra has been deleted in git however, if the
33 > same file does not have any further changes in CVS.
34
35 Sort of. If a file is deleted early then it will show up as being
36 missed when the file should have been deleted. Files that were never
37 deleted in CVS but which were deleted in git are very easy to detect - a
38 simple file compare of a checkout of each would spot that.
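
The checkout comparison could look something like the following sketch. It assumes two working directories on disk, one checked out from CVS and one from git; the function names and the choice to skip the CVS and .git metadata directories are assumptions made for the example, not part of any existing tool.

```python
import os


def list_files(root, skip_dirs=("CVS", ".git")):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_in_git(cvs_checkout, git_checkout):
    """Files present in the CVS checkout but absent from the git checkout."""
    return sorted(list_files(cvs_checkout) - list_files(git_checkout))
```

This only compares which paths exist; it says nothing about whether file contents match, which would need a separate check.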
39
40 Rich