Gentoo Archives: gentoo-scm

From: Rich Freeman <rich0@g.o>
To: gentoo-scm@l.g.o
Subject: [gentoo-scm] Overall Validation Approach
Date: Mon, 22 Oct 2012 14:09:10
Message-Id: CAGfcS_mmz_HAQWgxrxA4wFG6XVLtk3xw9nhuV=9g2ytfhmRmiA@mail.gmail.com
Two items in this email. One is a quick status update, but I think the
more important issue is stepping back to figure out why we're even
doing this...

Quick status update: I can parse git repos and cvs repos and come up
with csv files that describe each fairly well.
Issues remaining:
1. Subdirs deleted in cvs don't show up in cvs log, and are therefore
not in my list. That needs a move from cvs log to rlog to fix. (small
effort)
2. Files deleted in git don't show up. That needs a move from looking
at commits in isolation to doing pairwise comparisons. (significant
effort)
3. File hashes don't match, because the migration changes the headers.
(medium effort)
4. Authors don't match, because these are also transformed. (medium effort)
5. Timestamps need a fuzz factor due to Manifest commit squashing.
(small effort on comparison side)

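To make items 2, 3 and 5 concrete, here is a minimal sketch of what the
comparison helpers might look like. It is not the actual validation
code. The function names, the 300-second default fuzz, and the guess
that the header changes come from CVS keyword expansion (e.g.
"$Header: ... $") are all assumptions for illustration.

```python
import hashlib
import re

# Assumption: the header differences are CVS keyword expansions such as
# "$Header: /path/file,v 1.2 ... $". Collapse them to the bare "$Header$"
# form so both sides hash identically.
_CVS_KEYWORD = re.compile(
    rb"\$(Header|Id|Revision|Date|Author|Source)(:[^$\n]*)?\$"
)


def normalized_hash(content: bytes) -> str:
    """Hash file content after collapsing expanded CVS keywords."""
    collapsed = _CVS_KEYWORD.sub(rb"$\1$", content)
    return hashlib.sha1(collapsed).hexdigest()


def deleted_paths(parent_files, child_files):
    """Pairwise comparison: paths in the parent tree missing from the child."""
    return set(parent_files) - set(child_files)


def timestamps_match(cvs_ts, git_ts, fuzz_seconds=300):
    """True if two epoch timestamps agree within the fuzz factor.

    The default of 300 seconds is a placeholder, not a measured value.
    """
    return abs(cvs_ts - git_ts) <= fuzz_seconds
```

The deletion check only works on pairs of adjacent commits, which is
why item 2 is the expensive one: every commit has to be compared
against its parent rather than summarized on its own.
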
Overall the general sense I'm getting is that the migration is working
fine. Identifying subtle issues will require addressing most of the
items above; otherwise making sense of all the differences is
difficult without manual inspection. What I have manually inspected
has turned out fine. What I can glean from overall results also looks
good (number of files per commit, etc).

This leads me to my question regarding approach. Just what is the
goal of validation, and why are we doing it?

With the number of transformations involved in the git migration it is
becoming apparent that the only way to really check it is essentially
to implement it twice independently and confirm they lead to the same
output. I can cut corners, like just applying a fuzz factor to the
timestamps, but really this is turning into implementing the migration
twice.

As far as speed goes, all of this is coded in Python so it isn't
optimal. Just about everything I'm doing can be run in parallel
(especially after switching to rlog), but it is most likely going to
take an hour or two. For general testing of the migration process
I think that is adequate, but as a post-migration step it will
probably take longer than the migration itself (the cvs side can run
in parallel with the migration at least).

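As a rough illustration of the parallelism, here is a sketch that fans
per-file work out over a thread pool. The names summarize_all and
summarize_one and the worker count are hypothetical. Threads are
assumed to be enough because the per-file work would mostly be waiting
on an external process such as rlog.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_all(paths, summarize_one, max_workers=8):
    """Apply summarize_one to every path concurrently.

    Returns a dict mapping each path to its result. summarize_one is a
    caller-supplied function (hypothetical here), e.g. one that runs an
    external log command on a single file and parses the output.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = pool.map(summarize_one, paths)
        return dict(zip(paths, results))
```

If the per-file work turned out to be CPU-bound in Python instead, a
process pool would be the more natural choice.
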
I'm open to suggestions, but rather than fully re-inventing the wheel
I'm thinking that fixing issues #1 and #5 above might be as far as I
go with this. They're easy to fix, and #1 is resulting in huge gaps.
With those fixed, the comparison will tell us whether anything is
getting missed in the migration.

Others are of course welcome to pitch in as well, but I still think
we're re-inventing the wheel. I'm trying to focus my efforts on doing
analysis that is likely to spot actual problems, and not just
re-running the same functions on the same data to get the same answer.

Code review of ferringb's work might be more productive in terms of
spotting problems. So might publishing his bundles and letting
people spot check their favorite packages.

If we were doing this at work we'd probably spot check data with
formal comparison scripts (involving human comparison), and then
preserve a copy of the cvs repo for its retention period just in case.

What are the general thoughts here? I don't want to hold up moving
forward with the migration to continuously refine a second
implementation of something that is already implemented.

Rich

Replies

Subject Author
Re: [gentoo-scm] Overall Validation Approach "Robin H. Johnson" <robbat2@g.o>
Re: [gentoo-scm] Overall Validation Approach Peter Stuge <peter@×××××.se>