Two items in this email - one is a quick status update, but I think
the more important issue is stepping back to figure out why we're even
doing this...

Quick status update - I can parse git repos and cvs repos and come up
with csv files that describe each fairly well.
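
For the curious, the git side boils down to something like this (the
field layout and function name here are illustrative, not exactly what
my script does):

import csv
import subprocess

def git_commits_to_csv(repo, out_path):
    # %H = commit hash, %an = author name, %at = author time (epoch).
    # --name-only lists the paths touched by each commit.
    log = subprocess.run(
        ["git", "-C", repo, "log", "--name-only",
         "--pretty=format:C\t%H\t%an\t%at"],
        capture_output=True, text=True, check=True).stdout
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["commit", "author", "timestamp", "path"])
        commit = author = ts = None
        for line in log.splitlines():
            if line.startswith("C\t"):
                _, commit, author, ts = line.split("\t")
            elif line:
                writer.writerow([commit, author, ts, line])
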
Issues remaining:
1. Subdirs deleted in cvs don't show up in cvs log, and therefore are
not in my list. That needs a move from cvs log to rlog to fix. (small
effort)
2. Files deleted in git don't show up. That needs a move from looking
at commits in isolation to doing pairwise comparisons. (significant
effort)
3. File hashes don't match, because the migration changes the headers.
(medium effort - see the sketch after this list)
4. Authors don't match, because these are also transformed. (medium effort)
5. Timestamps need a fuzz factor due to Manifest commit squashing.
(small effort on the comparison side)
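
On #3, the hash mismatch could probably be worked around by collapsing
CVS keywords before hashing, along these lines (the keyword list is my
guess at what the migration rewrites):

import hashlib
import re

_KEYWORD = re.compile(rb"\$(Header|Id|Revision|Date|Author|Source)(?::[^$\n]*)?\$")

def normalized_sha1(path):
    with open(path, "rb") as f:
        data = f.read()
    # Reduce "$Header: /cvsroot/file,v 1.2 ... $" to "$Header$" so the
    # pre- and post-migration copies hash identically.
    return hashlib.sha1(_KEYWORD.sub(rb"$\1$", data)).hexdigest()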

Overall the general sense I'm getting is that the migration is working
fine. Identifying subtle issues will require addressing most of the
items above - otherwise making sense of all the differences is
difficult without manual inspection. What I have manually inspected
has turned out fine. What I can glean from overall results also looks
good (number of files per commit, etc).

This leads me to my question regarding approach. Just what is the
goal of validation, and why are we doing it?

With the number of transformations involved in the git migration, it
is becoming apparent that the only way to really check it is
essentially to implement it twice independently and confirm that both
lead to the same output. I can cut corners like just applying a fuzz
factor to the timestamps, but really this is turning into implementing
the migration twice.
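
By fuzz factor I mean something like the following; the 15-minute
window is a guess on my part, not a measured value:

FUZZ_SECONDS = 15 * 60

def timestamps_match(cvs_ts, git_ts, fuzz=FUZZ_SECONDS):
    # Manifest commit squashing shifts times, so exact equality is
    # hopeless; a bounded difference is the best cheap check.
    return abs(cvs_ts - git_ts) <= fuzz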

As far as speed goes - all of this is coded in python so it isn't
optimal. Just about everything I'm doing can be run in parallel
(especially after switching to rlog), but it is going to consume an
hour or two most likely. For general testing of the migration process
I think that is adequate, but as a post-migration step it will
probably take longer than the migration itself (the cvs side can run
in parallel with migration at least).
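
Parallelizing it is straightforward in principle - something like
this, assuming the work splits by top-level directory (the module list
and the per-module function are placeholders):

from multiprocessing import Pool

def parse_module(module):
    # Placeholder for the real work: run rlog (or git log) for one
    # category and write out its csv.
    return module

if __name__ == "__main__":
    modules = ["app-editors", "dev-lang", "sys-apps"]  # hypothetical split
    with Pool() as pool:
        # Modules are independent, so throughput scales with core count.
        for done in pool.imap_unordered(parse_module, modules):
            print("finished", done)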
43 |
|
44 |
I'm open to suggestions, but rather than fully re-inventing the wheel |
45 |
I'm thinking that fixing issues #1 and #5 above might be as far as I |
46 |
go with this. They're easy to fix, and #1 is resulting in huge gaps. |
47 |
What that will tell us is that nothing is getting missed in the |
48 |
migration. |
49 |
|
50 |
Others are of course welcome to pitch in as well, but I still think |
51 |
we're re-inventing the wheel. I'm trying to focus my efforts on doing |
52 |
analysis that is likely to spot actual problems, and not just |
53 |
re-running the same functions on the same data to get the same answer. |
54 |
|
55 |
Code review of ferringb's work might be more productive in terms of |
56 |
spotting problems. So might be publishing his bundles and letting |
57 |
people spot check their favorite packages. |
58 |
|
59 |
If we were doing this at work we'd probably spot check data with |
60 |
formal comparison scripts (involving human comparison), and then |
61 |
preserve a copy of the cvs repo for its retention period just in case. |
62 |
|
63 |
What are the general thoughts here? I don't want to hold up moving |
64 |
forward with the migration to continuously refine a second |
65 |
implementation of something that is already implemented. |
66 |
|
67 |
Rich |