Gentoo Archives: gentoo-scm

From:	Rich Freeman <rich0@g.o>
To:	gentoo-scm@l.g.o
Subject:	Re: [gentoo-scm] cvs/irker/git thread from -dev
Date:	Tue, 02 Oct 2012 02:09:44
Message-Id:	`CAGfcS_mtnu+Mb9-CxUq=sr7p9p=-3GqCkzzZLOmOThPyckEo_w@mail.gmail.com`
In Reply to:	Re: [gentoo-scm] cvs/irker/git thread from -dev by Michael Mol

1	On Mon, Oct 1, 2012 at 9:46 PM, Michael Mol <mikemol@×××××.com> wrote:
2	> This was on -dev a while ago. Was there nothing additional?
3
4	Not yet - there was a bunch of good conversation on IRC over the last
5	few hours.
6
7	Note that this list is archived on gmane, and it is probably worth
8	reviewing for past discussions.
9
10	>
11	> Diego had a fair point about having to checkout a crapton of git
12	> history. I think that could be significantly improved with a seed
13	> tarball on a torrent, particularly if the seed tarball is updated
14	> annually (or biannually).
15
16	Initial distribution will be via a tarball of some kind (not sure if
17	it is just a tarball of the repository, or in an importable git
18	format) to save on load. It would be available with or without
19	history. I'm not actually certain that you can't commit without the
20	full history. I would think that as long as git could fast-forward
21	from whatever gentoo considers the head to whatever you consider the
22	head you'd be fine.
23
24	From IRC the big issues at the moment are that we know the current
25	migration drops some old content from the history. It is also hard to
26	validate the content.
27
28	Validation is tricky because CVS tends to store per-file history, and
29	git tends to store a history of the head which incorporates all the
30	files within. So, traversing each in a logical order makes it really
31	hard to compare them. However, this really seems like a good problem
32	for something like MapReduce.
33
34	If you traverse git for each commit you should be able to output:
35	path/file hash date/time committer message
36
37	Then you'd sort that by the first three fields, and drop all lines
38	where the first two fields don't change. That then gives you a sorted
39	commit history per file. Then you could traverse cvs and get the same
40	(the output of cvs log per-file should do that). Then you compare the
41	two lists.
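
A rough sketch of the sort-and-drop step described above, in Python. The
Record type, its field names, and the compare() helper are made up for
illustration. It assumes dates are ISO 8601 strings (so string order is
time order) and that the CVS side has been dumped into the same record
shape with a comparable content hash, which is not something cvs log
gives you directly.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    # Hypothetical record layout: path/file hash date/time committer message
    path: str
    blob_hash: str
    date: str        # assumed ISO 8601, so sorting as text sorts by time
    committer: str
    message: str


def per_file_history(records: Iterable[Record]) -> List[Record]:
    """Sort by the first three fields, then drop every line whose
    (path, hash) pair matches the line before it."""
    ordered = sorted(records, key=lambda r: (r.path, r.blob_hash, r.date))
    history: List[Record] = []
    previous_key = None
    for rec in ordered:
        key = (rec.path, rec.blob_hash)
        if key != previous_key:
            # First (earliest) appearance of this content at this path.
            history.append(rec)
        previous_key = key
    return history


def compare(git_side: Iterable[Record], cvs_side: Iterable[Record]
            ) -> Tuple[list, list]:
    """Return the (path, hash) pairs seen only in git and only in CVS."""
    git_keys = {(r.path, r.blob_hash) for r in git_side}
    cvs_keys = {(r.path, r.blob_hash) for r in cvs_side}
    return sorted(git_keys - cvs_keys), sorted(cvs_keys - git_keys)
```

Each surviving line is the first commit at which a given file content
appeared at a given path. Empty results from compare() would mean both
sides saw the same set of file versions.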
42
43	You could do the dumps for all the git commits in parallel, and
44	sorting is parallel as well (basically automatic with mapreduce I
45	think). For CVS you should be able to do each file in parallel, and
46	again the sorting is parallel.
47
48	Note I have zero experience with MapReduce. However, it seems like a
49	really fun project. I need to learn more about the "logical" model
50	behind CVS. I understand it fairly well for git and that is a thing
51	of beauty. You don't even need to calculate the hashes for git since
52	they're already stored in the tree.
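
To illustrate the point about stored hashes: `git ls-tree -r <commit>`
prints one line per file in the form "<mode> <type> <object>TAB<path>",
so the path and hash can be read straight off the output. A small sketch
of parsing one such line follows. The function name is hypothetical, and
it ignores the quoting git applies to unusual path names unless -z is
used.

```python
from typing import Tuple


def parse_ls_tree_line(line: str) -> Tuple[str, str]:
    """Split one line of `git ls-tree -r <commit>` output into
    (path, object hash). No hashing is done here; the hash is
    whatever git already stored in the tree."""
    meta, path = line.rstrip("\n").split("\t", 1)
    _mode, _obj_type, obj_hash = meta.split(" ")
    return path, obj_hash
```

Running this over every line for every commit would produce the
path/hash columns of the dump described earlier.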
53
54	Rich
