Gentoo Archives: gentoo-scm

From: Brian Harring <ferringb@×××××.com>
To: gentoo-scm@l.g.o
Cc: gentoo-dev@l.g.o
Subject: [gentoo-scm] CVS -> git, list of where non-infra folk can contribute
Date: Tue, 02 Oct 2012 04:15:36
Message-Id: 20121002041523.GB9562@localhost
1 Cross-posting to scm; responses should go to scm please (and the
2 people who whinge about cross posting should go promptly to hell if
3 I have any say in the matter).
4
5 On Mon, Oct 01, 2012 at 05:58:43PM -0700, Diego Elio Pettenò wrote:
6 > On 01/10/2012 17:51, Gregory M. Turner wrote:
7 > >
8 > > Anyhow, I get it: administering the vcs for a huge project such as
9 > > Gentoo is very hard work. If I somehow gave some other impression, I'm
10 > > sorry. Perhaps Rich and I insensitively voiced our shared assumption
11 > > that Gentoo's continued reliance on cvs stems from a lack of motivation
12 > > and consensus, rather than a shortage of labor and resources.
13 >
14 > That's definitely not the case. While we have had some complaints
15 > (mostly from Prefix last I knew) about git's working, the consensus for
16 > going to git is there. The problems are vastly technical.
17 >
18 > Problems such as "how many developers would be fine with having to
19 > checkout 2GB of history to be able to commit"? git supports shallow
20 > clones but not if you want to commit to them.
21
22 A few corrections:
23 1) You can commit to shallow clones. You can actually push from them
24 too- you just have to know what you're doing (your parent *has* to be
25 known to the other side, else you're trying to push a disconnected
26 history/graph to the other side, which doesn't know how to connect the
27 two). We won't be doing that fortunately, just noting that it is
28 possible if you're careful (and I know what the man page says; what
29 I'm saying is the full version, rather than the short version they
30 list there).
31
32 2) Grafts are what we'll be doing there; kind of shallow, but not.
33 Basically the same thing the kernel folk did.
34
35
36 As for the "quit your bitching and contribute already" rant angle;
37 Diego's accurate; minimally, it's more productive to contribute and
38 you're less likely to crap on folks' motivation, let alone risk the
39 wrath of a pissy person like me yelling at you.
40
41 Herein is the kicker; certain chunks of this can't be handled by
42 random joe blow off the street- they require core infra access.
43
44 Bluntly (no disrespect to people, just being brutally direct) I don't
45 care if you have infra friends, I don't care if you maintain a couple
46 of boxes; if you're doing heavy ops in a production environment,
47 you'll understand the issue of trust/access- thus you'll understand
48 that some of this work cannot be done by anyone but infra.
49
50 Like it or not, very few people have access to the core cvs -> rsync
51 hosts/machinery- since each/every/one/of/us means it's a security
52 angle that has to be tracked. That's not arguable, so don't even try
53 please.
54
55 That said, there are non-infra contributions people can make.
56
57 I suggest people do that; here's the list off the top of my head
58 (these are things that, worst case, I'll sort- which means it'll be months
59 out till I finish them considering my own time constraints and focus
60 on getting eapi5 support into pkgcore first).
61
62 0) First the rules of the road for this discussion; assume that I'll
63 be bitchy if you violate this.
64
65 0.a) We're not dropping the existing history. Suggesting this is
66 asking for a killfile entry; dropping history is viable for small or
67 throw-away projects; gentoo-x86 cvs repository is not a throw-away project.
68
69 0.b) Lesser offence since it's not obvious; the various suggestions
70 that we just snapshot this, then try to fix history after the fact
71 won't work- look into git's transitive trust: each commit's sha1 covers
72 its parents' sha1s. To do that sort of proposal means forcing a full
73 history rewrite down the line; this doesn't fly.
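To make the sha1 point in 0.b concrete, here is a minimal sketch (not part of any conversion tooling) that hashes commit objects the way git does and builds the same three commits twice: once on top of a bare snapshot, once on top of a parent standing in for the real converted history. The identity, timestamp, commit messages and the all-ones parent id are made up for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # git hashes "<kind> <size>" + NUL + body with sha1.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list[str], message: str) -> str:
    # Fixed, made-up identity and timestamp so only the parents vary.
    ident = "Example Dev <dev@example.org> 1349150000 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


def build_chain(root_parents: list[str], messages: list[str]) -> list[str]:
    # Each commit names the previous one as its parent.
    ids, parents = [], list(root_parents)
    for message in messages:
        cid = commit_id(EMPTY_TREE, parents, message)
        ids.append(cid)
        parents = [cid]
    return ids


EMPTY_TREE = git_object_id("tree", b"")
MESSAGES = ["snapshot", "fix foo", "bump bar"]

# Same content, same messages; only the first commit's parent differs.
snapshot = build_chain([], MESSAGES)
with_history = build_chain(["1" * 40], MESSAGES)

for a, b in zip(snapshot, with_history):
    print(a[:12], "->", b[:12])
```

Every id in the second chain differs from its counterpart in the first, which is why attaching history after the fact means rewriting (and re-pushing) everything committed since the snapshot.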
74
75 0.c) For whatever I've missed, assume that if it craps on developers'
76 workflow... it's a no go, and needs to be addressed. Does CVS suck?
77 Yes, I hate having to use it. But it *works*; switching to git has to
78 be, minimally, a lateral move for developers in terms of their
79 workflow- we cannot make it worse else what's the point of this whole
80 exercise? There may be an exception or two here- things that aren't
81 sorted immediately upon conversion, but those exceptions will only fly
82 if they're minor, don't require history rewrites, and someone is
83 locked in/guaranteed to be working on it now (else we have no guarantee
84 it'll actually be sorted).
85
86
87 1) We need a thin manifest -> thick manifest converter. Thin
88 manifests are used for git- they store just DIST entries. Thick
89 manifests (also known as 'full') are what cvs/rsync users are familiar
90 with- they hold checksums for all content.
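To illustrate the data transformation only (the real converter has to go through portage's APIs, per 1.a), here is a rough sketch of thin -> thick for a single package directory. It assumes the usual Manifest2 line layout of "TYPE name size HASH value ..." with AUX for files/ content, EBUILD for ebuilds and MISC for everything else; the function names and the choice of SHA256/SHA512 are assumptions made for the example, not what portage does.

```python
import hashlib
from pathlib import Path

# Assumption: which hashes to emit; the real set is a tree policy decision.
HASHES = ("sha256", "sha512")


def entry(kind: str, name: str, data: bytes) -> str:
    sums = " ".join(
        f"{h.upper()} {hashlib.new(h, data).hexdigest()}" for h in HASHES
    )
    return f"{kind} {name} {len(data)} {sums}"


def thicken(pkg_dir: Path) -> list[str]:
    """Return thick Manifest lines for one package directory."""
    manifest = pkg_dir / "Manifest"
    dist = []
    if manifest.exists():
        # The thin Manifest already carries the DIST entries; keep them as-is.
        dist = [
            line
            for line in manifest.read_text().splitlines()
            if line.startswith("DIST ")
        ]
    files_dir = pkg_dir / "files"
    lines = []
    for path in sorted(p for p in pkg_dir.rglob("*") if p.is_file()):
        if path == manifest:
            continue
        data = path.read_bytes()
        if files_dir in path.parents:
            lines.append(entry("AUX", path.relative_to(files_dir).as_posix(), data))
        elif path.suffix == ".ebuild":
            lines.append(entry("EBUILD", path.name, data))
        else:
            lines.append(entry("MISC", path.relative_to(pkg_dir).as_posix(), data))
    # Sorting whole lines groups them as AUX, DIST, EBUILD, MISC.
    return sorted(lines + dist)
```

Calling `thicken(Path("app-misc/foo"))` on a checkout would return the lines a thick Manifest for that package would contain, under the assumptions above.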
91
92 1.a) This converter must use portage's APIs; ultimately, this
93 thin->thick conversion will be signed by an infra key (rather than the
94 current hodgepodge of devs). I suggest nesting it under the emaint
95 command.
96
97 1.b) This converter needs to be fast. $VCS -> rsync updates occur
98 every 30 minutes. Thin->thick conversion should be sub-minute, frankly;
99 go parallel (multiprocessing) being my suggestion, threadpool worst
100 case (since most of the work won't be GIL bound).
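A rough sketch of the parallel angle in 1.b, using only the standard library. `digest_package` is a hypothetical stand-in for the real per-package work (which would go through portage); the point is just the fan-out over category/package directories. It defaults to a thread pool, which is workable because file reads and hashing of larger buffers release the GIL in CPython; passing a `concurrent.futures.ProcessPoolExecutor` instead gives the multiprocessing variant.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def digest_package(pkg_dir: Path) -> str:
    # Hypothetical stand-in for per-package thickening: one digest over
    # every file name and file body in the package directory.
    h = hashlib.sha256()
    for path in sorted(p for p in pkg_dir.rglob("*") if p.is_file()):
        h.update(path.relative_to(pkg_dir).as_posix().encode())
        h.update(path.read_bytes())
    return h.hexdigest()


def digest_tree(tree: Path, executor=None) -> dict:
    """Map 'category/package' -> digest, processing packages in parallel."""
    pkgs = sorted(p for p in tree.glob("*/*") if p.is_dir())
    own_pool = executor is None
    pool = ThreadPoolExecutor() if own_pool else executor
    try:
        # map() preserves input order, so results line up with pkgs.
        digests = list(pool.map(digest_package, pkgs))
    finally:
        if own_pool:
            pool.shutdown()
    return {p.relative_to(tree).as_posix(): d for p, d in zip(pkgs, digests)}
```

Whether threads or processes win on the real tree is something to measure rather than assume.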
101
102 1.c) This absolutely has to be fucking stable. This will be a core
103 part of our infrastructure after all.
104
105 1.d) I will kneecap the first person who whines about portage on this,
106 or suggests NIH "let's just hack it"- they won't have to support it,
107 this goes into portage so it's proper, and so infra isn't stuck w/
108 more custom code.
109
110 1.e) This actually isn't that hard. Ask in #gentoo-portage for
111 details, look at portage source, look at repoman's existing manifest
112 command- that manifest command already is the basics of it.
113
114 1.5) Incremental signing of a tree is basically required; meaning
115 whatever scanner there is shouldn't require resigning every single
116 package, only those whose thick manifest has changed.
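One way the incremental part of 1.5 could work, sketched under assumptions: keep a record of the digest of each thick Manifest as of the last signing run, and only hand packages whose current digest differs to the signer. The function names and the shape of the stored state are invented for the example; the signing itself and the handling of removed packages are left out.

```python
import hashlib


def manifest_digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def packages_to_resign(manifests: dict, signed: dict) -> list:
    """Pick the packages whose thick Manifest changed since the last signing.

    manifests: 'category/package' -> current thick Manifest text
    signed:    'category/package' -> digest recorded at the last signing
    """
    return sorted(
        pkg
        for pkg, text in manifests.items()
        if signed.get(pkg) != manifest_digest(text)
    )


# Usage: foo is unchanged, bar was modified, baz is new.
state = {
    "app-misc/foo": manifest_digest("EBUILD foo-1.0.ebuild ..."),
    "app-misc/bar": manifest_digest("EBUILD bar-1.0.ebuild ..."),
}
current = {
    "app-misc/foo": "EBUILD foo-1.0.ebuild ...",
    "app-misc/bar": "EBUILD bar-1.1.ebuild ...",
    "app-misc/baz": "EBUILD baz-0.1.ebuild ...",
}
print(packages_to_resign(current, state))
```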
117
118 1.6) Anyone looking to do this should pop into #gentoo-portage, talk
119 w/ a user named 'carebear', zmedico, etc; zmedico is portage's
120 maintainer, carebear is the current person volunteering to sort this
121 (help may be appreciated, talk to him/her/it).
122
123
124 2) Building off of #1, although *NOT REQUIRED FOR CVS->GIT MIGRATION*,
125 just very strongly desired, is sorting tree signing gleps while we're
126 at it. Start from http://www.gentoo.org/proj/en/glep/glep-0057.html ;
127 whatever solution #1 takes (likely an emaint command), tree signing
128 will be built right smack dab into it.
129
130
131 3) Robin afaik is putting together an email with the details; roughly,
132 the conversion process is conversion of cvs to svn, then svn2git
133 conversion; this is done since frankly it's the best/sanest conversion
134 pathway, and the fastest. The validation of that conversion, and
135 getting it down to basically a set of known invocations is required.
136
137 3.a) Roughly, the plan will be snag the tree, start conversion.
138 Validate the results, repeat as necessary till we're happy with it.
139 This is the initial git core history. This step should be <8h; mostly
140 cpu time, frankly, although re-validation of that pathway is required
141 (I did a fair amount of optimization to this, but I've not rechecked
142 the runtime in a while- nor if there is a better option in existence).
143 Basically, it's strongly preferable we're not sorting this at the time
144 we're trying to do the live conversion- the core issues need to be
145 sorted before.
146
147 3.b) Take all cvs activity that has occurred since the tree was
148 snapshotted and conversion started, and replay it into git via tailor;
149 this is minor- and avoidable if we just shut the tree down for however
150 long 3.a takes; that said, the tailor route is the intention, and
151 shouldn't be a problem.
152
153
154 4) People who strongly know git hooks would be useful; server side,
155 all incoming pushes from devs will have their commits validated before
156 touching the tree- bad validation, commit gets kicked back to them.
157 The hooks for this need doing (development of this can be done locally
158 w/out having to access infra either). Hell, someone may already have
159 done something similar- I've not seen it, but we need something akin
160 to this; whoever does this, needs to write it such that the auth
161 backend is configurable (upon deployment, this will be bound into
162 ldap, or an ldap scraped set of data that it'll consult); assume that
163 the auth backend will be user->gpg key level of validation (meaning I
164 cannot take a random commit antarus had against current ToT, and push
165 that on his behalf- robin may disagree on this point however).
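For anyone picking up the hook work in #4, here is a rough sketch of the "configurable auth backend" shape, not an actual hook. All names here (`Commit`, `AuthBackend`, `StaticBackend`, `rejected_commits`) are hypothetical. A real pre-receive hook would read the "<old> <new> <ref>" lines git feeds it on stdin, walk the pushed commits and extract each signer's key; that part is omitted, and the ldap-backed lookup is replaced by an in-memory table.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Commit:
    sha: str
    signer_fpr: Optional[str]  # fingerprint of the signing key, None if unsigned


class AuthBackend(Protocol):
    def keys_for(self, user: str) -> set:
        """Return the gpg key fingerprints registered for this user."""
        ...


class StaticBackend:
    # Stand-in for an ldap (or ldap-scraped) lookup; swap at deployment.
    def __init__(self, table: dict):
        self._table = table

    def keys_for(self, user: str) -> set:
        return set(self._table.get(user, ()))


def rejected_commits(pusher: str, commits: list, backend: AuthBackend) -> list:
    """Return shas the pusher may not push: unsigned, or signed by someone else."""
    allowed = backend.keys_for(pusher)
    return [c.sha for c in commits if c.signer_fpr not in allowed]


# Usage: ferringb cannot push a commit signed with antarus' key.
backend = StaticBackend({"ferringb": {"AAAA"}, "antarus": {"BBBB"}})
pushed = [Commit("c1", "AAAA"), Commit("c2", "BBBB"), Commit("c3", None)]
print(rejected_commits("ferringb", pushed, backend))
```

Keeping the backend behind a small interface like this means the hook logic can be developed and tested locally without infra access, then bound to the real directory at deployment.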
166
167
168
169 Were that to be done, that would leave for infra basically the
170 following- which is most definitely not a complete list-
171
172 1) gitolite configuration/setup, which afaik is basically sorted.
173 2) cvs -> rsync pathways being rebuilt to be git -> rsync (reliant on
174 #1 from above, but there is more that occurs there).
175 3) Thanking people for stepping up and helping to take care of the
176 stuff we're seriously low on time to sort.
177
178 If people don't step up, I'll be working my way through that list; that
179 said, my timetable were I to do this isn't "next week or the week
180 after"- it's "over the next few months as time allows".
181
182 Also, it's entirely possible I missed something for the non-infra
183 tasks people can contribute to; that's just a quick brain dump, pardon
184 any incorrect statements. If one has questions and answers aren't
185 coming through via the scm ml, then worst case track me down on
186 freenode via the ferringb nick; just assume I'll be wickedly laggy
187 in responding.
188
189 Finally, pardon the strong tones; the tone in use isn't meant to
190 dissuade people from contributing, it's meant to ensure people stay
191 focused on what's required here to get the job done- discussions about
192 building a git mirroring tier (for example) are for *after* the
193 initial work is done (understand that 99% of users will be using rsync
194 even when we switch dev's underlying vcs to git; longer term that may
195 change, but it's a v2 type thing, not a v1 type thing).
196
197 Cheers-
198 ~harring
