Gentoo Archives: gentoo-project

From: Kent Fredric <kentnl@g.o>
To: gentoo-project@l.g.o
Subject: Re: [gentoo-project] Call for agenda items - Council meeting 2016-08-14
Date: Tue, 09 Aug 2016 05:33:39
Message-Id: 20160809173255.0ddfa090@katipo2.lan
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting 2016-08-14 by Jack Morgan
On Mon, 8 Aug 2016 19:07:04 -0700
Jack Morgan <jmorgan@g.o> wrote:

> On 08/08/16 05:35, Marek Szuba wrote:
> >
> > Bottom line: I would say we do need some way of streamlining ebuild
> > stabilisation.
>
> I vote we fix this problem. I'm tired of having this same discussion
> every 6 or 12 months. I'd like to see less policy discussion and more
> technical solutions to the problems we face.
>
> I propose calling for volunteers to create a new project that works on
> solving our stabilization problem. I see that looking like the
> following:
>
> 1) project identifies the problem(s) with real data from Bugzilla and
> the portage tree.
>
> 2) new project defines a technical proposal for fixing this issue, then
> presents it to the developer community for feedback. This would
> include defining tools needed or used
>
> 3) start working on solution + define future roadmap
>
>
> All processes and policies should be on the table for negotiating in
> the potential solution. If we need to reinvent the wheel, then let's
> do it.
>
> To be honest, adding more policy just ends up making everyone unhappy
> one way or the other.
>
>

There's a potential way to arrive at a technical solution that somewhat
alleviates the need for such rigorous arch testers, without degrading
the stabilisation mechanic into a "blind monkey" system that stabilises
based on conjecture.

I've mentioned it before, ages ago, somewhere on the gentoo-dev list.

The idea is basically to instrument portage with an (optional) feature
that, when turned on, records and submits certain facts about every
failed or successful install, the objective being to spread the load of
what `tatt` does organically over the participant base.

1. Firstly, make no demands of homogeneity or even sanity for a user's
system to participate. Everything they throw at the system I'm about
to propose should be considered "valid".

2. Every time a package is installed, or an install is attempted, the
outcome of that installation is qualified in one of a number of ways:

 - installed OK without tests
 - installed OK with tests
 - failed tests
 - failed install
 - failed compile
 - failed configure

Each of these is a single state in a single field.

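For illustration only, such a state could be modelled as a simple enum
in the reporting client (the names here are mine, not a proposed
schema):

    from enum import Enum, unique

    @unique
    class InstallOutcome(Enum):
        """One state per report, stored in a single field."""
        INSTALLED_OK_NO_TESTS   = "installed-ok-without-tests"
        INSTALLED_OK_WITH_TESTS = "installed-ok-with-tests"
        FAILED_TESTS            = "failed-tests"
        FAILED_INSTALL          = "failed-install"
        FAILED_COMPILE          = "failed-compile"
        FAILED_CONFIGURE        = "failed-configure"
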
3. The name, version, and SHA1 of the ebuild that generated the report.

4. The USE flags and any other pertinent ( and carefully selected by
Gentoo ) flags are included, each as a single field in a property set,
and decomposed into structured property lists where possible.

5. <arch> satisfaction data for the target package at the time of
installation is recorded.

e.g.:

 KEYWORDS="arch"  + ACCEPT_KEYWORDS="~arch" -> [ "arch(~)" ]
 KEYWORDS="~arch" + ACCEPT_KEYWORDS="~arch" -> [ "~arch(~)" ]
 KEYWORDS="arch"  + ACCEPT_KEYWORDS="arch"  -> [ "arch" ]
 KEYWORDS=""      + ACCEPT_KEYWORDS="**"    -> [ "(**)" ]

This seems redundant, but it is basically saying "hey, if you're
insane and setting lots of different arches in your accept keywords,
that would be relevant data to use to ignore your report". This data
can also be used with other data I'll mention later to isolate users
with "mixed keywording" setups.

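Purely to illustrate the mapping above, the client-side derivation
could look something like this (the exact rules are an assumption on my
part, not something portage provides today):

    def arch_spec(keywords, accept_keywords, arch):
        """Guess at the 'satisfied arch spec' string for one arch:
        the keyword the ebuild carries for that arch, annotated with
        the ACCEPT_KEYWORDS level that let it through."""
        kw = next((k for k in keywords.split() if k.lstrip("~") == arch), "")
        accepts = accept_keywords.split()
        if "**" in accepts:
            return kw + "(**)"
        if "~" + arch in accepts:
            return kw + "(~)"
        return kw

    # arch_spec("arch", "~arch", "arch")  -> "arch(~)"
    # arch_spec("",     "**",    "arch")  -> "(**)"
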
6. For every dependency listed in *DEPEND, a dictionary/hash of

 "specified atom" -> {
     name    -> resolved dependency name
     version -> version of resolved dependency
     arch    -> [ satisfied arch spec as in #5 ]
     sha1    -> some kind of SHA1 that hopefully turns up in gentoo.git
 }

is recorded in the response at the time of the result.

The "satisfied arch spec" field is used to isolate anomalies in
keywording and user keyword mixing, and to filter out non-target
reports for stabilization data.

7. A Submitter Unique Identifier.

8. Possibly a Submitter-Machine Unique Identifier.

9. The whole build log will be included, compressed, verbatim.

This latter part would be an independent option from the "reporting"
feature, because it is a slightly more invasive privacy concern than
the others, in that arbitrary code execution can leak private data.

Hence, people who turn this feature on have to know what they're
signing up for.

10. All of the above data is pooled and shipped as a single report to a
"report server", where it is aggregated.

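Putting points 2 through 9 together, a single report might look roughly
like this (every name and value below is illustrative only, not a
proposed schema):

    report = {
        "outcome": "installed-ok-with-tests",              # point 2
        "package": {                                       # point 3
            "name":    "app-misc/foo",
            "version": "1.2.3",
            "sha1":    "<sha1 of the ebuild, ideally findable in gentoo.git>",
        },
        "use": ["ssl", "-doc"],                            # point 4
        "arch_spec": ["amd64(~)"],                         # point 5
        "depends": {                                       # point 6
            ">=dev-libs/bar-2.0": {
                "name":    "dev-libs/bar",
                "version": "2.1",
                "arch":    ["amd64"],
                "sha1":    "<sha1 of the dependency's ebuild>",
            },
        },
        "submitter":         "<opaque submitter id>",      # point 7
        "submitter_machine": "<opaque machine id>",        # point 8
        "build_log":         None,  # compressed log, only if opted in (point 9)
    }
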
With all of the above, even in the most naive of situations, we can use
that data to give us at the very least a lot more assurance than "well,
30 days passed, and nobody complained", because we'll have a paper
trail of a known, countable number of successful installs, which, while
not representative, are likely to still be more diverse and more
reassuring than the deafening silence of no feedback.

And in non-naive situations, the results for given versions can be
aggregated and compared, and the factors that are present can be
correlated with failures statistically.

And this would give us a status board of "here's a bunch of
configurations that seem to be statistically more problematic than
others, might be worth investigating".

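As a very crude sketch of what "correlated statistically" could mean in
its simplest form, i.e. ranking reported factors (USE flags, arch
specs, dependency versions, ...) by how often they show up in failing
reports (this is an assumption of mine, not a worked-out method):

    from collections import Counter

    def failure_rates(reports):
        """reports: iterable of (outcome, factors) pairs, where factors
        is a list of strings taken from each submitted report."""
        seen, failed = Counter(), Counter()
        for outcome, factors in reports:
            for factor in factors:
                seen[factor] += 1
                if outcome.startswith("failed"):
                    failed[factor] += 1
        # Highest failure rate first: candidates worth a human look.
        return sorted(((failed[f] / seen[f], f) for f in seen), reverse=True)
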
But there would be no burden to actually dive into the logs unless you
found clusters of failures from different sources failing under the
same scenarios ( and this is why not everyone *has* to send build logs
for this to be effective; just enough people have to report "x
configuration bad", and some subset of them have to provide
elucidating logs ).

None of what I mention here is conceptually "new"; I've just
re-explained the entire CPAN Testers model in terms relevant to Gentoo,
using Gentoo parts instead of CPAN parts.

And CPAN authors find it *very effective* for being assured they didn't
break anything: they ship a TRIAL release ( akin to our ~arch ), and
then wait a week or so while people download and test it.

And pretty much anyone can become "a tester"; there's no barrier to
entry, and no requirements for membership. Just install the tools, get
yourself an ID, and start installing stuff with tests (the default);
the tools you have will automatically fire off those reports to the
hive, and you get a big pretty matrix of "we're good here". Then, after
no red results in some period, they go "hey, yep, we're good" and ship
a stable release.

Or maybe there are occasional pockets of "you dun goofed", where there
will be a problem you might have to look into ( sometimes those
problems are entirely invalid problems, ... this is somehow typically
not an issue ).

http://matrix.cpantesters.org/?dist=App-perlbrew+0.76

And if you throw variant analysis into the mix, you get those other
facts compared and ranked by "likelihood to be part of the problem":

http://analysis.cpantesters.org/solved?distv=App-perlbrew-0.76

^ You can see here that variant analysis found 3 common strings in the
logs that indicated a failure, and it pointed the finger directly at
the failing test as a result. And then in rank #3, you see it pointing
a finger at CPAN::Perl::Releases as "a possible problem highly
correlated with failures", with the -0.5 theta on version 2.88.

Lo and behold, automated differential analysis has found the bug:

https://rt.cpan.org/Ticket/Display.html?id=116517

It still takes a human to

a) decide to look
b) decide the differential factors are useful enough to pursue
c) verify the problem manually by using the guidance given
d) manually file the bug

But the point here is that we can actually build some infrastructure
that will give automated tooling some degree of assurance that "this
can probably be safely stabilized now, the testers aren't seeing any
issues".

It's also just the sort of data collection that can lend itself to much
more powerful benefits as well.

The only hard parts are:

1. Making a good server to handle these reports that scales well
2. Making a good client for report generation, collection from Portage,
and submission
3. Getting people to turn on the feature
4. Getting enough people using the feature that the majority of the
"easy" stabilizations can happen hands-free.

And we don't even have to do the "fancy" parts of it now. Just pools of

 "package: arch = 100 pass / 0 fail, archb = 10 pass / 0 fail"

would be a great start.

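Even that naive version is just a counting exercise over the submitted
reports; something along these lines (again, only a sketch with
invented names):

    from collections import defaultdict

    def tally(reports):
        """reports: iterable of (package, arch, outcome) tuples.
        Returns {package: {arch: [passes, fails]}}."""
        counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
        for package, arch, outcome in reports:
            ok = outcome.startswith("installed")
            counts[package][arch][0 if ok else 1] += 1
        return counts

    # counts["app-misc/foo-1.2.3"]["amd64"] might then read [100, 0].
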
Because otherwise we're relying 100% on negative feedback, and assuming
that the absence of negative feedback is positive, when the reality
might be closer to this: the problems were too confusing to report as
an explicit bug; the problems faced were deemed unimportant by the
person in question and they gave up before reporting them; the user
encountered some other entry barrier to reporting; ..... or maybe
nobody is actually using the package at all, so it could actually be
completely broken and nobody notices.

And it seems entirely haphazard to encourage tooling that *builds*
upon that assumption.

At least with the manual stabilization process, you can be assured that
at least one human will personally install, test, and verify that a
package works in at least one situation.

With a completely automated stabilization that relies on the absence of
negative feedback to stabilize, you're *not even getting that*.

Why bother with stabilization at all if the entire thing is merely
*conjecture*?

Even a broken, flawed stabilization workflow done by teams of people
who are bad at testing is better than a stabilization workflow
implemented on conjecture of stability :P
