On Wed, May 11, 2011 at 6:12 AM, Jack Morgan <jack@×××××××.com> wrote:
>
>
> On 05/10/2011 01:13 PM, Jorge Manuel B. S. Vicetto wrote:
>> Hi.
>>
>> Another issue that was raised in the discussion with the arch teams,
>> even though it predates the arch teams resources thread as we've talked
>> about it at FOSDEM 2011 and even before, is getting more automatic
>> testing done on Gentoo.
>>
>> I'm bcc'ing a few teams on this thread as it involves them and hopefully
>> might interest them as well.
>>
>> Both the Release Engineering and QA teams would like to have more
>> automatic testing to find breakages and to help track "when" things
>> break and, more importantly, *why* they break.
>>
>> To avoid misunderstandings, we already have testing and even automated
>> testing being done on Gentoo. The "first line" of testing is done by
>> developers using repoman and/or the PM's QA tools. We also have
>> individual developers and the QA team hopefully checking commits and
>> everyone testing their packages.
>>
>> Furthermore, the current weekly automatic stage building has helped
>> identify some issues with the tree. The tinderbox work done by Patrick
>> and Diego, as well as others, has also helped find broken packages
>> and/or identify packages affected by major changes before they hit
>> the tree. The use of repoman, pcheck, and/or paludis quality assurance
>> tools in the past and present to generate reports about tree issues,
>> like Michael's (mr_bones) emails, has also helped identify and
>> address issues.
>>
>> Recently, we've gotten a new site to check the results of some tests,
>> http://qa-reports.gentoo.org/, with the possibility of adding more
>> scripts to provide / run even more tests.
>>
>> So, why "more testing"? For starters, more *automatic* testing. Then
>> more testing, as reports from testing can help greatly in identifying
>> when things break and why they break. As someone who looks over the
>> automatic stage building for amd64 and x86, and who has to talk to
>> teams / developers when things break, having more, more in-depth, and
>> more regular automatic testing would help my (releng) job. The work for
>> the live-DVD would also be easier if the builds were "automated" and
>> the job wasn't "restarted" every N months. Furthermore, creating a
>> framework for developers to be able to schedule testing for proposed
>> changes, in particular for substantial changes, might (should?) help
>> improve the user experience.
>>
>> I hope you agree with "more testing" by now, but what testing? It's good
>> to test something, but "what" do we want to test and "how" do we want to
>> test?
>>
>>
>> I think we should try to have at least the following categories of tests:
>>
>> * Portage (overlays?) QA tests
>> tests with the existing QA tools to check the consistency of
>> dependencies and the quality of ebuilds / eclasses.

These are almost separate. I assume your intent was 'let's automate
pcheck & co. runs of gentoo-x86 and, if we get that working, we can add
overlays from layman', which sounds fine to me ;)
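
As a rough illustration, the whole thing could be a nightly cron job. A
minimal sketch in Python, where the repo paths, the output directory,
and the bare `pcheck <repo>` invocation are all assumptions for the
sake of the example (the real flags would need checking against the
installed pcheck):

    #!/usr/bin/env python
    # Hypothetical nightly QA sweep: run pcheck over each repo and save
    # its output where something like qa-reports.gentoo.org can publish it.
    import os
    import subprocess

    REPOS = {
        # assumed checkout locations; overlays from layman could be
        # appended here once the gentoo-x86 run works
        "gentoo-x86": "/var/gentoo/repos/gentoo-x86",
    }
    REPORT_DIR = "/var/tmp/qa-reports"  # assumed publishing directory

    os.makedirs(REPORT_DIR, exist_ok=True)
    for name, path in REPOS.items():
        # capture whatever pcheck prints and store it as the report
        result = subprocess.run(["pcheck", path],
                                capture_output=True, text=True)
        with open(os.path.join(REPORT_DIR, name + ".txt"), "w") as out:
            out.write(result.stdout)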

>>
>> * (on demand?) package (stable / unstable) revdep rebuild (tinderbox)
>> framework to schedule testing of proposed changes and check their impact

I'd be curious what the load is here. We are adopting an on-demand
testing infrastructure at work. Right now we have a continuous build,
but it is time-delta-based rather than event-based, so it groups changes
together, which makes it hard to find what broke things. At work we
only submit a few changes a day, though, so we need a very small
infrastructure to test each change. Gentoo has way more commits (at
least one every few minutes on average, and then there are huge
commits like KDE stabilization...)

What I'd recommend here is essentially some kind of control field in
the commit itself (commitmsg?) that controls exactly what tests are
done for that commit.
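
For example (the "Tests:" field name and its tokens are invented here,
purely to show the shape of the idea):

    app-foo/bar: version bump to 1.2.3

    Tests: revdep stage-amd64

A post-commit hook would parse the "Tests:" line and queue only the
builds the committer asked for, rather than rebuilding everything on
every commit.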

>>
>> * Weekly (?) stable / unstable stage / ISO arch builds
>> the automatic stage building, including new specs for the testing tree
>> as we currently only test the stable tree.

I'm curious: if you constantly build unstable, do you plan on fixing
it? My understanding of Gentoo is that in ~arch something is always
slightly broken and that's OK. I worry that ~arch builds may just end
up being noise because they don't build properly due to the high
velocity of changes.

>>
>> * (schedule?) specific tailored stage4 builds
>> testing of specific tailored "real world" images (web server, intranet
>> server, generic desktop, GNOME desktop, KDE desktop, etc).

Again, it would be interesting to have some kind of control field in my
commits so that when KDE goes stable I could trigger a build of the 'KDE
stage4' or whatnot.

If we ever finish the gentoo-stats project, it would be interesting to
see what users are actually using as well. Do users use the defaults?
Are the stage4s we are testing actually relevant?

>>
>> * Bi-Weekly (?) stable / unstable AMD64/X86 LiveDVD builds
>> automatic creation of the live-DVD to test a very broad set of packages
>>
>> * automated testing of built stage / CD / LiveDVD (KVM guest?) (CLI /
>> GUI / log parsing ?)
>> framework to test the built stages / install media and ensure it works
>> as expected

I think testing that the LiveDVD we just built boots is a decent test
(and probably not too difficult to write). Testing that 'everything on
the DVD works' is likely more of a challenge, and I'm not sure it buys
us anything. Do we often find that we release LiveDVDs with broken
software?
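
The boot check could be as small as booting the ISO headless under KVM
and watching the serial console for a login prompt. A minimal sketch
in Python, assuming the image is built with a serial console
(console=ttyS0 on the kernel command line) and that "login:" appearing
within the timeout counts as a pass; the binary name and timeout are
guesses:

    #!/usr/bin/env python
    # Hypothetical LiveDVD smoke test: boot the ISO in KVM with the
    # serial console logged to a file, then poll the log for "login:".
    import subprocess
    import sys
    import time

    ISO = sys.argv[1]        # path to the freshly built ISO
    LOG = "serial.log"       # where qemu writes the serial output
    TIMEOUT = 600            # seconds to allow for boot

    # "qemu-kvm" here; "qemu-system-x86_64 -enable-kvm" on some setups
    qemu = subprocess.Popen(
        ["qemu-kvm", "-m", "1024", "-cdrom", ISO, "-boot", "d",
         "-display", "none", "-serial", "file:" + LOG])
    booted = False
    deadline = time.time() + TIMEOUT
    while time.time() < deadline:
        time.sleep(10)       # give the guest time to make progress
        with open(LOG, errors="replace") as log:
            if "login:" in log.read():
                booted = True
                break
    qemu.kill()
    sys.exit(0 if booted else 1)

Anything fancier (logging in over the serial line, running a script
inside the guest) could be layered on top of the same loop, but the
bare "does it reach a login prompt" check already catches a dead kernel
or a broken initramfs.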

>>
>>
>> I don't have a framework for conducting some of these tests, including
>> the stage/iso validation, but some of them can use the existing tools
>> like the stage building and the tree QA tests.
>>
>> Do you have any suggestions about the automatic testing? Do you know of
>> other tests or tools that we can and should use to improve QA on Gentoo?
>
> You might take a look at autotest from kernel.org. It's a Python-based
> framework for automating testing. It's specific to kernel testing,
> but could be modified for your needs.

Autotest would likely require a branch and a fair bit of work to be
used for OS qualification. We use it for OS qualification at work
(Goobuntu @ Google).

While I hesitate to say 'roll your own', if you can get something
working in 1-2 months I can certainly see it being easier to maintain
than autotest... there really is not a killer feature that autotest
has. The reporting / graphing is pretty bad; it uses ssh for
everything and basically keeps long-running connections open (might be
fine if you are using kvm... but not over the WAN); the API is terrible
and requires all kinds of horribleness to use... I could go on ;)

>
> --
> Jack Morgan
> Pub 4096R/761D8E0A 2010-09-13 Jack Morgan <jack@×××××××.com>
> Fingerprint = DD42 EA48 D701 D520 C2CD 55BE BF53 C69B 761D 8E0A