On Wed, May 11, 2011 at 6:12 AM, Jack Morgan <jack@×××××××.com> wrote:
>
>
> On 05/10/2011 01:13 PM, Jorge Manuel B. S. Vicetto wrote:
>> Hi.
>>
>> Another issue that was raised in the discussion with the arch teams,
>> even though it predates the arch teams resources thread as we've talked
>> about it at FOSDEM 2011 and even before, is getting more automatic
>> testing done on Gentoo.
>>
>> I'm bcc'ing a few teams on this thread as it involves them and hopefully
>> might interest them as well.
>>
>> Both the Release Engineering and QA teams would like to have more
>> automatic testing to find breakages and to help track "when" things
>> break and, more importantly, *why* they break.
>>
>> To avoid misunderstandings, we already have testing and even automated
>> testing being done on Gentoo. The "first line" of testing is done by
>> developers using repoman and/or the PM's QA tools. We also have
>> individual developers and the QA team hopefully checking commits and
>> everyone testing their packages.
>>
>> Furthermore, the current weekly automatic stage building has helped
>> identify some issues with the tree. The tinderbox work done by Patrick
>> and Diego, as well as others, has also helped find broken packages
>> and/or identify packages affected by major changes before they hit
>> the tree. The use of repoman, pcheck, and/or paludis quality assurance
>> tools in the past and present to generate reports about tree issues,
>> like Michael's (mr_bones) emails, has also helped identify and
>> address issues.
>>
>> Recently, we've gotten a new site to check the results of some tests,
>> http://qa-reports.gentoo.org/, with the possibility of adding more
>> scripts to provide / run even more tests.
>>
>> So, why "more testing"? For starters, more *automatic* testing. Then
>> more testing, as reports from testing can help greatly in identifying
>> when things break and why they break. As someone who looks over the
>> automatic stage building for amd64 and x86, and who has to talk to
>> teams / developers when things break, having more, more in-depth, and
>> more regular automatic testing would help my (releng) job. The work for
>> the live-DVD would also be easier if the builds were "automated" and
>> the job wasn't "restarted" every N months. Furthermore, creating a
>> framework for developers to be able to schedule testing for proposed
>> changes, in particular for substantial changes, might (should?) help
>> improve the user experience.
>>
>> I hope you agree with "more testing" by now, but what testing? It's good
>> to test something, but "what" do we want to test and "how" do we want to
>> test?
>>
>>
>> I think we should try to have at least the following categories of tests:
>>
>> * Portage (overlays?) QA tests
>> tests with the existing QA tools to check the consistency of
>> dependencies and the quality of ebuilds / eclasses.

These are almost separate. I assume your intent was 'let's automate
pcheck & co. runs of gentoo-x86 and, if we get that working, we can add
overlays from layman', which sounds fine to me ;)
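
As a rough illustration, the whole thing could be a nightly cron job. A
minimal sketch in Python, where the repo paths, the output directory,
and the bare `pcheck <repo>` invocation are all assumptions for the
sake of the example (the real flags would need checking against the
installed pcheck):

    #!/usr/bin/env python
    # Hypothetical nightly QA sweep: run pcheck over each repo and save
    # its output where something like qa-reports.gentoo.org can publish it.
    import os
    import subprocess

    REPOS = {
        # assumed checkout locations; overlays from layman could be
        # appended here once the gentoo-x86 run works
        "gentoo-x86": "/var/gentoo/repos/gentoo-x86",
    }
    REPORT_DIR = "/var/tmp/qa-reports"  # assumed publishing directory

    os.makedirs(REPORT_DIR, exist_ok=True)
    for name, path in REPOS.items():
        # capture whatever pcheck prints and store it as the report
        result = subprocess.run(["pcheck", path],
                                capture_output=True, text=True)
        with open(os.path.join(REPORT_DIR, name + ".txt"), "w") as out:
            out.write(result.stdout)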

>>
>> * (on demand?) package (stable / unstable) revdep rebuild (tinderbox)
>> framework to schedule testing of proposed changes and check their impact

I'd be curious what the load is here. We are adopting an on-demand
testing infrastructure at work. Right now we have a continuous build,
but it is time-delta-based rather than event-based, so it groups changes
together, which makes it hard to find what broke things. At work we
only submit a few changes a day, though, so we need a very small
infrastructure to test each change. Gentoo has way more commits (at
least one every few minutes on average, and then there are huge
commits like KDE stabilization...)

What I'd recommend here is essentially some kind of control field in
the commit itself (commitmsg?) that controls exactly what tests are
done for that commit.
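
For example (the "Tests:" field name and its tokens are invented here,
purely to show the shape of the idea):

    app-foo/bar: version bump to 1.2.3

    Tests: revdep stage-amd64

A post-commit hook would parse the "Tests:" line and queue only the
builds the committer asked for, rather than rebuilding everything on
every commit.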

>>
>> * Weekly (?) stable / unstable stage / ISO arch builds
>> the automatic stage building, including new specs for the testing tree
>> as we currently only test the stable tree.

I'm curious: if you constantly build unstable, do you plan on fixing
it? My understanding of Gentoo is that in ~arch something is always
slightly broken and that's OK. I worry that ~arch builds may just end
up being noise because they don't build properly due to the high
velocity of changes.

>>
>> * (schedule?) specific tailored stage4 builds
>> testing of specific tailored "real world" images (web server, intranet
>> server, generic desktop, GNOME desktop, KDE desktop, etc).

Again, it would be interesting to have some kind of control field in my
commits so that when KDE goes stable I could trigger a build of the 'KDE
stage4' or whatnot.

If we ever finish the gentoo-stats project, it would be interesting to
see what users are actually using as well. Do users use the defaults?
Are the stage4s we are testing actually relevant?

>>
>> * Bi-Weekly (?) stable / unstable AMD64/X86 LiveDVD builds
>> automatic creation of the live-DVD to test a very broad set of packages
>>
>> * automated testing of built stage / CD / LiveDVD (KVM guest?) (CLI /
>> GUI / log parsing ?)
>> framework to test the built stages / install media and ensure it works
>> as expected

I think testing that the LiveDVD we just built boots is a decent test
(and probably not too difficult to write). Testing that 'everything on
the DVD works' is likely more of a challenge, and I'm not sure it buys
us anything. Do we often find that we release LiveDVDs with broken
software?
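
The boot check could be as small as booting the ISO headless under KVM
and watching the serial console for a login prompt. A minimal sketch
in Python, assuming the image is built with a serial console
(console=ttyS0 on the kernel command line) and that "login:" appearing
within the timeout counts as a pass; the binary name and timeout are
guesses:

    #!/usr/bin/env python
    # Hypothetical LiveDVD smoke test: boot the ISO in KVM with the
    # serial console logged to a file, then poll the log for "login:".
    import subprocess
    import sys
    import time

    ISO = sys.argv[1]        # path to the freshly built ISO
    LOG = "serial.log"       # where qemu writes the serial output
    TIMEOUT = 600            # seconds to allow for boot

    # "qemu-kvm" here; "qemu-system-x86_64 -enable-kvm" on some setups
    qemu = subprocess.Popen(
        ["qemu-kvm", "-m", "1024", "-cdrom", ISO, "-boot", "d",
         "-display", "none", "-serial", "file:" + LOG])
    booted = False
    deadline = time.time() + TIMEOUT
    while time.time() < deadline:
        time.sleep(10)       # give the guest time to make progress
        with open(LOG, errors="replace") as log:
            if "login:" in log.read():
                booted = True
                break
    qemu.kill()
    sys.exit(0 if booted else 1)

Anything fancier (logging in over the serial line, running a script
inside the guest) could be layered on top of the same loop, but the
bare "does it reach a login prompt" check already catches a dead kernel
or a broken initramfs.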

>>
>>
>> I don't have a framework for conducting some of these tests, including
>> the stage/iso validation, but some of them can use the existing tools
>> like the stage building and the tree QA tests.
>>
>> Do you have any suggestions about the automatic testing? Do you know of
>> other tests or tools that we can and should use to improve QA on Gentoo?
>
> You might take a look at autotest from kernel.org. It's a Python-based
> framework for automating testing. It's specific to kernel testing,
> but could be modified for your needs.

Autotest would likely require a branch and a fair bit of work to be
used for OS qualification. We use it for OS qualification at work
(Goobuntu @ Google).

While I hesitate to say 'roll your own', if you can get something
working in 1-2 months I can certainly see it being easier to maintain
than autotest... there really is not a killer feature that autotest
has. The reporting / graphing is pretty bad; it uses ssh for
everything and basically keeps long-running connections open (might be
fine if you are using kvm... but not over the WAN); the API is terrible
and requires all kinds of horribleness to use... I could go on ;)

>
> --
> Jack Morgan
> Pub 4096R/761D8E0A 2010-09-13 Jack Morgan <jack@×××××××.com>
> Fingerprint = DD42 EA48 D701 D520 C2CD 55BE BF53 C69B 761D 8E0A