Gentoo Archives: gentoo-qa

From: Stanislav Ochotnicky <sochotnicky@×××××.com>
To: gentoo-soc@l.g.o
Cc: gentoo-qa@l.g.o
Subject: [gentoo-qa] Re: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database
Date: Fri, 12 Jun 2009 19:14:23
Message-Id: 20090612191321.GE8898@w0rm
1 On 10:05 Fri 12 Jun , Jeremy Olexa wrote:
2 > On Fri, Jun 12, 2009 at 8:32 AM, Stanislav
3 > Ochotnicky<sochotnicky@×××××.com> wrote:
4 > > Hi everyone,
5 > >
6 > > some of you already know that work on GSoC project "Tree-wide collision
7 > > checking and provided files database" started a few weeks ago.
8 > > For the rest, here is a short introduction to this project (collagen)
9 > > and its goals.
10 > >
11 > > Collagen aims to improve quality of ebuilds in portage tree. It does
12 > > this by compiling as many ebuilds as possible. It specifically takes
13 > > into account various atoms in DEPEND variable. For example if package
14 > > ebuild states that it needs =dev-libs/glib-2*, that package should be
15 > > compilable with every version of glib-2* in portage (taking into account
16 > > keywords). Therefore collagen will install one version of glib-2*, then
17 > > ebuild in question, collect information, uninstall ebuild and first
18 > > glib version. It repeats this process for every glib-2* in the tree.
19 >
20 > Testing against every version of the deps as required seems like it is
21 > diverging from the original "Tree-wide collision checking and provided
22 > files database" - Would you say that the goal of this project is
23 > becoming more QA orientated? Something like: "Matchbox: A tinderboxen
24 > master server to provide QA for ebuilds"
25
26 Yes that's true, this project is moving towards QA (it was proposed by
27 QA developer after all). SoC list was only CCed, To: was to gentoo-qa. I
28 should have warned in my original email so that responses wouldn't get
29 lost between two lists. Added QA to cc. I believe crossposting is OK in
30 this case. I apologize if that's not the case.
31
32 This is from one of my discussions with weaver (my mentor):
33
34 <quote mode=summary>
35 Say package X depends on libfoo, and there are libfoo-1.2.3 and libfoo-1.3.4
36 in portage, as well as libfoo-1.0.1 in the attic
37 (http://sources.gentoo.org/viewcvs.py/gentoo-x86/ - view dead files).
38 The dev was a bit sloppy and didn't check with libfoo-1.2.3, and that
39 version has a file collision with X, while libfoo-1.3.4 doesn't. X also fails
40 to compile with libfoo-1.0.1, even though some people might still have that
41 on their systems. So you have some build failures/collisions here that are
42 QA problems which should be caught, and can only be caught by iterating
43 through all versions of the dependencies.
44 </quote>
45
46
47
48 > If you were strictly collision checking, then you don't care about
49 > every version of glib-2* you only care about the package in question
50 > and what installed files it provides. However for the provided files,
51 > you do care about every version of glib-2*, not for the other package,
52 > but to list the installed files of glib-2*
53
54 Yes that's true. Strictly speaking collisions could be caught easily by
55 compiling every package (usually with as many USE flags enabled as
56 possible). However, it would be nice to catch ebuilds that don't specify correct
57 versions in DEPEND. Now that I think about it, it may be a good idea to
58 allow matchbox to specify if tinderbox should try to compile against
59 every version of dependencies or just one. By default we would only
60 check against latest version, but specific packages could be set in a
61 way to check every version of dependencies.
62
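A rough sketch of how such a per-package setting could look. This is Python
for illustration only: the function names, the "latest"/"all" policy values
and the overrides dict are made up here, not existing matchbox or portage
API, and the version sorting assumes purely numeric versions with an
optional -rN revision.

```python
# Sketch only: names and policy values are hypothetical, not part of portage.

def version_key(version):
    """Sort key for simple versions such as '2.18.4' or '2.18.4-r2'.

    Assumes purely numeric components; suffixes like _rc1 are not handled.
    """
    base, _, revision = version.partition("-r")
    numbers = tuple(int(part) for part in base.split("."))
    return numbers, int(revision or 0)


def versions_to_test(available, package, overrides, default="latest"):
    """Return the dependency versions a tinderbox should build against."""
    policy = overrides.get(package, default)
    ordered = sorted(available, key=version_key)
    if policy == "all":
        return ordered
    return ordered[-1:]  # "latest": only the highest version


overrides = {"app-misc/foo": "all"}  # hypothetical per-package setting
glib = ["2.18.4-r1", "2.16.6", "2.20.1"]
print(versions_to_test(glib, "app-misc/foo", overrides))
print(versions_to_test(glib, "app-misc/bar", overrides))
```

The default stays cheap (one build per dependency), and only packages
listed in overrides pay for the full iteration.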
63
64 > After writing that down, I can see why you want to compile, check,
65 > uninstall, re-compile, repeat...but I worry about how efficient it is
66 > and what ways to improve that.
67
68 That's my concern too. That's why I wanted to use a central binary host.
69 Every compiled package could then be reused across all tinderboxes (with
70 the same architecture of course).
71 I was counting on them being on a high-speed network connection (ideally
72 LAN).
73
74 There is one more good thing about always starting with nothing but
75 a bare system. If something is missing in DEPEND we will catch
76 it easily.
77
78 > >
79 > > Original idea was to have two sides:
80 > >  * master server (matchbox)
81 > >  * slaves compiling packages (tinderboxes)
82 > >
83 > > Master server decides what needs to be compiled (automatically or
84 > > semi-automatically). Tinderbox asks for job, master provides package
85 > > name (and optionally version). Tinderbox then goes and tries to compile
86 > > package with different sets of dependencies reporting results to
87 > > Matchbox.
88 > >
89 > > It seems that whole process could be sped up by hosting binary
90 > > packages on one central server (Binary host). Obviously various versions
91 > > of the same package would be created and therefore unique names could be
92 > > created by using some metadata to create the hash part of the filename. At
93 > > first thought I would use USE flags and DEPEND as metadata to hash.
94 >
95 > This is a cool aspect of the project, I hope you can work with solar
96 > and zmedico to improve binpkgs. USE flags seem to be the trouble spot
97 > of binpkgs.
98
99 That's my other concern. I know that there was a GSoC project to improve
100 binary support in portage. Merging two projects into one would not
101 achieve much IMO. However, I know for certain that to a certain degree I
102 could make further work easier and I will do my best to do so. So make
103 things as simple as possible, but not simpler :-)
104
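For the hashed file names quoted above, a minimal sketch of what I have in
mind. The function name, the choice of SHA-1, the 12-character cut and the
name layout are all assumptions for illustration and not how portage names
binary packages today. It only assumes that USE flags and DEPEND are
available as lists of strings.

```python
import hashlib


def binpkg_name(package, version, use_flags, depend):
    """Build a file name whose hash part covers USE flags and DEPEND."""
    # Sort both inputs so the same configuration always gives the same hash.
    metadata = "USE=" + " ".join(sorted(use_flags)) + "\n"
    metadata += "DEPEND=" + " ".join(sorted(depend)) + "\n"
    digest = hashlib.sha1(metadata.encode("utf-8")).hexdigest()[:12]
    return "%s-%s-%s.tbz2" % (package, version, digest)


# Hypothetical package and dependency names, for illustration only.
print(binpkg_name("foo", "1.0", ["ssl", "gtk"], ["dev-libs/glib-2.18.4-r1"]))
```

Two tinderboxes that built the same package with the same USE flags and
the same dependencies would then produce the same name, so the binary host
can tell whether a matching package already exists.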
105 > >
106 > > So far two other projects came to light as possible source of
107 > > inspiration and/or collaboration:
108 > >  * catalyst (mainly tinderbox generating part)
109 > >  * AutotuA (automatic generic job framework)
110 > >
111 > > Especially AutotuA seems like good candidate for merging.
112 > >
113 > > It doesn't seem possible to compile every project with every version of
114 > > every dependency, therefore I'd like to ask for your opinion especially
115 > > about this part. One idea I had was to restrict testing to the highest
116 > > revision of a given version. For example we have:
117 > > glib-2.18.4-r1 and glib-2.18.4-r2, therefore we will only test against
118 > > glib-2.18.4-r2 and will assume that r1 would be OK too (or users would
119 > > upgrade since it's a bugfix release)
120 >
121 > IMO, you have two choices. Latest stable or latest ~arch. Stable users
122 > will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable so
123 > that argument is out.
124
125 I should have checked ebuilds before posting I guess :-) That was meant
126 as an example. For the sake of argument consider they are both arch (not
127 ~arch).
128
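The filter from the quoted example could look roughly like this. It is a
Python sketch with a made-up function name, and it assumes versions carry
at most a trailing -rN revision. Keywords (arch vs ~arch) are ignored here
and would have to be filtered beforehand.

```python
def highest_revisions(versions):
    """Keep only the highest -rN revision of each base version."""
    best = {}
    for version in versions:
        base, _, revision = version.partition("-r")
        rev = int(revision or 0)  # no -rN suffix counts as revision 0
        if base not in best or rev > best[base]:
            best[base] = rev
    # Dicts keep insertion order, so base versions stay in input order.
    return [base if rev == 0 else "%s-r%d" % (base, rev)
            for base, rev in best.items()]


print(highest_revisions(["2.18.4-r1", "2.18.4-r2", "2.20.1"]))
```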
129 > >
130 > > Another approach to optimizing use of resources would be to have a
131 > > priority list of packages that need most testing. I imagine this could
132 > > be created by analyzing logs from gentoo mirrors, and figuring out which
133 > > packages are downloaded most frequently.
134 >
135 > Mirror log analysis is a fundamentally hard thing to do given the vast
136 > network of mirrors that we have.
137
138 Of course. It's not meant to be precise, just an approximation of which
139 packages are favourites. Even then I realize there will always be
140 regional tendencies. Oh well...masses :-)
141
142
143 > >
144 > > We would probably need at least one tinderbox per glibc version if I am
145 > > not mistaken since this cannot be freely up/downgraded.
146 >
147 > It's free to upgrade ;) Can't downgrade. Given how large the glibc
148 > tracker bugs get, I don't think this project should use the latest
149 > glibc available. Unless you are trying to hunt down bugs, but I think
150 > you will get buried with compile failures. If the goal of this project
151 > is to data mine the installed package's information, that is not
152 > dependent on a glibc version. Please think about this some more before
153 > going down that road, I want this project to be successful ;)
154
155
156 Right. That was a suggestion for QA to think about. I would like to
157 think about this as my opportunity to give something back to Gentoo
158 after all these years as a user (taking, not giving much back). So it
159 all boils down to what Gentoo (QA team) would need to improve Gentoo
160 further. If it is enough to test latest stable glibc then that's how
161 it's gonna be done. Maybe again make it possible to change behaviour in
162 the future.
163
164 --
165 Stanislav Ochotnicky
166 Working for Gentoo Linux http://www.gentoo.org
167 Implementing Tree-wide collision checking and provided files database
168 http://soc.gentooexperimental.org/projects/show/collision-database
169 Blog: http://inputvalidation.blogspot.com/search/label/gsoc
170
171
172 jabber: sochotnicky@×××××.com
173 icq: 74274152
174 PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc