Gentoo Archives: gentoo-soc

From: Jeremy Olexa <darkside@g.o>
To: gentoo-soc@l.g.o
Subject: Re: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database
Date: Fri, 12 Jun 2009 15:05:44
Message-Id: 90b936c0906120805m18cc4f55j3b33d0d17d855970@mail.gmail.com
In Reply to: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database by Stanislav Ochotnicky
On Fri, Jun 12, 2009 at 8:32 AM, Stanislav Ochotnicky <sochotnicky@×××××.com> wrote:
> Hi everyone,
>
> Some of you already know that work on the GSoC project "Tree-wide collision
> checking and provided files database" started a few weeks ago.
> For the rest, I will give a short introduction to this project (collagen)
> and its goals.
>
> Collagen aims to improve the quality of ebuilds in the portage tree. It does
> this by compiling as many ebuilds as possible, specifically taking into
> account the various atoms in the DEPEND variable. For example, if a package's
> ebuild states that it needs =dev-libs/glib-2*, that package should be
> compilable with every version of glib-2* in portage (taking keywords into
> account). Therefore collagen will install one version of glib-2*, then the
> ebuild in question, collect information, and uninstall the ebuild and the
> first glib version. It repeats this process for every glib-2* in the tree.

Testing against every required version of the deps seems to be
diverging from the original "Tree-wide collision checking and provided
files database" goal. Would you say that this project is becoming more
QA orientated? Something like: "Matchbox: A tinderboxen master server
to provide QA for ebuilds".

If you were strictly checking for collisions, you wouldn't care about
every version of glib-2*; you would only care about the package in question
and the files it installs. For the provided files database, however,
you do care about every version of glib-2*, not because of the other package,
but to list the files installed by each glib-2* version.
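A minimal Python sketch of the collision check described above. It assumes a hypothetical files database laid out as {"category/package": {"version": [installed paths]}}; the project's real data model is documented on its website and is not reproduced here. All versions of one package are folded into a single owner, so files shared by glib-2.18 and glib-2.20 are not reported. SLOTs and blockers are ignored, so the output is a list of candidates to review, not confirmed bugs.

```python
from collections import defaultdict
from typing import Dict, List


def find_collisions(files_db: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Return {path: sorted package names} for paths installed by 2+ packages.

    files_db uses a hypothetical layout: {"cat/pkg": {"version": [paths]}}.
    """
    owners = defaultdict(set)
    for cp, versions in files_db.items():
        for paths in versions.values():
            for path in paths:
                # Key by cat/pkg, so different versions of one package
                # never count as colliding with each other.
                owners[path].add(cp)
    return {path: sorted(cps) for path, cps in owners.items() if len(cps) > 1}


# Made-up example data.
files_db = {
    "dev-libs/glib": {
        "2.18.4-r2": ["/usr/lib/libglib-2.0.so", "/usr/bin/shared-tool"],
        "2.20.1": ["/usr/lib/libglib-2.0.so"],
    },
    "app-misc/foo": {"1.0": ["/usr/bin/foo", "/usr/bin/shared-tool"]},
}
print(find_collisions(files_db))
# {'/usr/bin/shared-tool': ['app-misc/foo', 'dev-libs/glib']}
```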

After writing that down, I can see why you want to compile, check,
uninstall, re-compile, repeat... but I worry about how efficient that is
and how it could be improved.
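The compile, check, uninstall, repeat cycle quoted earlier can be sketched as a loop over the matching dependency versions. The target gets one full build per version, which is why efficiency is a concern. The callbacks `merge`, `unmerge` and `build_and_collect` are hypothetical stand-ins for whatever the tinderbox uses to drive portage. `unmerge` is assumed to do nothing when a package is not installed. A failed build is recorded as a result instead of aborting the run, and cleanup happens in `finally`, so the next version starts from a clean system.

```python
from typing import Callable, Dict, Iterable


def build_against_each_dep(
    target: str,
    dep_versions: Iterable[str],
    merge: Callable[[str], None],
    unmerge: Callable[[str], None],
    build_and_collect: Callable[[str], dict],
) -> Dict[str, dict]:
    """Build `target` once per dependency version, cleaning up after each run."""
    results: Dict[str, dict] = {}
    for dep in dep_versions:
        merge(dep)  # e.g. one specific glib-2* version
        try:
            # Install the ebuild in question and collect information about it.
            results[dep] = {"ok": True, "data": build_and_collect(target)}
        except Exception as exc:  # a failed build is a result, not a crash
            results[dep] = {"ok": False, "error": str(exc)}
        finally:
            # Uninstall the ebuild, then the dependency version.
            unmerge(target)
            unmerge(dep)
    return results


# Usage with fake callbacks that only record what would happen.
log = []
results = build_against_each_dep(
    "app-misc/foo-1.0",
    ["dev-libs/glib-2.18.4-r2", "dev-libs/glib-2.20.1"],
    merge=lambda cpv: log.append(("merge", cpv)),
    unmerge=lambda cpv: log.append(("unmerge", cpv)),
    build_and_collect=lambda cpv: {"built": cpv},
)
print(log)
```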

>
> The original idea was to have two sides:
>  * master server (matchbox)
>  * slaves compiling packages (tinderboxes)
>
> The master server decides what needs to be compiled (automatically or
> semi-automatically). A tinderbox asks for a job, and the master provides a
> package name (and optionally a version). The tinderbox then tries to compile
> the package with different sets of dependencies, reporting the results to
> Matchbox.
>
> It seems that the whole process could be sped up by hosting binary
> packages on one central server (binary host). Since various builds of the
> same package would be created, unique file names could be generated by
> hashing some of the package metadata into part of the file name. My first
> thought is to use USE flags and DEPEND as the metadata to hash.
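Here is a sketch of such a naming scheme. It uses Python's `hashlib` and a made-up `<PF>-<hash>.tbz2` layout. One caveat applies: the DEPEND string is the same for every build of an ebuild. If only DEPEND were hashed, builds against different glib-2* versions would get the same name. The sketch therefore assumes `deps` holds the exact dependency versions a build was made against. Inputs are sorted and deduplicated, so flag order does not change the name.

```python
import hashlib
from typing import Iterable


def binpkg_filename(pf: str, use_flags: Iterable[str], deps: Iterable[str]) -> str:
    """Return a hypothetical '<PF>-<hash>.tbz2' name unique per build configuration.

    `deps` is assumed to list the exact dependency versions used for the build
    (e.g. "dev-libs/glib-2.20.1"), not the unresolved DEPEND string.
    """
    # Sort and deduplicate so ordering differences do not change the hash.
    canonical = "USE={}\nDEPS={}".format(
        " ".join(sorted(set(use_flags))), " ".join(sorted(set(deps)))
    )
    # 12 hex chars keeps names short; use the full digest if that is a concern.
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]
    return "{}-{}.tbz2".format(pf, digest)


print(binpkg_filename("foo-1.0", ["gtk", "-doc"], ["dev-libs/glib-2.20.1"]))
```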

This is a cool aspect of the project; I hope you can work with solar
and zmedico to improve binpkgs. USE flags seem to be the trouble spot
of binpkgs.

>
> So far two other projects have come to light as possible sources of
> inspiration and/or collaboration:
>  * catalyst (mainly the tinderbox-generating part)
>  * AutotuA (automatic generic job framework)
>
> AutotuA in particular seems like a good candidate for merging.
>
> It doesn't seem possible to compile every package with every version of
> every dependency, so I'd like to ask for your opinion, especially about
> this part. One idea I had was to restrict testing to the highest revision
> number of a given version. For example, we have glib-2.18.4-r1 and
> glib-2.18.4-r2, so we would only test against glib-2.18.4-r2 and assume
> that r1 would be OK too (or that users would upgrade, since it's a bugfix
> release).
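The "highest revision only" rule can be sketched as follows. It compares only `-rN` revisions of otherwise identical version strings, which is all this rule needs. It is not a full implementation of portage's version comparison.

```python
import re
from typing import Dict, List, Tuple

# Splits "2.18.4-r2" into base "2.18.4" and revision "2"; the -rN part is optional.
_REV = re.compile(r"^(?P<base>.+?)(?:-r(?P<rev>\d+))?$")


def highest_revisions(versions: List[str]) -> List[str]:
    """Keep only the highest -rN revision of each base version."""
    best: Dict[str, Tuple[int, str]] = {}
    for v in versions:
        m = _REV.match(v)
        base, rev = m.group("base"), int(m.group("rev") or 0)
        if base not in best or rev > best[base][0]:
            best[base] = (rev, v)
    # dicts keep insertion order, so bases come out in first-seen order.
    return [v for _, v in best.values()]


print(highest_revisions(["2.18.4-r1", "2.18.4-r2", "2.20.1"]))
# ['2.18.4-r2', '2.20.1']
```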

IMO, you have two choices: latest stable or latest ~arch. Stable users
will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable, so
that argument is out.
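Jeremy's alternative, testing against either the latest stable or the latest ~arch version, could look like the sketch below. The version key is a simplification. It assumes purely numeric versions with an optional `-rN` and no `_rc`, `_p` or letter suffixes. KEYWORDS are treated as a plain space-separated string. When ~arch is allowed, stable keywords are accepted too, just as they are for ~arch users. The keyword data in the example is made up.

```python
from typing import Dict, Optional, Tuple


def _version_key(version: str) -> Tuple[Tuple[int, ...], int]:
    # Simplified: only handles numeric versions like "2.18.4" or "2.18.4-r2".
    base, _, rev = version.partition("-r")
    return tuple(int(part) for part in base.split(".")), int(rev or 0)


def pick_version(keywords: Dict[str, str], arch: str, allow_testing: bool) -> Optional[str]:
    """Return the newest version keyworded for `arch` (stable only, or stable + ~arch)."""
    wanted = {arch, "~" + arch} if allow_testing else {arch}
    candidates = [v for v, kw in keywords.items() if wanted & set(kw.split())]
    return max(candidates, key=_version_key) if candidates else None


glib = {"2.18.4-r1": "amd64 x86", "2.18.4-r2": "~amd64 ~x86", "2.20.1": "~amd64"}
print(pick_version(glib, "amd64", allow_testing=False))  # 2.18.4-r1
print(pick_version(glib, "amd64", allow_testing=True))   # 2.20.1
```

With -r2 only in ~arch, a stable amd64 box tests against -r1, which is the situation described above.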

>
> Another approach to optimizing the use of resources would be to have a
> priority list of packages that need the most testing. I imagine this could
> be created by analyzing logs from Gentoo mirrors and figuring out which
> packages are downloaded most frequently.

Mirror log analysis is fundamentally hard to do, given the vast
network of mirrors that we have.
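Collecting the logs from all the mirrors is the hard part. Once they are collected, ranking is just counting. This sketch assumes access logs in the common Apache/nginx format, with requests for paths that contain `/distfiles/`. It counts distfile names only. Mapping a distfile back to a package would need the SRC_URI data from the tree, which is not shown here.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log format: the request appears as "GET <path> HTTP/x.y" in quotes,
# as in Apache/nginx common and combined logs.
_DISTFILE_REQ = re.compile(r'"GET \S*/distfiles/([^/\s?"]+)[^"]*"')


def count_distfiles(lines: Iterable[str]) -> Counter:
    """Count requests per distfile name in one mirror's access log."""
    counts: Counter = Counter()
    for line in lines:
        m = _DISTFILE_REQ.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def merge_mirrors(per_mirror_logs: Iterable[Iterable[str]]) -> Counter:
    """Sum the per-mirror counts into one ranking."""
    total: Counter = Counter()
    for lines in per_mirror_logs:
        total.update(count_distfiles(lines))  # Counter.update adds counts
    return total


mirror_a = [
    '1.2.3.4 - - [12/Jun/2009:15:05:44 +0000] "GET /gentoo/distfiles/glib-2.20.1.tar.bz2 HTTP/1.1" 200 5000',
    '1.2.3.5 - - [12/Jun/2009:15:06:01 +0000] "GET /gentoo/releases/x86/install.iso HTTP/1.1" 200 9000',
]
mirror_b = ['5.6.7.8 - - [12/Jun/2009:16:00:00 +0000] "GET /distfiles/glib-2.20.1.tar.bz2 HTTP/1.0" 200 5000']
print(merge_mirrors([mirror_a, mirror_b]).most_common(10))
```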

>
> We would probably need at least one tinderbox per glibc version, if I am
> not mistaken, since glibc cannot be freely upgraded or downgraded.

It's free to upgrade ;) You can't downgrade. Given how large the glibc
tracker bugs get, I don't think this project should use the latest
glibc available, unless you are trying to hunt down bugs; even then, I
think you will get buried in compile failures. If the goal of this project
is to data mine the installed packages' information, that is not
dependent on the glibc version. Please think about this some more before
going down that road; I want this project to be successful ;)

-Jeremy

> This email was meant just as a teaser; more information (data model, UML
> diagrams) is available on the project website (look for Documents):
> http://soc.gentooexperimental.org/projects/show/collision-database
>
> I'd love to hear some suggestions, opinions and criticism. You can
> use this thread, or any of the various options on gentooexperimental.org.
>
> --
> Stanislav Ochotnicky
> Working for Gentoo Linux http://www.gentoo.org
> Implementing Tree-wide collision checking and provided files database
> http://soc.gentooexperimental.org/projects/show/collision-database
> Blog: http://inputvalidation.blogspot.com/search/label/gsoc
>
> jabber: sochotnicky@×××××.com
> icq: 74274152
> PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc
