On 10:05 Fri 12 Jun, Jeremy Olexa wrote:
> On Fri, Jun 12, 2009 at 8:32 AM, Stanislav
> Ochotnicky <sochotnicky@×××××.com> wrote:
> > Hi everyone,
> >
> > some of you already know that work on the GSoC project "Tree-wide collision
> > checking and provided files database" started a few weeks ago.
> > For the rest, here is a short introduction to this project (collagen)
> > and its goals.
> >
> > Collagen aims to improve the quality of ebuilds in the portage tree. It
> > does this by compiling as many ebuilds as possible. It specifically takes
> > into account the various atoms in the DEPEND variable. For example, if a
> > package's ebuild states that it needs =dev-libs/glib-2*, that package should
> > be compilable with every version of glib-2* in portage (taking keywords
> > into account). Therefore collagen will install one version of glib-2*,
> > then the ebuild in question, collect information, and uninstall the ebuild
> > and the first glib version. It repeats this process for every glib-2* in
> > the tree.
>
> Testing against every version of the deps as required seems like it is
> diverging from the original "Tree-wide collision checking and provided
> files database". Would you say that the goal of this project is
> becoming more QA-oriented? Something like: "Matchbox: A tinderboxen
> master server to provide QA for ebuilds"

Yes, that's true: this project is moving towards QA (it was proposed by a
QA developer, after all). The SoC list was only CCed; the To: was
gentoo-qa. I should have warned about that in my original email so that
responses wouldn't get lost between the two lists. I've added QA to CC.
I believe crossposting is OK in this case; I apologize if it's not.

This is from one of my discussions with weaver (my mentor):

<quote mode=summary>
Say package X depends on libfoo, and there are libfoo-1.2.3 and libfoo-1.3.4
in portage, as well as libfoo-1.0.1 in the attic
(http://sources.gentoo.org/viewcvs.py/gentoo-x86/, view dead files).
The dev was a bit sloppy and didn't check with libfoo-1.2.3, and that
version has a file collision with X, while libfoo-1.3.4 doesn't. X also
fails to compile with libfoo-1.0.1, even though some people might still
have that version on their systems. So you have some build failures and
collisions here that are QA problems which should be caught, and which can
only be caught by iterating through all versions of the dependencies.
</quote>
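
To make weaver's point concrete, here is a minimal Python sketch of the
per-version check. It assumes each tinderbox run already yields the file
list of the dependency version and the files X installed (or None if X
failed to build). The name check_all_versions and the file lists are
hypothetical, made up to mirror the libfoo example; this is not collagen
code.

```python
from typing import Dict, List, Optional, Set

def check_all_versions(
    dep_files: Dict[str, Set[str]],
    build_results: Dict[str, Optional[Set[str]]],
) -> Dict[str, List[str]]:
    """Report QA problems for one package against every dependency version.

    dep_files maps each dependency version to the files it installs.
    build_results maps the same versions to the files the package installed
    when built against that version, or None if the build failed.
    """
    problems: Dict[str, List[str]] = {}
    for version, files in dep_files.items():
        pkg_files = build_results.get(version)
        if pkg_files is None:
            # No result, or the package failed to build against this version.
            problems[version] = ["build failure"]
            continue
        clashes = sorted(files & pkg_files)
        if clashes:
            problems[version] = ["file collision: " + path for path in clashes]
    return problems

# Hypothetical data mirroring the libfoo example above.
dep_files = {
    "libfoo-1.0.1": {"/usr/lib/libfoo.so"},
    "libfoo-1.2.3": {"/usr/lib/libfoo.so", "/usr/bin/foo-config"},
    "libfoo-1.3.4": {"/usr/lib/libfoo.so"},
}
build_results = {
    "libfoo-1.0.1": None,
    "libfoo-1.2.3": {"/usr/bin/x", "/usr/bin/foo-config"},
    "libfoo-1.3.4": {"/usr/bin/x"},
}
print(check_all_versions(dep_files, build_results))
# {'libfoo-1.0.1': ['build failure'], 'libfoo-1.2.3': ['file collision: /usr/bin/foo-config']}
```

Checking only libfoo-1.3.4 would report nothing here, which is exactly why
every version has to be iterated.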

> If you were strictly collision checking, then you don't care about
> every version of glib-2*; you only care about the package in question
> and what installed files it provides. However, for the provided files
> database you do care about every version of glib-2*, not for the other
> package, but to list the installed files of glib-2*.

54 |
Yes that's true. Strictly speaking collisions could be caught easily by |
55 |
compiling every package (usually with as many USE flags enabled as |
56 |
possible). However It would be nice to catch ebuilds that don't specify correct |
57 |
versions in DEPEND. Now that I think about it, it may be a good idea to |
58 |
allow matchbox to specify if tinderbox should try to compile against |
59 |
every version of dependencies or just one. By default we would only |
60 |
check against latest version, but specific packages could be set in a |
61 |
way to check every version of dependencies. |
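
A per-package setting like this could be as small as the Python sketch
below. It assumes simple numeric versions with an optional -rN revision
(real portage version comparison also handles letters and suffixes like
_rc, and should be used instead); the POLICIES table and the function
names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

def version_key(cpv: str) -> Tuple[Tuple[int, ...], int]:
    """Sort key for simple versions such as 'glib-2.18.4-r2'.

    Assumption: purely numeric version with an optional -rN revision.
    """
    m = re.search(r"-(\d+(?:\.\d+)*)(?:-r(\d+))?$", cpv)
    if m is None:
        raise ValueError("unsupported version: " + cpv)
    numbers = tuple(int(part) for part in m.group(1).split("."))
    return numbers, int(m.group(2) or 0)

def select_dep_versions(available: List[str], policy: str) -> List[str]:
    """Pick the dependency versions a tinderbox should build against.

    "latest" tests only the newest version; "all" tests every one, oldest first.
    """
    ordered = sorted(available, key=version_key)
    if policy == "all":
        return ordered
    if policy == "latest":
        return ordered[-1:]
    raise ValueError("unknown policy: " + policy)

# Hypothetical per-package policies kept by matchbox; default is "latest".
POLICIES: Dict[str, str] = {"x11-libs/gtk+": "all"}

def versions_for(package: str, available: List[str]) -> List[str]:
    return select_dep_versions(available, POLICIES.get(package, "latest"))

glibs = ["glib-2.18.4-r1", "glib-2.20.3", "glib-2.18.4-r2"]  # hypothetical list
print(versions_for("app-misc/foo", glibs))   # ['glib-2.20.3']
print(versions_for("x11-libs/gtk+", glibs))  # ['glib-2.18.4-r1', 'glib-2.18.4-r2', 'glib-2.20.3']
```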

> After writing that down, I can see why you want to compile, check,
> uninstall, re-compile, repeat... but I worry about how efficient it is
> and what ways there are to improve that.

That's my concern too. That's why I wanted to use a central binary host:
every compiled package could then be reused across all tinderboxes (with
the same architecture, of course). I was counting on them being on a
high-speed network connection (ideally a LAN).
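
One way to make those builds reusable is to give every binary package a
filename derived from its build configuration, so that tinderboxes of the
same architecture can check the binary host before compiling. The sketch
below is hypothetical: the naming scheme (architecture, USE flags and exact
dependency versions hashed into a short SHA-1 suffix) and the in-memory
BinHost class stand in for a real binhost and are not portage features.

```python
import hashlib
from typing import Callable, Dict, Iterable

def binpkg_name(pv: str, arch: str, use: Iterable[str], deps: Iterable[str]) -> str:
    """Hypothetical unique binpkg filename for one build configuration."""
    h = hashlib.sha1()
    # Sort so that flag/dependency order does not change the name.
    for part in (arch, " ".join(sorted(set(use))), " ".join(sorted(set(deps)))):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator: keeps field boundaries unambiguous
    return f"{pv}-{h.hexdigest()[:12]}.tbz2"

class BinHost:
    """In-memory stand-in for the central binary host shared by tinderboxes."""

    def __init__(self) -> None:
        self.packages: Dict[str, bytes] = {}
        self.builds = 0

    def get_or_build(self, name: str, build: Callable[[], bytes]) -> bytes:
        # Only compile when no tinderbox has uploaded this configuration yet.
        if name not in self.packages:
            self.packages[name] = build()
            self.builds += 1
        return self.packages[name]

host = BinHost()
name1 = binpkg_name("glib-2.18.4-r2", "amd64", ["nls", "debug"], ["dev-libs/libfoo-1.3.4"])
name2 = binpkg_name("glib-2.18.4-r2", "amd64", ["debug", "nls"], ["dev-libs/libfoo-1.3.4"])
host.get_or_build(name1, lambda: b"built on tinderbox 1")
host.get_or_build(name2, lambda: b"built on tinderbox 2")
print(host.builds)  # 1: the second tinderbox reuses the first build
```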

There is one more benefit of always starting from nothing but a bare
system: if something is missing from DEPEND, we will catch it easily.

> >
> > The original idea was to have two sides:
> > * master server (matchbox)
> > * slaves compiling packages (tinderboxes)
> >
> > The master server decides what needs to be compiled (automatically or
> > semi-automatically). A tinderbox asks for a job, and the master provides
> > a package name (and optionally a version). The tinderbox then tries to
> > compile the package with different sets of dependencies, reporting
> > results to Matchbox.
> >
> > It seems that the whole process could be sped up by hosting binary
> > packages on one central server (binary host). Obviously various versions
> > of the same package would be created, so unique filenames could be
> > generated by hashing some metadata into part of the filename. On first
> > thought, I would use USE flags and DEPEND as the metadata to hash.
>
> This is a cool aspect of the project; I hope you can work with solar
> and zmedico to improve binpkgs. USE flags seem to be the trouble spot
> of binpkgs.
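
The master/tinderbox exchange described in the quoted text could be
prototyped in-process before any networking is involved. This is a
hypothetical Python sketch: Matchbox, Job and run_tinderbox are made-up
names, and a real tinderbox would install dependencies and run emerge
where the comment says so.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional

@dataclass
class Job:
    package: str
    version: Optional[str] = None  # matchbox may pin a version or leave it open

@dataclass
class Matchbox:
    """Hypothetical in-memory master: hands out jobs and collects results."""
    pending: Deque[Job] = field(default_factory=deque)
    results: Dict[str, List[str]] = field(default_factory=dict)

    def schedule(self, package: str, version: Optional[str] = None) -> None:
        self.pending.append(Job(package, version))

    def request_job(self) -> Optional[Job]:
        return self.pending.popleft() if self.pending else None

    def report(self, job: Job, outcome: str) -> None:
        key = job.package if job.version is None else f"{job.package}-{job.version}"
        self.results.setdefault(key, []).append(outcome)

def run_tinderbox(master: Matchbox, dep_sets: List[str]) -> int:
    """Ask for jobs until none are left; try every dependency set per job."""
    done = 0
    while True:
        job = master.request_job()
        if job is None:
            return done
        for deps in dep_sets:
            # A real tinderbox would install deps, emerge the package and
            # collect logs/file lists here; we only record what was tried.
            master.report(job, f"built against {deps}")
        done += 1

m = Matchbox()
m.schedule("app-misc/foo")
m.schedule("dev-libs/bar", "1.2")
print(run_tinderbox(m, ["glib-2.18.4-r2", "glib-2.20.3"]))  # 2
```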

That's my other concern. I know that there was a GSoC project to improve
binary package support in portage. Merging the two projects into one
would not achieve much, IMO. However, I know that to a certain degree I
could make further work easier, and I will do my best to do so. So: make
things as simple as possible, but not simpler :-)

> >
> > So far two other projects have come to light as possible sources of
> > inspiration and/or collaboration:
> > * catalyst (mainly the tinderbox-generating part)
> > * AutotuA (automatic generic job framework)
> >
> > AutotuA especially seems like a good candidate for merging.
> >
> > It doesn't seem possible to compile every package with every version of
> > every dependency, so I'd like to ask for your opinion, especially
> > about this part. One idea I had was to restrict testing to the highest
> > revision of a given version. For example, we have
> > glib-2.18.4-r1 and glib-2.18.4-r2, so we would only test against
> > glib-2.18.4-r2 and assume that r1 would be OK too (or users would
> > upgrade, since it's a bugfix release).
>
> IMO, you have two choices: latest stable or latest ~arch. Stable users
> will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable, so
> that argument is out.

I guess I should have checked the ebuilds before posting :-) That was
meant as an example. For the sake of argument, consider them both arch
(not ~arch).
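
Restricting tests to the highest revision of each version could look like
the Python sketch below. It assumes the input list has already been
filtered to stable (arch) ebuilds and that versions are plain numbers with
an optional -rN suffix; highest_revisions is a hypothetical helper, not a
portage function.

```python
import re
from typing import Dict, List, Tuple

# name-version with an optional -rN revision, e.g. "glib-2.18.4-r2"
VERSION_RE = re.compile(r"^(?P<name>.+?)-(?P<ver>\d+(?:\.\d+)*)(?:-r(?P<rev>\d+))?$")

def highest_revisions(cpvs: List[str]) -> List[str]:
    """Keep only the highest -rN revision of each name-version pair."""
    best: Dict[Tuple[str, str], Tuple[int, str]] = {}
    for cpv in cpvs:
        m = VERSION_RE.match(cpv)
        if m is None:
            raise ValueError("unsupported version: " + cpv)
        key = (m.group("name"), m.group("ver"))
        rev = int(m.group("rev") or 0)  # no -rN suffix means revision 0
        if key not in best or rev > best[key][0]:
            best[key] = (rev, cpv)
    return [cpv for _, cpv in best.values()]

print(highest_revisions(["glib-2.18.4-r1", "glib-2.18.4-r2", "glib-2.20.3"]))
# ['glib-2.18.4-r2', 'glib-2.20.3']
```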

> >
> > Another approach to optimizing the use of resources would be to have a
> > priority list of the packages that need the most testing. I imagine this
> > could be created by analyzing logs from Gentoo mirrors and figuring out
> > which packages are downloaded most frequently.
>
> Mirror log analysis is a fundamentally hard thing to do given the vast
> network of mirrors that we have.

Of course. It's not meant to be precise, just an approximation of which
packages are favourites. Even then, I realize there will always be
regional tendencies. Oh well... masses :-)
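
Even a rough ranking could come from counting distfile requests in
whatever mirror logs are available. The Python sketch below assumes
Apache-style log lines with a /distfiles/<name> request path; distfile
names only approximate package names, so the result is a starting point
for a priority list, not a precise one. popular_distfiles is a
hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log format: the quoted request contains .../distfiles/<filename>.
DISTFILE_RE = re.compile(r'"GET [^"]*/distfiles/([^/\s"]+) ')

def popular_distfiles(lines: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most requested distfiles with their request counts."""
    counts: Counter = Counter()
    for line in lines:
        m = DISTFILE_RE.search(line)
        if m:  # lines for other paths are ignored
            counts[m.group(1)] += 1
    return counts.most_common(top)

log = [
    '1.2.3.4 - - [12/Jun/2009:10:05:00 +0200] "GET /gentoo/distfiles/glib-2.18.4.tar.bz2 HTTP/1.1" 200 512',
    '1.2.3.5 - - [12/Jun/2009:10:06:00 +0200] "GET /gentoo/distfiles/glib-2.18.4.tar.bz2 HTTP/1.1" 200 512',
    '1.2.3.6 - - [12/Jun/2009:10:07:00 +0200] "GET /gentoo/distfiles/gtk+-2.16.2.tar.bz2 HTTP/1.1" 200 512',
    '1.2.3.7 - - [12/Jun/2009:10:08:00 +0200] "GET /gentoo/releases/ HTTP/1.1" 200 512',
]
print(popular_distfiles(log))  # [('glib-2.18.4.tar.bz2', 2), ('gtk+-2.16.2.tar.bz2', 1)]
```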

> >
> > We would probably need at least one tinderbox per glibc version, if I am
> > not mistaken, since glibc cannot be freely upgraded/downgraded.
>
> It's free to upgrade ;) Can't downgrade. Given how large the glibc
> tracker bugs get, I don't think this project should use the latest
> glibc available, unless you are trying to hunt down bugs, but I think
> you will get buried with compile failures. If the goal of this project
> is to data mine the installed packages' information, that is not
> dependent on a glibc version. Please think about this some more before
> going down that road; I want this project to be successful ;)

Right. That was a suggestion for QA to think about. I would like to
think of this as my opportunity to give something back to Gentoo after
all these years as a user (taking, not giving much back). So it all
boils down to what Gentoo (the QA team) needs to improve Gentoo further.
If it is enough to test the latest stable glibc, then that's how it's
going to be done. Again, maybe make it possible to change this behaviour
in the future.

--
Stanislav Ochotnicky
Working for Gentoo Linux http://www.gentoo.org
Implementing Tree-wide collision checking and provided files database
http://soc.gentooexperimental.org/projects/show/collision-database
Blog: http://inputvalidation.blogspot.com/search/label/gsoc

jabber: sochotnicky@×××××.com
icq: 74274152
PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc