On Fri, Jun 12, 2009 at 8:32 AM, Stanislav Ochotnicky <sochotnicky@×××××.com> wrote:
> Hi everyone,
>
> some of you already know that work on the GSoC project "Tree-wide collision
> checking and provided files database" started a few weeks ago.
> For the rest, I will give a short introduction to this project (collagen)
> and its goals.
>
> Collagen aims to improve the quality of ebuilds in the portage tree. It does
> this by compiling as many ebuilds as possible. It specifically takes
> into account the various atoms in the DEPEND variable. For example, if a
> package ebuild states that it needs =dev-libs/glib-2*, that package should be
> compilable with every version of glib-2* in portage (taking into account
> keywords). Therefore collagen will install one version of glib-2*, then the
> ebuild in question, collect information, and uninstall the ebuild and that
> glib version. It repeats this process for every glib-2* in the tree.
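
The install/build/uninstall cycle described above can be sketched roughly as follows. This is only an illustration, not collagen's actual code: `check_against_each_version` and the `install`/`uninstall` callables are hypothetical stand-ins for whatever the tinderbox uses to drive the package manager.

```python
from typing import Callable, Dict, List


def check_against_each_version(
    package: str,
    dep_versions: List[str],
    install: Callable[[str], bool],    # hypothetical: returns True on success
    uninstall: Callable[[str], None],  # hypothetical: removes the atom again
) -> Dict[str, bool]:
    """Build `package` once per dependency version and record the outcome."""
    results: Dict[str, bool] = {}
    for dep in dep_versions:
        if not install(dep):
            # The dependency itself failed to build; nothing more to test.
            results[dep] = False
            continue
        try:
            results[dep] = install(package)
        finally:
            # Clean up in reverse order so the next iteration starts fresh.
            uninstall(package)
            uninstall(dep)
    return results
```

The cost is one full build of the package per dependency version, which is why the number of versions tested matters so much for efficiency.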

Testing against every version of the deps as required seems like it is
diverging from the original "Tree-wide collision checking and provided
files database" - Would you say that the goal of this project is
becoming more QA orientated? Something like: "Matchbox: A tinderboxen
master server to provide QA for ebuilds"

If you were strictly collision checking, then you don't care about
every version of glib-2*; you only care about the package in question
and what installed files it provides. However, for the provided files,
you do care about every version of glib-2*, not for the other package,
but to list the installed files of glib-2*.

After writing that down, I can see why you want to compile, check,
uninstall, re-compile, repeat... but I worry about how efficient it is
and what ways there are to improve that.

>
> The original idea was to have two sides:
> * master server (matchbox)
> * slaves compiling packages (tinderboxes)
>
> The master server decides what needs to be compiled (automatically or
> semi-automatically). A tinderbox asks for a job, and the master provides a
> package name (and optionally a version). The tinderbox then tries to compile
> the package with different sets of dependencies, reporting results to
> Matchbox.
>
> It seems that the whole process could be sped up by hosting binary
> packages on one central server (Binary host). Obviously various versions
> of the same package would be created, and therefore unique names could be
> created by using some metadata to create a hash part of the filename. At
> first thought I would use USE flags and DEPEND as the metadata to hash.
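
One way the hashed filename idea could look, as a minimal sketch. The function name `binpkg_filename`, the choice of SHA-1, the 12-character digest, and the exact filename layout are all assumptions made for illustration; nothing here reflects an existing Portage binary package naming scheme.

```python
import hashlib
from typing import Iterable


def binpkg_filename(pf: str, use_flags: Iterable[str], depend: Iterable[str]) -> str:
    """Derive a unique binary package name from USE flags and DEPEND atoms."""
    # Sort and de-duplicate so the same configuration always hashes identically,
    # regardless of the order in which flags or atoms were listed.
    payload = "USE=" + " ".join(sorted(set(use_flags)))
    payload += "\nDEPEND=" + " ".join(sorted(set(depend)))
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    # Hypothetical layout: <package-version>-<hash>.tbz2
    return f"{pf}-{digest}.tbz2"
```

For the hash to distinguish builds against different glib versions, `depend` would need to hold the concrete versions that were installed at build time rather than the unresolved atoms from the ebuild.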

This is a cool aspect of the project, I hope you can work with solar
and zmedico to improve binpkgs. USE flags seem to be the trouble spot
of binpkgs.

>
> So far two other projects came to light as possible sources of
> inspiration and/or collaboration:
> * catalyst (mainly the tinderbox generating part)
> * AutotuA (automatic generic job framework)
>
> Especially AutotuA seems like a good candidate for merging.
>
> It doesn't seem possible to compile every project with every version of
> every dependency, therefore I'd like to ask for your opinion especially
> about this part. One idea I had was to restrict testing to the highest
> revision number for a given version. For example we have:
> glib-2.18.4-r1 and glib-2.18.4-r2, therefore we will only test against
> glib-2.18.4-r2 and will assume that r1 would be OK too (or users would
> upgrade since it's a bugfix release)
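
The "highest revision per version" filter could be sketched like this. It is a simplified illustration: `highest_revisions` is a hypothetical helper, it only looks at the trailing `-rN` suffix, and it ignores keywords entirely, so it makes no distinction between stable and ~arch revisions.

```python
import re
from typing import Dict, List

# Split "glib-2.18.4-r2" into base "glib-2.18.4" and revision 2.
# A missing "-rN" suffix is treated as revision 0.
_REV = re.compile(r"^(?P<base>.+?)(?:-r(?P<rev>\d+))?$")


def highest_revisions(pvs: List[str]) -> List[str]:
    """Keep only the highest revision of each package version."""
    best: Dict[str, int] = {}
    for pv in pvs:
        match = _REV.match(pv)
        base = match.group("base")
        rev = int(match.group("rev") or 0)
        if rev > best.get(base, -1):
            best[base] = rev
    return [base if rev == 0 else f"{base}-r{rev}" for base, rev in best.items()]
```

A variant that respects keywords would pick the highest revision separately among stable and among ~arch candidates.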

IMO, you have two choices. Latest stable or latest ~arch. Stable users
will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable, so
that argument is out.

>
> Another approach to optimizing use of resources would be to have a
> priority list of packages that need most testing. I imagine this could
> be created by analyzing logs from Gentoo mirrors, and figuring out which
> packages are downloaded most frequently.

Mirror log analysis is a fundamentally hard thing to do given the vast
network of mirrors that we have.

>
> We would probably need at least one tinderbox per glibc version if I am
> not mistaken since this cannot be freely up/downgraded.

It's free to upgrade ;) Can't downgrade. Given how large the glibc
tracker bugs get, I don't think this project should use the latest
glibc available. Unless you are trying to hunt down bugs, but I think
you will get buried with compile failures. If the goal of this project
is to data mine the installed package's information, that is not
dependent on a glibc version. Please think about this some more before
going down that road, I want this project to be successful ;)

-Jeremy


> This email was meant just as a teaser, more information (data model, UML
> diagrams) is available on the project website (look for Documents):
> http://soc.gentooexperimental.org/projects/show/collision-database
>
> I'd love to hear some suggestions, opinions and criticism. You can
> use this thread, or even various options on gentooexperimental.org.
>
> --
> Stanislav Ochotnicky
> Working for Gentoo Linux http://www.gentoo.org
> Implementing Tree-wide collision checking and provided files database
> http://soc.gentooexperimental.org/projects/show/collision-database
> Blog: http://inputvalidation.blogspot.com/search/label/gsoc
>
> jabber: sochotnicky@×××××.com
> icq: 74274152
> PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc
>