On Fri, Jun 12, 2009 at 8:32 AM, Stanislav Ochotnicky
<sochotnicky@...> wrote:
> Hi everyone,
>
> some of you already know that work on the GSoC project "Tree-wide collision
> checking and provided files database" started a few weeks ago.
> For the rest, here is a short introduction to the project (collagen)
> and its goals.
>
> Collagen aims to improve quality of ebuilds in portage tree. It does
> this by compiling as many ebuilds as possible. It specifically takes
> into account various atoms in DEPEND variable. For example if package
> ebuild states that it needs =dev-libs/glib-2*, that package should be
> compilable with every version of glib-2* in portage (taking into account
> keywords). Therefore collagen will install one version of glib-2*, then
> ebuild in question, collect information, uninstall ebuild and first
> glib version. It repeats this process for every glib-2* in the tree.
Testing against every version of the deps seems to diverge from the
original "Tree-wide collision checking and provided files database".
Would you say that the goal of this project is becoming more QA
oriented? Something like: "Matchbox: A tinderboxen master server to
provide QA for ebuilds"
If you were strictly collision checking, then you wouldn't care about
every version of glib-2*; you would only care about the package in
question and what installed files it provides. For the provided files
database, however, you do care about every version of glib-2*, not for
the sake of the other package, but to list the installed files of each
glib-2* itself.
After writing that down, I can see why you want to compile, check,
uninstall, re-compile, repeat... but I worry about how efficient that
is and how it could be improved.
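
To make the cost concrete, here is a rough sketch of the loop as I
understand it from the description above. It only plans the steps and
does not call portage; plan_runs, the step names, app-misc/foo and the
glib-2.20.1 entry are all made up for illustration.

```python
def plan_runs(package, dep_versions):
    """Return the ordered steps for testing `package` against each
    version of one dependency (hypothetical helper, not a portage API)."""
    steps = []
    for dep in dep_versions:
        steps.append(("install", dep))        # one version of the dependency
        steps.append(("install", package))    # the ebuild in question
        steps.append(("collect", package))    # record results / installed files
        steps.append(("uninstall", package))
        steps.append(("uninstall", dep))
    return steps


if __name__ == "__main__":
    # Placeholder atoms, assumed only for this example.
    deps = ["dev-libs/glib-2.18.4-r2", "dev-libs/glib-2.20.1"]
    for action, atom in plan_runs("app-misc/foo-1.0", deps):
        print(action, atom)
```

That is five steps per dependency version for a single package, and
every "install" may be a full compile, which is where my efficiency
worry comes from.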
>
> Original idea was to have two sides:
> * master server (matchbox)
> * slaves compiling packages (tinderboxes)
>
> Master server decides what needs to be compiled (automatically or
> semi-automatically). Tinderbox asks for job, master provides package
> name (and optionally version). Tinderbox then goes and tries to compile
> package with different sets of dependencies reporting results to
> Matchbox.
>
> It seems that whole process could be sped up by hosting binary
> packages on one central server (Binary host). Obviously various versions
> of the same package would be created and therefore unique names could be
> created by using some metadata to create hash part of filename. On a
> first thought I would use USE flags and DEPEND as metadata to hash.
This is a cool aspect of the project, I hope you can work with solar
and zmedico to improve binpkgs. USE flags seem to be the trouble spot
of binpkgs.
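
Just to illustrate the naming idea, a minimal sketch, assuming the hash
covers only the enabled USE flags and the DEPEND atoms. binpkg_name is
a made-up helper, the choice of SHA-1 and a 12-character prefix is
arbitrary, and none of this is how portage names binpkgs today.

```python
import hashlib


def binpkg_name(pkg, use_flags, depend):
    """Build a file name for a binary package that is unique per
    USE/DEPEND combination (hypothetical scheme, not portage's)."""
    # Sort both inputs so the same set always hashes to the same name.
    use_part = " ".join(sorted(use_flags))
    dep_part = " ".join(sorted(depend))
    payload = f"{use_part}\n{dep_part}".encode("utf-8")
    digest = hashlib.sha1(payload).hexdigest()[:12]
    return f"{pkg}-{digest}.tbz2"


if __name__ == "__main__":
    # Example inputs are assumptions for illustration only.
    print(binpkg_name("glib-2.18.4-r2", ["debug", "fam"], ["dev-util/pkgconfig"]))
```

Sorting matters here: without it the same USE flags listed in a
different order would produce a different file name.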
>
> So far two other projects came to light as possible source of
> inspiration and/or collaboration:
> * catalyst (mainly tinderbox generating part)
> * AutotuA (automatic generic job framework)
>
> Especially AutotuA seems like good candidate for merging.
>
> It doesn't seem possible to compile every project with every version of
> every dependency, therefore I'd like to ask for your opinion especially
> about this part. One idea I had was to restrict testing to highest build
> number for given version. For example we have:
> glib-2.18.4-r1 and glib-2.18.4-r2, therefore we will only test against
> glib-2.18.4-r2 and will assume that r1 would be OK too (or users would
> upgrade since it's a bugfix release)
IMO, you have two choices. Latest stable or latest ~arch. Stable users
will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable so
that argument is out.
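
For reference, the "highest revision only" rule from the quoted text
could be sketched roughly as below. split_revision and
highest_revisions are hypothetical helpers, and glib-2.20.1 is just a
placeholder version. Note that the sketch looks at version strings
only and knows nothing about keywords, which is exactly the gap for
stable users.

```python
import re

# Split an optional trailing "-rN" revision off a package-version string.
_REV = re.compile(r"^(.*?)(?:-r(\d+))?$")


def split_revision(pv):
    """Return (base, revision); a missing revision counts as 0."""
    base, rev = _REV.match(pv).groups()
    return base, int(rev or 0)


def highest_revisions(versions):
    """Keep only the highest revision for each base version."""
    best = {}
    for pv in versions:
        base, rev = split_revision(pv)
        if base not in best or rev > best[base][0]:
            best[base] = (rev, pv)
    return [pv for _, pv in best.values()]


if __name__ == "__main__":
    print(highest_revisions(["glib-2.18.4-r1", "glib-2.18.4-r2", "glib-2.20.1"]))
```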
>
> Another approach to optimizing use of resources would be to have a
> priority list of packages that need most testing. I imagine this could
> be created by analyzing logs from gentoo mirrors, and figuring out which
> packages are downloaded most frequently.
Mirror log analysis is a fundamentally hard thing to do given the vast
network of mirrors that we have.
>
> We would probably need at least one tinderbox per glibc version if I am
> not mistaken since this cannot be freely up/downgraded.
It's free to upgrade ;) You just can't downgrade. Given how large the glibc
tracker bugs get, I don't think this project should use the latest
glibc available. Unless you are trying to hunt down bugs, but I think
you will get buried with compile failures. If the goal of this project
is to data mine the installed package's information, that is not
dependent on a glibc version. Please think about this some more before
going down that road, I want this project to be successful ;)
-Jeremy
> This email was meant just as a teaser, more information (data model, UML
> diagrams) is available on project website (look for Documents):
> http://soc.gentooexperimental.org/projects/show/collision-database
>
> I'd love to hear some suggestions, opinions and criticism. You can
> use this thread, or even various options on gentooexperimental.org.
>
> --
> Stanislav Ochotnicky
> Working for Gentoo Linux http://www.gentoo.org
> Implementing Tree-wide collision checking and provided files database
> http://soc.gentooexperimental.org/projects/show/collision-database
> Blog: http://inputvalidation.blogspot.com/search/label/gsoc
>
>
> jabber: sochotnicky@...
> icq: 74274152
> PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc
>