List Archive: gentoo-soc
Headers:
To: gentoo-soc@g.o
From: Eitan Mosenkis <eitan@...>
Subject: Re: [GSoC-status] Tree-wide collision checking and files database
Date: Fri, 12 Jun 2009 12:28:09 -0400
I wonder... if you're going to be churning out a bunch of binpkgs, my
project (web-based system image generator) could almost certainly put
them to use.  As for USE flags, perhaps try enabling all flags.
You'll never be able to test every combination of flags, or even every
individual flag on its own, and turning them all on would be a quick
and dirty guess at how to end up with the largest set of files, which is
what you want for collision checking.  You could also try having two builds
- one with all set and one with none set, which would probably get you
a few files in the list that you'd miss otherwise.  Still, you'll
probably run into occasional problems with packages where portage will
tell you you have to change your USE flags to make something install.
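
A minimal sketch of the two-build idea, assuming each build yields a
plain set of installed paths. The package names, paths and helper
functions here are hypothetical illustrations, not part of collagen or
portage:

```python
from collections import defaultdict


def merged_file_set(*builds):
    """Return the union of installed paths across several builds of one package."""
    files = set()
    for build in builds:
        files.update(build)
    return files


def find_collisions(package_files):
    """Map every path owned by more than one package to its sorted owners."""
    owners = defaultdict(set)
    for package, files in package_files.items():
        for path in files:
            owners[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical file lists from an "all USE flags" and a "no USE flags" build.
foo_all_flags = {"/usr/bin/foo", "/usr/lib/libfoo.so"}
foo_no_flags = {"/usr/bin/foo", "/usr/bin/foo-minimal"}
bar_all_flags = {"/usr/bin/bar", "/usr/bin/foo-minimal"}
bar_no_flags = {"/usr/bin/bar"}

package_files = {
    "app-misc/foo": merged_file_set(foo_all_flags, foo_no_flags),
    "app-misc/bar": merged_file_set(bar_all_flags, bar_no_flags),
}
collisions = find_collisions(package_files)
print(collisions)
```

In this made-up data the colliding path only shows up in foo's
no-flags build, which is the kind of file an all-flags build alone
would miss.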

On Fri, Jun 12, 2009 at 11:05 AM, Jeremy Olexa<darkside@g.o> wrote:
> On Fri, Jun 12, 2009 at 8:32 AM, Stanislav
> Ochotnicky<sochotnicky@...> wrote:
>> Hi everyone,
>>
>> some of you already know that work on the GSoC project "Tree-wide collision
>> checking and provided files database" started a few weeks ago.
>> For the rest, here is a short introduction to the project (collagen)
>> and its goals.
>>
>> Collagen aims to improve quality of ebuilds in portage tree. It does
>> this by compiling as many ebuilds as possible. It specifically takes
>> into account various atoms in DEPEND variable. For example if package
>> ebuild states that it needs =dev-libs/glib-2*, that package should be
>> compilable with every version of glib-2* in portage (taking into account
>> keywords). Therefore collagen will install one version of glib-2*, then
>> the ebuild in question, collect information, then uninstall the ebuild and
>> that glib version. It repeats this process for every glib-2* in the tree.
>
> Testing against every version of the deps as required seems like it is
> diverging from the original "Tree-wide collision checking and provided
> files database" - Would you say that the goal of this project is
> becoming more QA orientated? Something like: "Matchbox: A tinderboxen
> master server to provide QA for ebuilds"
>
> If you were strictly collision checking, then you don't care about
> every version of glib-2* you only care about the package in question
> and what installed files it provides. However for the provided files,
> you do care about every version of glib-2*, not for the other package,
> but to list the installed files of glib-2*
>
> After writing that down, I can see why you want to compile, check,
> uninstall, re-compile, repeat...but I worry about how efficient it is
> and what ways to improve that.
>
>>
>> Original idea was to have two sides:
>>  * master server (matchbox)
>>  * slaves compiling packages (tinderboxes)
>>
>> Master server decides what needs to be compiled (automatically or
>> semi-automatically). Tinderbox asks for job, master provides package
>> name (and optionally version). Tinderbox then goes and tries to compile
>> package with different sets of dependencies reporting results to
>> Matchbox.
>>
>> It seems that the whole process could be sped up by hosting binary
>> packages on one central server (Binary host). Obviously various versions
>> of the same package would be created, so unique names could be
>> generated by hashing some metadata into part of the filename. At
>> first thought I would use USE flags and DEPEND as the metadata to hash.
>
> This is a cool aspect of the project, I hope you can work with solar
> and zmedico to improve binpkgs. USE flags seem to be the trouble spot
> of binpkgs.
>
>>
>> So far two other projects came to light as possible source of
>> inspiration and/or collaboration:
>>  * catalyst (mainly tinderbox generating part)
>>  * AutotuA (automatic generic job framework)
>>
>> Especially AutotuA seems like good candidate for merging.
>>
>> It doesn't seem possible to compile every project with every version of
>> every dependency, therefore I'd like to ask for your opinion especially
>> about this part. One idea I had was to restrict testing to highest build
>> number for given version. For example we have:
>> glib-2.18.4-r1 and glib-2.18.4-r2, therefore we will only test against
>> glib-2.18.4-r2 and will assume that r1 would be OK too (or users would
>> upgrade since it's a bugfix release)
>
> IMO, you have two choices. Latest stable or latest ~arch. Stable users
> will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable so
> that argument is out.
>
>>
>> Another approach to optimizing use of resources would be to have a
>> priority list of packages that need most testing. I imagine this could
>> be created by analyzing logs from gentoo mirrors, and figuring out which
>> packages are downloaded most frequently.
>
> Mirror log analysis is a fundamentally hard thing to do given the vast
> network of mirrors that we have.
>
>>
>> We would probably need at least one tinderbox per glibc version if I am
>> not mistaken since this cannot be freely up/downgraded.
>
> It's free to upgrade ;) Can't downgrade. Given how large the glibc
> tracker bugs get, I don't think this project should use the latest
> glibc available. Unless you are trying to hunt down bugs, but I think
> you will get buried with compile failures. If the goal of this project
> is to data mine the installed package's information, that is not
> dependent on a glibc version. Please think about this some more before
> going down that road, I want this project to be successful ;)
>
> -Jeremy
>
>
>> This email was meant just as a teaser, more information (data model, UML
>> diagrams) is available on project website (look for Documents):
>> http://soc.gentooexperimental.org/projects/show/collision-database
>>
>> I'd love to hear some suggestions, opinions and criticism. You can
>> use this thread, or even various options on gentooexperimental.org.
>>
>> --
>> Stanislav Ochotnicky
>> Working for Gentoo Linux http://www.gentoo.org
>> Implementing Tree-wide collision checking and provided files database
>> http://soc.gentooexperimental.org/projects/show/collision-database
>> Blog: http://inputvalidation.blogspot.com/search/label/gsoc
>>
>>
>> jabber: sochotnicky@...
>> icq: 74274152
>> PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc
>>
>
>
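
The quoted idea of giving each binary package a unique name by hashing
its USE flags and DEPEND could look roughly like the sketch below. The
naming scheme, the function name and the choice of SHA-1 are all
assumptions made for illustration; nothing here is how portage or
collagen actually names binpkgs:

```python
import hashlib


def binpkg_filename(name, version, use_flags, depend_atoms):
    """Build a hypothetical binpkg filename with a metadata hash in it.

    Flags and dependency atoms are de-duplicated and sorted first so that
    the same configuration always hashes to the same name.
    """
    use_part = " ".join(sorted(set(use_flags)))
    dep_part = " ".join(sorted(set(depend_atoms)))
    payload = f"USE={use_part}\nDEPEND={dep_part}".encode("utf-8")
    digest = hashlib.sha1(payload).hexdigest()
    return f"{name}-{version}-{digest[:12]}.tbz2"


# Hypothetical example: same package built against two glib versions.
name_a = binpkg_filename("foo", "1.0", ["ssl", "gtk"], ["dev-libs/glib-2.18.4-r2"])
name_b = binpkg_filename("foo", "1.0", ["gtk", "ssl"], ["dev-libs/glib-2.18.4-r2"])
name_c = binpkg_filename("foo", "1.0", ["gtk", "ssl"], ["dev-libs/glib-2.20.1"])
print(name_a)
print(name_c)
```

Sorting before hashing means the order in which flags are listed does
not change the name, while a different dependency version does.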

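
The proposal in the quoted mail to test only against the highest
revision of each version (glib-2.18.4-r2 rather than -r1) can be
sketched as a small filter. This is a simplified illustration with a
hypothetical helper: it only looks at the -rN suffix and ignores
keywords, so it does not address the stable vs. ~arch objection raised
in the reply:

```python
import re

# Split "name-version[-rN]" into the part before the revision and the revision.
_REVISION = re.compile(r"^(?P<base>.+?)(?:-r(?P<rev>\d+))?$")


def latest_revisions(versions):
    """Keep only the highest -rN revision for each package version."""
    best = {}
    for version in versions:
        match = _REVISION.match(version)
        base = match.group("base")
        rev = int(match.group("rev") or 0)  # no suffix means revision 0
        if base not in best or rev > best[base]:
            best[base] = rev
    return [base if rev == 0 else f"{base}-r{rev}" for base, rev in best.items()]


candidates = ["glib-2.18.4-r1", "glib-2.18.4-r2", "glib-2.20.1"]
to_test = latest_revisions(candidates)
print(to_test)
```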

References:
[GSoC-status] Tree-wide collision checking and files database
-- Stanislav Ochotnicky
Re: [GSoC-status] Tree-wide collision checking and files database
-- Jeremy Olexa


Updated Jun 17, 2009

Copyright 2001-2007 Gentoo Foundation, Inc. Questions, Comments? Email www@gentoo.org.