Gentoo Archives: gentoo-dev

From: "Paweł Hajdan
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
Date: Wed, 08 Jun 2011 15:20:36
Message-Id: 4DEF9305.8080301@gentoo.org
In Reply to: [gentoo-dev] Gentoo package statistics -- GSoC 2011 by Vikraman
On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. Till now, I have managed to write a client and server[0] to collect the following information from hosts:
Excellent, good luck with the idea! I think that better information about how Gentoo is actually used will greatly help to improve it.
> Is there a need to collect files installed by a package? Doesn't PFL[1] already provide that?
Well, PFL is not an official Gentoo project. It might be useful, but I wouldn't say it's a priority.
> Please provide some feedback on what other data should be collected, etc.
In my opinion it's *not* about collecting as much data as possible. I think the most important thing is to get the core functionality working really well, and to convince as large a percentage of users as possible to enable reporting (so that the results, hopefully, accurately represent the user base). Please note that in some cases this may mean collecting _less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity (aka Debian/Ubuntu popcon) would be very useful.

Ah, and maybe files in /etc/portage: package.keywords and so on. It could be useful to see what people are masking/unmasking; that may be an indication of stale stabilizations or brokenness hitting the tree. Anyway, I'd call it an enhancement.
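
To make the popcon-style list concrete, here is a minimal sketch. It assumes each host submits nothing more than a plain list of installed package names; `rank_packages` and the sample reports are hypothetical and not taken from the project's code.

```python
from collections import Counter


def rank_packages(host_reports):
    """Rank packages by the number of hosts that have them installed."""
    counts = Counter()
    for packages in host_reports:
        # Count each package once per host, even if a host lists it twice.
        counts.update(set(packages))
    # Most popular first; ties are broken by package name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reports from three hosts.
reports = [
    ["app-editors/vim", "sys-apps/portage"],
    ["sys-apps/portage", "www-client/chromium"],
    ["sys-apps/portage", "app-editors/vim", "app-editors/vim"],
]
ranking = rank_packages(reports)
```

Counting per host rather than per submission line keeps a single misbehaving client from inflating a package's rank.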
> Also, I'm starting work on the webUI, and would like some recommendations for stats pages, such as:
>
> * Packages installed sorted by users
Cool!
> * Top arches, keywords, profiles
And percentage of ~arch vs arch users?
> * Most enabled, disabled useflags per package/globally
Also great, especially the per-package variant. It would also be useful to have per-profile data, to better tune the profile defaults.

I took a quick look at the code. Some random comments:

- It uses the portage Python API a lot, but that API is not stable, or at least not guaranteed to be stable. Have you considered using helpers like portageq (or eventually enhancing those helpers)?
- Make the licensing super-clear (a LICENSE file, possibly some header in every source file, and so on).
- How about submitting the data over HTTPS and not HTTP, to better protect privacy?
- Don't leave exception handling as a TODO; it should be a part of your design, not an afterthought.
- Instead of, or in addition to, the setup.txt file, how about just writing the real setup.py file for distutils?
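
As a rough illustration of the portageq and exception-handling points above, here is a minimal sketch. It assumes `portageq envvar <NAME>` is available on the host; the wrapper `portageq_envvar`, the `PortageqError` class, and the injectable `run` parameter are hypothetical names for this example, not part of the project's code.

```python
import subprocess


class PortageqError(Exception):
    """Raised when portageq is missing or exits with an error."""


def portageq_envvar(name, run=subprocess.run):
    """Return the value of a Portage variable via `portageq envvar NAME`.

    `run` is injectable (an assumption of this sketch) so the wrapper can
    be exercised on machines without Portage installed.
    """
    try:
        result = run(
            ["portageq", "envvar", name],
            capture_output=True,
            text=True,
            check=True,  # turn a non-zero exit status into an exception
        )
    except FileNotFoundError as exc:
        raise PortageqError("portageq not found; is Portage installed?") from exc
    except subprocess.CalledProcessError as exc:
        raise PortageqError(
            "portageq envvar %s failed: %s" % (name, exc.stderr)
        ) from exc
    return result.stdout.strip()
```

On a Gentoo host, `portageq_envvar("ARCH")` would return the configured architecture. Failures surface as a single `PortageqError`, which the client can catch in one place instead of dealing with Portage internals.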

Replies

Subject Author
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman <vikraman.choudhury@×××××.com>
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011 Donnie Berkholz <dberkholz@g.o>
Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011 Hans de Graaff <graaff@g.o>