Gentoo Archives: gentoo-dev

From: Paweł Hajdan
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
Date: Wed, 08 Jun 2011 15:20:36
Message-Id: 4DEF9305.8080301@gentoo.org
In Reply to: [gentoo-dev] Gentoo package statistics -- GSoC 2011 by Vikraman
On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. Till now, I
> have managed to write a client and server[0] to collect the following
> information from hosts:

Excellent, good luck with the idea! I think that better information
about how Gentoo is actually used will greatly help improve it.

> Is there a need to collect files installed by a package ? Doesn't PFL[1]
> already provide that ?

Well, PFL is not an official Gentoo project. It might be useful, but I
wouldn't say it's a priority.

> Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I
think it's most important to get the core functionality working really
well, and to convince as large a percentage of users as possible to
enable reporting the statistics (to make the results - hopefully -
accurately represent the user base). Please note that in some cases it
may mean collecting _less_ data, or thinking more about the privacy of
the users.

For me, as a developer, even a list of packages sorted by popularity
(aka Debian/Ubuntu popcon) would be very useful.

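A popcon-style ranking can be sketched roughly as follows. This is only
an illustration, not the gentoostats code: the `rank_by_popularity` name
and the input shape (one collection of package names per reporting host)
are assumptions made for the example.

```python
from collections import Counter


def rank_by_popularity(host_reports):
    """Return (package, host_count) pairs, most widely installed first.

    host_reports is an iterable with one collection of package names per
    reporting host (a hypothetical input format, not the real protocol).
    """
    counts = Counter()
    for packages in host_reports:
        # Use a set so a package counts at most once per host.
        counts.update(set(packages))
    # Sort by descending host count, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reports = [
    {"sys-apps/portage", "app-editors/vim"},
    {"sys-apps/portage", "www-client/chromium"},
    {"sys-apps/portage", "app-editors/vim"},
]
ranking = rank_by_popularity(reports)
```

Counting hosts rather than installed copies keeps one machine with many
slots or rebuilds from skewing the list.
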
Ah, and maybe files in /etc/portage: package.keywords and so on. It
could be useful to see what people are masking/unmasking, which may be
an indication of stale stabilizations or brokenness hitting the tree.
Anyway, I'd call it an enhancement.

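To make the package.keywords idea concrete, here is a small sketch of
reading such a file on the client. It assumes the usual one-entry-per-line
layout (a package atom followed by optional keywords) and, as a
simplification, treats everything after a `#` as a comment. The
`parse_package_keywords` name is made up for the example, and the case
where package.keywords is a directory of files is ignored.

```python
def parse_package_keywords(text):
    """Map each package atom to the list of keywords given for it."""
    entries = {}
    for raw_line in text.splitlines():
        # Drop comments (simplified handling) and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        atom, *keywords = line.split()
        # The same atom may appear on several lines; merge its keywords.
        entries.setdefault(atom, []).extend(keywords)
    return entries


sample = """\
# testing versions wanted on this host
www-client/chromium ~amd64
=dev-lang/python-3.2* ~amd64 ~x86

app-editors/vim
"""
parsed = parse_package_keywords(sample)
```

Aggregating the atoms across hosts would then show which packages people
most often unmask by hand.
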
> Also, I'm starting work on the webUI, and would like some
> recommendations for stats pages, such as:
>
> * Packages installed sorted by users

Cool!

> * Top arches, keywords, profiles

And percentage of ~arch vs arch users?

> * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It'd also be useful to
have per-profile data, to better tune the profile defaults.

> [0]
> http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the portage Python API a lot. But that API is not stable, or
at least not guaranteed to be stable. Have you considered using helpers
like portageq (or eventually enhancing those helpers)?

- make the licensing super-clear (a LICENSE file, possibly some header
in every source file, and so on)

- how about submitting the data over HTTPS and not HTTP to better
protect privacy?

- don't leave exception handling as a TODO; it should be a part of your
design, not an afterthought

- instead of or in addition to the setup.txt file, how about just
writing the real setup.py file for distutils?
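
The portageq, HTTPS and exception-handling points above could look
roughly like this. It's only a sketch under assumptions: `CollectError`,
`portageq_envvar`, `build_submission`, the JSON payload and the URL are
all made up for illustration and are not taken from the gentoostats
code. The only external command it relies on is `portageq envvar`.

```python
import json
import subprocess
import urllib.request


class CollectError(Exception):
    """Raised when a value cannot be read from portageq."""


def portageq_envvar(name, run=subprocess.run):
    """Return one Portage variable by calling `portageq envvar NAME`.

    `run` is injectable so the function can be exercised without Portage.
    """
    try:
        result = run(
            ["portageq", "envvar", name],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        # Fail with one clear, catchable error type instead of leaking
        # whatever the subprocess layer happened to raise.
        raise CollectError(f"portageq envvar {name} failed: {exc}") from exc
    return result.stdout.strip()


def build_submission(url, payload):
    """Build a JSON POST request, refusing anything but HTTPS."""
    if not url.lower().startswith("https://"):
        raise ValueError("statistics must be submitted over HTTPS")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


# stats.example.org is a placeholder, not a real Gentoo service.
request = build_submission(
    "https://stats.example.org/upload", {"ARCH": "amd64"}
)
```

On a Gentoo box the payload would be filled from calls such as
`portageq_envvar("ARCH")`, and the request sent with
`urllib.request.urlopen(request)`. Going through the command-line helper
keeps the client independent of portage's internal Python modules.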
