Gentoo Archives: gentoo-soc

From: Brian Dolbec <brian.dolbec@×××××.com>
To: gentoo-soc@l.g.o
Subject: Re: [gentoo-soc] Package statistics, progress report #1
Date: Thu, 19 May 2011 14:53:34
Message-Id: BANLkTikGN0+uoRdHSLTMTvZ=9T6MEWS0xA@mail.gmail.com
In Reply to: [gentoo-soc] Package statistics, progress report #1 by Vikraman
On Thu, May 19, 2011 at 10:52 AM, Vikraman <vikraman.choudhury@×××××.com> wrote:

>
> Hello everyone,
>
> I'm working on the 'Package Statistics' project [1] for GSoC this summer.
> This is my first progress report.
>
>
> A short summary of my progress:
> -------------------------------
>
> * Created project repository [2] on git.overlays.gentoo.org
> * Read up on RESTful Web Services (from the O'Reilly book)
> * Tried to improve my Python coding style using the Google guide
> (suggested by Alec)
> * Wrote a simple client in Python that collects a few environment
> variables from portage and the list of installed packages with their
> USE flags, encodes the data as JSON, and issues an authenticated POST
> to the server
>

Depending on how much detail you want to collect, the enalyze modules in
gentoolkit have USE flag filtering to remove the normally hidden flags that
users never see.

(note: enalyze is the new "e" name for the analyse modules)
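To make the filtering idea concrete, here is a rough sketch in plain Python. It is not the enalyze code: HIDDEN_PREFIXES, visible_flags, build_payload and the sample package data are made up for illustration, and the real rules for which flags count as hidden live in portage and gentoolkit.

```python
import json

# Hypothetical prefixes for flags a user normally never sets by hand.
HIDDEN_PREFIXES = ("elibc_", "kernel_", "userland_")


def visible_flags(flags, hidden_prefixes=HIDDEN_PREFIXES):
    """Return the flags that do not start with a hidden prefix, sorted."""
    return sorted(f for f in flags if not f.startswith(hidden_prefixes))


def build_payload(installed):
    """Encode {cpv: [flags]} as JSON after dropping the hidden flags."""
    cleaned = {cpv: visible_flags(flags) for cpv, flags in installed.items()}
    return json.dumps({"packages": cleaned}, sort_keys=True)


# Made-up sample data standing in for what the client would collect.
sample = {
    "app-editors/vim-7.3.189": ["python", "elibc_glibc", "kernel_linux", "acl"],
}
payload = build_payload(sample)
```

Filtering before encoding keeps the POST body smaller and avoids storing flags on the server that nobody chose deliberately.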

> * Wrote a simple webapp using web.py to handle requests from the above
> client and save the data to MySQL tables
> * Wrote some documentation to deploy the webapp and client
>
> Issues encountered:
> -------------------
>
> * Choice of portage api vs gentoolkit api: The gentoolkit api is very
> easy to use, but quite slow [3] compared to the portage api. Alec asked me
> to use both of them as necessary, but provide an easy way to swap out
> one in favor of the other at a later time.
>

Yes, that part of gentoolkit is slower than using portage directly, as it
does more than just return the installed cpv list:

1) it has the ability to filter the list using a caller-supplied function.

2) it is a generator and doesn't perform the actual read/load immediately,
which saves time in areas of code that load it but may not end up using it.
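The two points above can be illustrated with a small stand-alone sketch. This is not gentoolkit's implementation: iter_installed, fake_loader and the package names are hypothetical, and the loader simply stands in for the expensive read of the installed package db.

```python
def iter_installed(load_cpvs, keep=None):
    """Lazily yield cpv strings, optionally filtered by `keep`.

    Nothing is read until the caller starts iterating.
    """
    for cpv in load_cpvs():
        if keep is None or keep(cpv):
            yield cpv


calls = []


def fake_loader():
    """Stand-in for the expensive read of the installed package db."""
    calls.append("loaded")
    return [
        "app-editors/vim-7.3",
        "dev-lang/python-2.7.1",
        "dev-lang/perl-5.12.3",
    ]


pkgs = iter_installed(fake_loader, keep=lambda cpv: cpv.startswith("dev-lang/"))
loaded_before = list(calls)  # still empty: the generator has not run yet
dev_lang = list(pkgs)        # iterating triggers the load and the filter
```

The cost is paid only when the result is consumed, but every item passes through the filter check, which is part of why this path is slower than asking portage for the plain list.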

That is one reason I did not use it initially in the enalyze modules, but
later added support (at someone else's insistence) for optionally using it
and the gentoolkit package object.
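One way to keep such a choice optional, and to allow the easy swap Alec asked for, is to have the client code depend on a tiny interface only. The sketch below is an assumption about how that could be laid out; InstalledSource, DirectSource, FilteredSource and count_installed are hypothetical names, not portage or gentoolkit classes.

```python
class InstalledSource:
    """Minimal interface the stats client would code against."""

    def cpvs(self):
        raise NotImplementedError


class DirectSource(InstalledSource):
    """Stand-in for a fast backend that returns a plain list up front."""

    def __init__(self, loader):
        self._loader = loader

    def cpvs(self):
        return list(self._loader())


class FilteredSource(InstalledSource):
    """Stand-in for a slower, more flexible backend with a filter hook."""

    def __init__(self, loader, keep):
        self._loader = loader
        self._keep = keep

    def cpvs(self):
        return [cpv for cpv in self._loader() if self._keep(cpv)]


def count_installed(source):
    """Client code only sees the interface, so backends can be swapped."""
    return len(source.cpvs())


def demo_loader():
    """Made-up data in place of a real installed package db."""
    return ["app-editors/vim-7.3", "dev-lang/python-2.7.1"]


direct_count = count_installed(DirectSource(demo_loader))
filtered_count = count_installed(
    FilteredSource(demo_loader, keep=lambda cpv: cpv.startswith("dev-lang/"))
)
```

Swapping one backend for the other then only touches the line that constructs the source.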

Most of the information you will want for stats gathering should be taken
from the enalyze modules. They are specific to the installed pkg db, and as
a result will tend to be faster than the other gentoolkit functions. If you
look at them more closely you will see that they accept a pre-loaded cpv list
(using the same portage-api) for running different reports in succession.
As I informed you during the proposal period, the enalyze modules are
nearly tailor-made for the information gathering your stats project will
need.
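The pre-loaded list pattern looks roughly like the following. The function names and signatures here are hypothetical and only show the shape of the idea, not the actual enalyze interface.

```python
from collections import Counter

load_count = 0


def load_cpvs():
    """Stand-in for one expensive read of the installed package db."""
    global load_count
    load_count += 1
    return ["app-editors/vim-7.3", "dev-lang/python-2.7.1", "dev-lang/perl-5.12.3"]


def category_report(cpvs=None):
    """Count packages per category; reuse `cpvs` if the caller preloaded it."""
    if cpvs is None:
        cpvs = load_cpvs()
    return Counter(cpv.split("/", 1)[0] for cpv in cpvs)


def total_report(cpvs=None):
    """Return the number of installed packages, reusing `cpvs` if given."""
    if cpvs is None:
        cpvs = load_cpvs()
    return len(cpvs)


# Load once, then run several reports against the same list.
preloaded = load_cpvs()
by_category = category_report(preloaded)
total = total_report(preloaded)
```

Running the reports in succession against one list means the db is read a single time, however many reports are added later.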

If there are data functions missing from it, I can probably add them
for you, since I will be adding more reports and functionality to enalyze.

Another thing to keep in mind is that if the code used to gather and report
the information is part of gentoolkit, it is far more likely to be accepted
and used to report to the server. While I don't know the numbers (something
a successful result of this project would tell), I believe the majority of
Gentoo users have it installed. I believe the reason the previous stats
projects have not fully succeeded is that there was an additional pkg to
install that most people did not even know about.

Also, I am working on the portage public_api, so if there is some special
functionality that you need from portage and it is suitable for that api, I
can add it.

>
> * Get updates from the community on what data should be collected from
> hosts
> * Try to add more fields to the client/server and modify the SQL tables
> accordingly
> * Learn more about the portage api and discuss it on #gentoo-portage
>
> My semester exams are currently in progress, and they'll last till the
> end of May. So, I'll not be able to work during these 2 weeks. I'm
> looking forward to getting back on 1st June, and continuing with my
> project.
>
>
> --
> Vikraman
>

Brian Dolbec