On Thu, May 19, 2011 at 10:52 AM, Vikraman <vikraman.choudhury@×××××.com> wrote:

> |
> Hello everyone,
>
> I'm working on the 'Package Statistics' project [1] for GSoC this summer.
> This is my first progress report.
>
>
> A short summary of my progress:
> -------------------------------
>
> * Created project repository [2] on git.overlays.gentoo.org
> * Read up on RESTful Web Services (from the O'reilly book)
> * Tried to improve my python coding style using the google guide
> (suggested by Alec)
> * Wrote a simple client in python to collect a few environment variables
> from portage, list of installed packages with useflags, encode the
> data in JSON, with proper authentication, and issue a POST to the
> server
>
|
Depending on how much detail you want to collect, the enalyze modules in
gentoolkit have USE flag filtering to remove the normally hidden flags that
users never see.

(note: enalyze is the new "e" name for the analyse modules) |
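
A minimal sketch of that kind of filtering, in plain Python rather than the actual enalyze code. It assumes the "normally hidden" flags are the ones expanded from variables such as KERNEL or ELIBC (e.g. kernel_linux); the function name and the sample values are hypothetical.

```python
def strip_hidden_flags(use_flags, hidden_prefixes):
    """Return a sorted list of USE flags with the hidden ones removed.

    hidden_prefixes holds variable names such as "KERNEL"; a flag is
    treated as hidden when it starts with the lower-cased name plus "_".
    """
    prefixes = tuple(name.lower() + "_" for name in hidden_prefixes)
    return sorted(flag for flag in use_flags if not flag.startswith(prefixes))


# Illustrative values only; a real client would read both from portage.
installed_use = {"ssl", "python", "kernel_linux", "elibc_glibc", "userland_GNU"}
hidden = ["KERNEL", "ELIBC", "USERLAND"]
visible = strip_hidden_flags(installed_use, hidden)
```

Filtering on the client side like this would keep the submitted data limited to flags a user can actually see and toggle.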

> * Wrote a simple webapp using web.py to handle requests from the above
> client and save the data to MySQL tables
> * Wrote some documentation to deploy the webapp and client
>
> Issues encountered:
> -------------------
>
> * Choice of portage api vs gentoolkit api: The gentoolkit api is very
> easy to use, but quite slow [3] compared to the portage api. Alec asked me
> to use both of them as necessary, but provide an easy way to swap out
> one in favor of the other at a later time.
>

Yes, that part of gentoolkit is slower than using portage directly, as it
does more than just return the installed cpv list:

1) it has the ability to filter the list using a caller-supplied function.

2) it is a generator and doesn't perform the actual read/load immediately,
which saves time in areas of code that load it but may not end up using it.
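
The two points above can be sketched in plain Python. This is not the gentoolkit implementation; `installed_cpvs`, `fake_load_db` and the package names are hypothetical stand-ins used only to show a lazy generator with a caller-supplied filter.

```python
def installed_cpvs(load_db, keep=None):
    """Yield cpv strings lazily, optionally filtered by a caller-supplied function."""
    for cpv in load_db():  # nothing is read until iteration starts
        if keep is None or keep(cpv):
            yield cpv


reads = []


def fake_load_db():
    # Stand-in for reading the installed package database.
    reads.append("loaded")
    return ["app-editors/vim-7.3", "dev-lang/python-2.7.1", "dev-lang/perl-5.12.3"]


dev_lang = installed_cpvs(fake_load_db, keep=lambda cpv: cpv.startswith("dev-lang/"))
reads_before = len(reads)  # creating the generator has not touched the db yet
result = list(dev_lang)    # iterating triggers the load and applies the filter
```

The extra function call per entry and the generator machinery are where the overhead comes from; the deferred load only pays off when the caller might never iterate.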

That is one reason I did not use it initially in the enalyze modules, but
later added support for optionally using it (at someone else's insistence),
along with the gentoolkit package object.

Most of the information you will want for stats gathering should be taken
from the enalyze modules. They are specific to the installed pkg db, and as
a result will tend to be faster than the other gentoolkit functions. If you
look at them more closely, you will see that they accept a pre-loaded cpv list
(using the same portage-api) for running different reports in succession.
As I informed you during the proposal period, the enalyze modules are
nearly tailor-made for the information gathering your stats project will
need.
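
A rough sketch of the "load once, run several reports" pattern, again in plain Python with hypothetical names (`run_reports`, `count_report`, `category_report`) and made-up package data; the real enalyze interfaces may look quite different.

```python
from collections import Counter


def count_report(cpvs):
    """Number of installed packages."""
    return len(cpvs)


def category_report(cpvs):
    """Installed packages per category (the part of the cpv before the slash)."""
    return Counter(cpv.split("/", 1)[0] for cpv in cpvs)


def run_reports(cpvs, reports):
    """Run every report against the same pre-loaded cpv list."""
    return {name: report(cpvs) for name, report in reports.items()}


# Loaded once up front; a real client would get this list from portage.
cpvs = ["app-editors/vim-7.3", "dev-lang/python-2.7.1", "dev-lang/perl-5.12.3"]
results = run_reports(cpvs, {"count": count_report, "categories": category_report})
```

Passing the list in, instead of letting each report query the package db itself, means the expensive read happens only once per run.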

If there are some data functions missing from it, I can probably add them
for you, since I will be adding more reports and functionality to enalyze.

Another thing to keep in mind is that if the code used to gather the
information and report it is part of gentoolkit, it is far more likely to
be accepted and used to report to the server. While I don't know the
numbers (something a successful result of this project would tell), I believe
the majority of gentoo users have it installed. I believe the reason all the
previous stats projects have not fully succeeded is that there was an
additional pkg to install that most people did not even know about.

Also, I am working on the portage public_api, so if there is some special
functionality that you need from portage and it is suitable for that api, I
can add it in.

> |
> * Get updates from the community on what data should be collected from
> hosts
> * Try to add more fields to the client/server and modify the SQL tables
> accordingly
> * Learn more about the portage api and discuss them on #gentoo-portage
>
> My semester exams are currently in progress, and they'll last till the
> end of May. So, I'll not be able to work during these 2 weeks. I'm
> looking forward to get back on 1st June, and continue with my project.
>
>
> --
> Vikraman
>

Brian Dolbec |