Hello!

Quick (re-)introduction: My task for Gentoo/Google Summer of Code 2009
is to give Gentoo an equivalent of Debian's popcon, a tool that collects
statistics on which packages are installed and how often. To achieve
this goal I'm extending Smolt (a tool that currently does similar
things with hardware information) with fine-tunable gathering of
software statistics.

The plan we have for Smolt is to make it cross-distro rather than
tailored to Gentoo or Fedora alone. One place where the consequences
and benefits of such an approach can be seen clearly is

  counting packages from different distros into the same buckets.

What do I mean by that? Git installed on Debian counts toward the same
bucket as Git on Gentoo, which counts toward the same bucket as Git on
Fedora; you know the list. With packages counted across distros we can
suddenly answer questions that we cannot answer today, among them:

 - Which globally popular packages are missing from distro X?
   Let's say we don't have a package for product P. Do other distros
   have one? If they do, maybe we need one, too. If they don't, maybe
   P is not that important after all.

 - Approximately how many users run program X in total?
   Not just on Ubuntu or Arch, but across Linux, BSD and Solaris!

 - Does distro X really have 10 times as many packages as distro Y,
   or does it just split them differently?

To count into the same bucket we use global identifiers for the
"products" that a package provides. Gentoo's package "dev-util/git"
maps to the product "cpe:/a:git:git", and so does Debian's "git-core".
That string is a CPE URI [1], a concept close to package naming in
Java. This "intermediate language" allows us to relate package names
from distro X to those of distro Y and to answer questions like the
ones above from that data.
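
For illustration, here is a minimal Python sketch of how such an
identifier can serve as a bucket key. It assumes the CPE 2.2 URI
binding, where a name starts with "cpe:/" followed by colon-separated
components (part, vendor, product, then optionally version and more),
and it keys buckets on part, vendor and product only. The function
names and the sample dict are hypothetical; this is not code from the
PackageMap repository.

```python
from typing import Dict, NamedTuple, Tuple


class CpeName(NamedTuple):
    """Leading components of a CPE 2.2 URI such as "cpe:/a:git:git"."""
    part: str     # "a" = application, "o" = operating system, "h" = hardware
    vendor: str
    product: str


def parse_cpe_uri(uri: str) -> CpeName:
    """Split a CPE 2.2 URI into part, vendor and product.

    Version, update, edition and language are dropped on purpose:
    the buckets count products, not particular releases.
    """
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a CPE 2.2 URI: {uri!r}")
    fields = uri[len(prefix):].split(":")
    # Require a valid part plus non-empty vendor and product.
    if len(fields) < 3 or fields[0] not in ("a", "o", "h") or not all(fields[1:3]):
        raise ValueError(f"need part, vendor and product: {uri!r}")
    return CpeName(fields[0], fields[1], fields[2])


# Hypothetical sample data: (distro, package name) -> CPE URI.
# The two package names come from the mail; the dict is illustrative only.
PACKAGE_TO_CPE: Dict[Tuple[str, str], str] = {
    ("gentoo", "dev-util/git"): "cpe:/a:git:git",
    ("debian", "git-core"): "cpe:/a:git:git",
}


def same_bucket(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True if two distro packages map to the same CPE product."""
    return parse_cpe_uri(PACKAGE_TO_CPE[a]) == parse_cpe_uri(PACKAGE_TO_CPE[b])


if __name__ == "__main__":
    print(parse_cpe_uri("cpe:/a:git:git:1.6.3"))
    # CpeName(part='a', vendor='git', product='git')
    print(same_bucket(("gentoo", "dev-util/git"), ("debian", "git-core")))
    # True
```

Dropping the version components keeps different releases of Git in one
bucket; a version-aware view would simply keep more of the fields.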

To do such mapping we need code (or a "service") that performs it for
us, plus a base of collected data for that service to operate on.
Project "PackageMap" is meant to provide both.
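
To make the questions listed earlier a bit more concrete, here is a
minimal Python sketch of what such a mapping service could compute once
it has data. This is not PackageMap code: the in-memory dict stands in
for the database, and all function names, the install numbers and the
"frobnicate" product are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample map: (distro, package name) -> CPE product URI.
# dev-util/git and git-core are the examples from the mail; the other
# entries (including the made-up "frobnicate" product) are illustrative.
PACKAGE_MAP: Dict[Tuple[str, str], str] = {
    ("gentoo", "dev-util/git"): "cpe:/a:git:git",
    ("debian", "git-core"): "cpe:/a:git:git",
    ("debian", "git-doc"): "cpe:/a:git:git",
    ("debian", "frobnicate"): "cpe:/a:example:frobnicate",
    ("fedora", "frobnicate"): "cpe:/a:example:frobnicate",
}


def bucket_counts(reports: Iterable[Tuple[str, str, int]]) -> Counter:
    """Sum install counts per CPE product across all distros.

    Each report is (distro, package, installs); packages without a
    known mapping are skipped.
    """
    totals: Counter = Counter()
    for distro, package, installs in reports:
        cpe = PACKAGE_MAP.get((distro, package))
        if cpe is not None:
            totals[cpe] += installs
    return totals


def missing_in(distro: str, totals: Counter, top: int = 10) -> List[str]:
    """Products among the `top` most installed that `distro` has no package for."""
    provided = {cpe for (d, _), cpe in PACKAGE_MAP.items() if d == distro}
    return [cpe for cpe, _ in totals.most_common(top) if cpe not in provided]


def packages_per_product(distro: str) -> float:
    """Average packages per product; values above 1 hint at finer splitting."""
    products = [cpe for (d, _), cpe in PACKAGE_MAP.items() if d == distro]
    return len(products) / len(set(products)) if products else 0.0


if __name__ == "__main__":
    reports = [
        ("gentoo", "dev-util/git", 120),
        ("debian", "git-core", 300),
        ("debian", "frobnicate", 50),
        ("fedora", "frobnicate", 20),
    ]
    totals = bucket_counts(reports)
    print(totals["cpe:/a:git:git"])        # 420
    print(missing_in("gentoo", totals))    # ['cpe:/a:example:frobnicate']
    print(packages_per_product("debian"))  # 1.5
```

A real service would read the mapping from the database files rather
than a hard-coded dict; the point is only that once packages share CPE
buckets, these questions reduce to simple counting and set operations.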

I have started populating the database with packages (currently 312
of them) built from information extracted from the Gentoo tree and
from the National Vulnerability Database, which holds many CPEs. Let
me state clearly that PackageMap is not about Gentoo in particular.
Sure, the initial data has lots of Gentoo in it, but the whole point of
the project is to bring information and people from different distros
together.

To see what these 312 package maps currently look like, it's best to
click through the database folder yourself:
http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database

The repository also contains a Relax NG schema and a DTD for
validation, more documentation than I usually write, and a few scripts:
http://git.goodpoint.de/?p=packagemap.git;a=tree

By now I hope you are interested in what this could become. Your
active participation is highly appreciated; a few minutes from everyone
can make a huge difference here. If you want write access to the
repository, mail me at sebastian@×××××××.org.

Please have a look at the Git repository linked above and ask
questions. I propose keeping Gentoo-specific discussion on gentoo-dev
and everything else on the PackageKit list. I hope that works out well.

Thanks for reading up to this point.

Sebastian

PS: I'm aware that "hartwork.org" might not be a good long-term location
for DTDs, XML namespaces and the like for a cross-distro project. Any
ideas where best to put them?

[1] http://cpe.mitre.org/