This is weekly progress report no. 2 for Project Grumpy.

As reported previously, I am building a system to index Portage packages
and related metadata to make package maintainership a bit easier for
developers.

First, a few words about the document metadata storage. For this project, the
plan is to use a document-oriented and schema-free database (MongoDB) instead
of a regular relational database system (like SQLite or PostgreSQL).

This also means that we can create a single document collection that contains
the whole ebuild tree, where each document corresponds to one
"category/package".

Each document in the collection is just a JSON-formatted dictionary with the
following structure (beware, this is a work in progress, so some things are
still missing)::

    {
        # "category/package" (primary index, unique)
        '_id' : string,

        # Version of the schema, used internally (just in case)
        'schema_ver' : integer,

        # Package category
        'cat' : string,

        # Package name
        'pkg' : string,

        ## Data from metadata.xml
        # List of herds maintaining this package
        'herds' : [ string, ... ],
        # Long description of the package
        'ldesc' : string,
        # List of maintainers (by email addresses)
        'maintainers' : [ string, ... ],

        ## Data from the ebuilds themselves (but should be general)
        # Description
        'desc' : string,
        # Upstream URL(s) (FIXME: Do we need a list here?)
        'homepage' : string,

        # Array of all the package versions and their specific info,
        # one entry per ebuild
        'ebuilds' : [
            {
                # Package version (from category/package-version)
                'version' : string,

                # EAPI version
                'eapi' : integer,
                # List of USE flags supported by this ebuild
                'iuse' : [ string, ... ],
                # Package keywords ("x86", "~amd64", ...)
                'keywords' : [ string, ... ],
                # Licenses
                'licence' : [ string, ... ],
                # Package slot
                'slot' : string,

                # Need to figure out proper structure for these, so we can
                # also map out USE flags ;)
                'depend' : TODO!!!
                'rdepend' : TODO!!!
            },
            ...
        ]
    }

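To make the structure above more concrete, here is a minimal Python sketch of
one such document. The helper name ``make_package_doc``, the sample package,
the email address, and the ``schema_ver`` value are all made up for
illustration and are not taken from Grumpy's code. The ``depend`` and
``rdepend`` fields are left out because their structure is still undecided.

```python
def make_package_doc(cat, pkg, ebuilds, herds=(), maintainers=(),
                     desc="", ldesc="", homepage=""):
    """Build one "category/package" document in the shape sketched above."""
    return {
        "_id": "%s/%s" % (cat, pkg),   # "category/package", unique per package
        "schema_ver": 1,               # assumed value, the report gives none
        "cat": cat,
        "pkg": pkg,
        "herds": list(herds),
        "ldesc": ldesc,
        "maintainers": list(maintainers),
        "desc": desc,
        "homepage": homepage,
        # One sub-document per ebuild (package version)
        "ebuilds": [dict(e) for e in ebuilds],
    }


# Hypothetical sample data, not a real package
doc = make_package_doc(
    "dev-python", "example-pkg",
    ebuilds=[
        {"version": "1.0", "eapi": 0, "iuse": ["doc"],
         "keywords": ["x86", "amd64"], "licence": ["MIT"], "slot": "0"},
        {"version": "1.1", "eapi": 2, "iuse": ["doc", "test"],
         "keywords": ["~x86", "~amd64"], "licence": ["MIT"], "slot": "0"},
    ],
    herds=["python"],
    maintainers=["dev@example.org"],
    desc="Example package",
    homepage="http://example.org/",
)

print(doc["_id"])
print([e["eapi"] for e in doc["ebuilds"]])
```

Note that this single document already carries two ebuilds with different
EAPI values, which matters for any query on 'ebuilds.eapi'.
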
So how about querying the data? That's easy (please note that we are using the
MongoDB shell). For example, what if a developer wants to know which packages
they are supposedly maintaining::

    > db.ebuilds.find({'maintainers' : '...@g.o' })
    {... document data ...} # (Too much info :) )
    > db.ebuilds.find({'maintainers' : '...@g.o' }).count()
    7

And the results come fast. I mean really fast.

OK, how about checking how many packages under 'dev-python' are using a
specific EAPI version::

    > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 0}).count()
    202
    > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 1}).count()
    3
    > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 2}).count()
    255
    > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 3}).count()
    125
    > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 4}).count()
    0
    > db.ebuilds.find({'cat' : 'dev-python' }).count()
    504
    > 202+3+255+125 - 504
    81

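Ahem... looks like we have a "design issue" with our document structure: the
per-EAPI counts add up to 81 more than the total number of packages. A query
on 'ebuilds.eapi' matches a package as soon as any one of its ebuilds uses
that EAPI, so a package whose ebuilds use different EAPIs is counted more than
once. So back to the drawing board.
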
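The double counting can be reproduced without a database. The following is a
rough Python sketch that only emulates the "any ebuild matches" behaviour of a
query on 'ebuilds.eapi'; it does not use MongoDB, and the three toy packages
and both function names are invented for illustration.

```python
# Toy stand-in for the collection; names and EAPI values are made up.
packages = [
    {"_id": "dev-python/foo", "cat": "dev-python",
     "ebuilds": [{"version": "1.0", "eapi": 0}, {"version": "2.0", "eapi": 2}]},
    {"_id": "dev-python/bar", "cat": "dev-python",
     "ebuilds": [{"version": "0.5", "eapi": 2}]},
    {"_id": "dev-python/baz", "cat": "dev-python",
     "ebuilds": [{"version": "3.1", "eapi": 0}, {"version": "3.2", "eapi": 0}]},
]


def count_packages_with_eapi(docs, eapi):
    """Count documents where at least one ebuild uses the given EAPI."""
    return sum(1 for d in docs if any(e["eapi"] == eapi for e in d["ebuilds"]))


def count_ebuilds_with_eapi(docs, eapi):
    """Count individual ebuilds (not packages) that use the given EAPI."""
    return sum(1 for d in docs for e in d["ebuilds"] if e["eapi"] == eapi)


eapis = (0, 1, 2)
per_package = {v: count_packages_with_eapi(packages, v) for v in eapis}
per_ebuild = {v: count_ebuilds_with_eapi(packages, v) for v in eapis}

# Packages with ebuilds on several EAPIs are counted once per EAPI,
# so the per-package counts can add up to more than the package total.
overlap = sum(per_package.values()) - len(packages)
total_ebuilds = sum(len(d["ebuilds"]) for d in packages)

print(per_package, "overlap:", overlap)
print(per_ebuild, "total ebuilds:", total_ebuilds)
```

In this toy data only "dev-python/foo" mixes EAPIs, so the per-package counts
overshoot by one, while counting per ebuild adds up exactly to the number of
ebuilds. Whether the final document structure should count per ebuild or per
package is a design decision this sketch does not settle.
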
Last week's progress report
===========================

Last week's progress has been a bit slow. I have mostly played with the
document structure and a bit with pkgcore's internals. Although I now have the
portage contents inside the database, the document structure itself is far
from ideal (as you can see from the example with EAPI counts given earlier).

I have committed some of the stuff I have been working on to Grumpy's repo,
so in case you are interested, check it out from [1].

[1] http://git.overlays.gentoo.org/gitweb/?p=proj/grumpy.git;a=summary

First, a warning: the portage->mongodb syncer is slow. I mean really slow. It
takes about 3 hours (or even more) on my laptop to fully scan the contents of
portage and store the data in the database.

Plans for current week
======================

1) Speed up the portage syncer
2) Improve document structure

Päikest,
Priit :)