This is weekly progress report no. 1 for Project Grumpy.

As this is the first publicly visible announcement, I am also going to
give a short overview of the project itself.

The aim of this project is to create a database containing various
developer-related metadata about packages in the Gentoo Portage tree.
The metadata we are going to store can serve different purposes;
examples include upstream version checks and notifying developers who
are interested in a given package. Eventually the project should also
provide a nice web and API interface for accessing this data.

The project's semi-official IRC channel is #gentoo-grumpy on the
Freenode network. Just step in and say "Hi!" :)

Last week's progress report
===========================

My first week went a bit slowly because I had some "unfinished
business" to wrap up, and also because of two exams (which went fine).

The core issue I wrestled with this week was how to keep the Portage
tree and the database in sync, i.e. when an ebuild is modified, removed
or added, how to make sure that the database contents correspond to the
Portage contents.

The solution I came up with is a simple daemon that logs changes to the
Portage tree and modifies the database contents when appropriate.
"Appropriate" here means that we shouldn't apply updates to the
database while the tree itself is being updated, as that might be
unsafe (e.g. a package rename in progress). So currently it seems that
the daemon also has to initiate the rsync process itself and push the
updates into the database only after rsync has finished successfully.
(You can already see how all kinds of weird corner cases start popping
up :P )

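To make the "log now, push later" idea a bit more concrete, here is a
minimal Python sketch. It is only an illustration under assumptions:
the class name ChangeBuffer, the run_sync and apply_change callables,
and the choice to keep changes buffered after a failed sync are all
hypothetical and not part of any existing Grumpy code.

```python
class ChangeBuffer:
    """Collects tree changes and applies them only after a clean sync."""

    def __init__(self):
        self._pending = []

    def record(self, action, path):
        # action is expected to be "added", "modified" or "removed".
        self._pending.append((action, path))

    def sync_and_flush(self, run_sync, apply_change):
        """Run the sync, then push buffered changes on success.

        run_sync: callable returning an exit status (0 means success).
        apply_change: callable taking (action, path), e.g. a DB update.
        Returns the number of changes that were applied.
        """
        status = run_sync()
        if status != 0:
            # Assumption: after a failed sync the tree may be half
            # updated, so keep everything buffered for the next run.
            return 0
        applied = 0
        while self._pending:
            action, path = self._pending.pop(0)
            apply_change(action, path)
            applied += 1
        return applied
```

In a real daemon, record() would presumably be called from the inotify
event handler while rsync runs, and run_sync would wrap the actual
rsync invocation.
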
My current approach to logging uses the inotify [1] framework, present
in the Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo
Linux after all), with the help of pyinotify [2]. So far there's only
one drawback to using inotify: by default the kernel allows only 8192
watches per user, while the Portage tree contains a lot of directories
and each one needs its own watch. To use this approach, one has to
raise the limit through the /proc/sys/fs/inotify/max_user_watches
tunable. A value of 81920 has worked fine on my machine so far ;)

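A quick way to see whether the limit is high enough is to count the
directories in the tree and compare that with the tunable. The sketch
below uses only the Python standard library; the function names and
the /usr/portage location in the demo are assumptions for
illustration, so adjust them to your setup.

```python
import os

WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root itself plus every directory below it."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Return the current limit, or None if the tunable is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    tree = "/usr/portage"  # assumed tree location, adjust as needed
    print("directories:", count_directories(tree))
    print("watch limit:", read_watch_limit())
```

If the directory count is above the limit, the limit has to be raised
before a watcher can cover the whole tree.
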
My mentor Leio also suggested a second approach, parsing the rsync log
files, but I am a bit reluctant about this idea.

Anyway, I'll leave this simmering for a while, and unless someone comes
up with a better idea (yes, I have also thought about scanning the
whole Portage tree every x hours), I'm going to implement the daemon.

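For comparison, the "scan the whole tree every x hours" fallback could
look roughly like the sketch below: take a snapshot of all ebuilds,
then diff it against the previous one. Using the modification time to
detect changes is an assumption made for this example, and the
function names are hypothetical.

```python
import os


def snapshot(root):
    """Map each ebuild path (relative to root) to its mtime in ns."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".ebuild"):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                result[rel] = os.stat(full).st_mtime_ns
    return result


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, modified) ebuild paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(
        path for path in set(old) & set(new) if old[path] != new[path]
    )
    return added, removed, modified
```

The obvious cost is that every scan walks the entire tree, which is
exactly what the inotify-based daemon tries to avoid.
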
Plans for current week
======================

As I currently consider the core issue solved, the next issue I have to
solve is how to take an ebuild, extract information from it and store
it in the database. (Hint: pkgcore)

I'm not going to take on bigger tasks because I still have one quite
hard exam (thermodynamics and statistical physics) on the 4th of June.
And if I pass, it is the last one.

PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
my system to SQLAlchemy-0.6.

[1] http://en.wikipedia.org/wiki/Inotify
[2] http://trac.dbzteam.org/pyinotify

Päikest,
Priit Laes :)