Also, good luck with thermodynamics ;)

On Tue, 2010-06-01 at 10:10 +0300, Priit Laes wrote:
> This is weekly progress report no. 1 for Project Grumpy.
>
> As this is the first publicly visible announcement, I am also going to
> give a short overview of the project itself.
>
> The aim of this project is to create a database containing various
> developer-related metadata about packages in the Gentoo Portage tree.
> The metadata we are going to store can be used for different purposes;
> examples include upstream version checks and notifying developers who
> are interested in a package. Eventually the project should also
> provide a nice web and API interface to access this data.
>
> The project's semi-official IRC channel is #gentoo-grumpy on the
> Freenode network. Just step in and say "Hi!" :)
>
> Last week's progress report
> ===========================
>
> My first week went a bit slowly due to having some "unfinished business"
> that I needed to finish, and also because of two exams (which went
> fine).
>
> The core issue I wrestled with this week was how to keep the portage
> contents and the database contents in sync, i.e. when an ebuild is
> modified, removed or added, how to make sure that the database
> contents correspond to the portage contents.
>
> The solution that I came up with is to use a simple daemon that logs
> changes to the portage tree and modifies the database contents when
> appropriate. Appropriate here means that we shouldn't apply updates
> while the tree itself is being updated, as that might be unsafe (e.g.
> during a package rename). So currently it seems that the daemon also
> has to initiate the rsync process and push the updates into the
> database after rsync has finished successfully.
> (You can already see how all kinds of weird corner cases start popping
> up :P )
>
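A rough sketch of the "collect during the sync, apply afterwards" idea, in case it helps the discussion. The class and method names (PendingChanges, record, flush) are made up for illustration and are not part of Grumpy or pyinotify. It assumes the daemon only remembers which paths were touched and decides between "update" and "remove" by looking at the disk once rsync is done, so a rename shows up as one removal plus one update.

```python
import os


class PendingChanges:
    """Hypothetical buffer for paths touched while the tree is syncing."""

    def __init__(self):
        self._touched = set()

    def record(self, path):
        # Only the path is stored; repeated events for one file collapse.
        self._touched.add(path)

    def flush(self, sync_ok):
        """Return (to_update, to_remove) once the sync has finished.

        After a failed sync nothing is released and the recorded paths
        are kept for the next attempt.
        """
        if not sync_ok:
            return [], []
        touched, self._touched = sorted(self._touched), set()
        to_update = [p for p in touched if os.path.exists(p)]
        to_remove = [p for p in touched if not os.path.exists(p)]
        return to_update, to_remove
```

The database layer would then process both lists in one go after each successful sync. Whether that is enough for every corner case is an open question.
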
> My current approach to logging is to use the inotify [1] framework,
> present in the Linux kernel since 2.6.13 (sorry BSD users, but this is
> Gentoo Linux after all), with the help of pyinotify [2].
> So far there's only one drawback to using inotify: by default the
> kernel allows only 8192 watches per user (and portage contains a lot
> of directories), so in order to use this approach one has to bump the
> number of watches using the /proc/sys/fs/inotify/max_user_watches
> tunable. 81920 has worked fine so far on my machine ;)
>
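To check up front whether the watch limit is large enough, something along these lines might do. It is a sketch using only the standard library, assuming one watch per directory; the function names are invented for this example.

```python
import os

WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root and every directory below it (one watch each)."""
    return sum(1 for _ in os.walk(root))


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Return the current limit, or None if the tunable cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def watches_sufficient(root, limit):
    """True if every directory under root fits within the given limit."""
    return limit is not None and count_directories(root) <= limit
```

A daemon could call watches_sufficient(tree_root, read_watch_limit()) at startup and refuse to run with a clear message instead of silently missing events.
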
> There was also a secondary approach suggested by my mentor Leio, to
> parse rsync log files, but I am a bit reluctant about this idea.
>
> Anyway, I'll leave this idea simmering here for a while and unless
> someone comes up with a better idea (yes, I have also thought about
> scanning the whole portage tree every x hours), I'm going to implement
> the daemon.
>
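For comparison, the periodic full-scan alternative mentioned above could look roughly like this. It is only a sketch: the helper names are invented, and it assumes that a changed file always ends up with a different modification time after the sync, which may not hold in every setup.

```python
import os


def snapshot(root):
    """Map every file path under root to its mtime in nanoseconds."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that a dangling symlink does not abort the scan
            state[path] = os.lstat(path).st_mtime_ns
    return state


def diff_snapshots(old, new):
    """Return sorted (added, removed, modified) path lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified
```

The obvious cost is walking the entire tree on every run, which is what the inotify approach avoids.
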
> Plans for current week
> ======================
>
> As I currently consider the core issue solved, the next issue I have to
> solve is how to take an ebuild, extract information from it and store
> it in the database. (Hint: pkgcore)
>
> I'm not going to take on bigger tasks because I still have one quite
> hard exam (thermodynamics and statistical physics) on the 4th of June.
> And if I pass, it is the last one.
>
> PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
> my system to SQLAlchemy-0.6.
>
> [1] http://en.wikipedia.org/wiki/Inotify
> [2] http://trac.dbzteam.org/pyinotify
>
> Päikest,
> Priit Laes :)
>