> not relying on custom system daemons running in the background.

Why is a portage daemon such a bad thing? Or hard to do? I would very much like a daemon running on my system which I can configure to sync the portage tree once a week (or once a month if I am lazy), give me a summary of hot fixes and security fixes in a nice email, push important announcements and, of course, sync caches on detecting changes (which should be trivial with notify daemons all over the place). Why is it such a bad thing?

It's crazy to think that security updates need to be pulled in Linux.

-devsk


----- Original Message ----
From: Marius Mauch <genone@g.o>
To: gentoo-portage-dev@l.g.o
Sent: Sunday, November 23, 2008 7:12:57 PM
Subject: Re: [gentoo-portage-dev] search functionality in emerge

On Sun, 23 Nov 2008 07:17:40 -0500
"Emma Strubell" <emma.strubell@×××××.com> wrote:

> However, I've started looking at the code, and I must admit I'm pretty
> overwhelmed! I don't know where to start. I was wondering if anyone
> on here could give me a quick overview of how the search function
> currently works, an idea as to what could be modified or implemented
> in order to improve the running time of this code, or any tip really
> as to where I should start or what I should start looking at. I'd
> really appreciate any help or advice!!

Well, it depends how much effort you want to put into this. The current
interface doesn't actually provide a "search" interface, but merely
functions to
1) list all package names - dbapi.cp_all()
2) list all package names and versions - dbapi.cpv_all()
3) list all versions for a given package name - dbapi.cp_list()
4) read metadata (like DESCRIPTION) for a given package name and
   version - dbapi.aux_get()

One of the main performance problems of --search is that there is no
persistent cache for functions 1, 2 and 3, so if you're "just"
interested in performance aspects you might want to look into that.
The issue with implementing a persistent cache is that you have to
consider both cold and hot filesystem cache cases: loading an index
file with package names and versions might improve the cold-cache case,
but slow things down when the filesystem cache is populated.
As has been mentioned, keeping the index updated is the other major
issue, especially as it has to be portable and should require little or
no configuration/setup for the user (so no extra daemons or special
filesystems running permanently in the background). The obvious
solution would be to generate the cache after `emerge --sync` (and other
sync implementations) and hope that people don't modify their tree and
search for the changes in between (that's what all the external tools
do). I don't know if there is actually a way to do online updates while
still improving performance and not relying on custom system daemons
running in the background.

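A minimal sketch of the "generate the index after a sync" idea, for
illustration only. FakeDbapi is a hypothetical stand-in that mimics just
cp_all() and cp_list(); write_index(), load_index(), the JSON format and
the file location are assumptions, not existing portage code.

```python
import json
import os
import tempfile


class FakeDbapi:
    """Hypothetical stand-in for a dbapi object (cp_all/cp_list only)."""

    def __init__(self, tree):
        # tree maps a package name to its list of package-versions
        self._tree = tree

    def cp_all(self):
        return sorted(self._tree)

    def cp_list(self, cp):
        return list(self._tree.get(cp, []))


def write_index(dbapi, path):
    """Dump all package names and versions to a JSON file.

    Intended to be run once after a sync; nothing keeps the file
    up to date if the tree is modified afterwards.
    """
    index = {cp: dbapi.cp_list(cp) for cp in dbapi.cp_all()}
    # Write to a temporary file first so readers never see a partial index.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(index, f)
    os.replace(tmp, path)


def load_index(dbapi, path):
    """Return the stored index, or scan the tree if it is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {cp: dbapi.cp_list(cp) for cp in dbapi.cp_all()}
```

Note that load_index() returns whatever was written at sync time, so
local changes to the tree stay invisible until the next write_index()
call. Whether reading such a file actually beats a scan with a hot
filesystem cache would have to be measured.
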
As for --searchdesc, one problem is that dbapi.aux_get() can only
operate on a single package-version on each call (though it can read
multiple metadata variables). So for description searches the control
flow is like this (obviously simplified):

result = []
# iterate over all packages
for package in dbapi.cp_all():
    # determine the current version of each package, this is
    # another performance issue.
    version = get_current_version(package)
    # read package description from metadata cache
    description = dbapi.aux_get(version, ["DESCRIPTION"])[0]
    # check if the description matches
    if matches(description, searchkey):
        result.append(package)

There you see the three bottlenecks: the lack of a pregenerated package
list, the version lookup for *each* package and the actual metadata
read. I've already talked about the first, so let's look at the other
two. The core problem there is that DESCRIPTION (like all standard
metadata variables) is version specific, so to access it you need to
determine a version to use, even though in almost all cases the
description is the same (or very similar) for all versions. So the
proper solution would be to make the description a property of the
package name instead of the package version, but that's a _huge_ task
you're probably not interested in. What _might_ work here is to add
support for an optional package-name->description cache that can be
generated offline and includes those packages where all versions have
the same description, and fall back to the current method if the
package is not included in the cache. (Don't think about caching the
version lookup, that's system dependent and therefore not suitable for
caching).

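The optional package-name->description cache could be sketched roughly
as follows. This is an illustration under assumptions: FakeDbapi is a
hypothetical stand-in for the dbapi functions mentioned earlier, and
build_description_cache() and get_description() are made-up names. The
version lookup is passed in as a callable because it is system
dependent and deliberately not cached.

```python
class FakeDbapi:
    """Hypothetical stand-in mimicking cp_all(), cp_list() and aux_get()."""

    def __init__(self, tree):
        # tree maps package name -> {package-version: DESCRIPTION}
        self._tree = tree

    def cp_all(self):
        return sorted(self._tree)

    def cp_list(self, cp):
        return sorted(self._tree[cp])

    def aux_get(self, cpv, keys):
        for versions in self._tree.values():
            if cpv in versions:
                meta = {"DESCRIPTION": versions[cpv]}
                return [meta[key] for key in keys]
        raise KeyError(cpv)


def build_description_cache(dbapi):
    """Offline step: keep only packages whose versions all share one description."""
    cache = {}
    for cp in dbapi.cp_all():
        descriptions = {
            dbapi.aux_get(cpv, ["DESCRIPTION"])[0] for cpv in dbapi.cp_list(cp)
        }
        if len(descriptions) == 1:
            cache[cp] = descriptions.pop()
    return cache


def get_description(cp, cache, dbapi, get_current_version):
    """Use the cache when possible, else fall back to the per-version lookup."""
    if cp in cache:
        return cache[cp]
    # Fallback: the current method, including the expensive version lookup.
    version = get_current_version(cp)
    return dbapi.aux_get(version, ["DESCRIPTION"])[0]
```

With such a cache, the description search loop would only pay for the
version lookup and metadata read on packages whose descriptions differ
between versions.
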
Hope it has become clear that while the actual search algorithm might
be simple and not very efficient, the real problem lies in getting the
data to operate on.

That and the somewhat limited dbapi interface.

Disclaimer: The stuff below involves extending and redesigning some
core portage APIs. This isn't something you can do on a weekend; only
work on this if you want to commit yourself to portage development
for a long time.

The functions listed above are the bare minimum to
perform queries on the package repositories, but they're very
low-level. That means that whenever you want to select packages by
name, description, license, dependencies or other variables you need
quite a bit of custom code, more if you want to combine multiple
searches, and much more if you want to do it efficiently and flexibly.
See http://dev.gentoo.org/~genone/scripts/metalib.py and
http://dev.gentoo.org/~genone/scripts/metascan for a somewhat flexible,
but very inefficient search tool (might not work anymore due to old
age).

Ideally repository searches could be done without writing any
application code using some kind of query language, similar to how SQL
works for generic database searches (obviously not that complex).
But before thinking about that we'd need a query API that actually
a) allows tools to assemble queries without having to worry about
   implementation details
b) run them efficiently without bothering the API user

Simple example: Find all package-versions in the sys-apps category that
are BSD-licensed.

Currently that would involve something like:

result = []
for package in dbapi.cp_all():
    if not package.startswith("sys-apps/"):
        continue
    for version in dbapi.cp_list(package):
        license = dbapi.aux_get(version, ["LICENSE"])[0]
        # for simplicity perform an equality check, in reality you'd
        # have to account for complex license definitions
        if license == "BSD":
            result.append(version)

Not very friendly to maintain, and not very efficient (we'd only need
to iterate over packages in the 'sys-apps' category, but the interface
doesn't allow that).
And now how it might look with an extensive query interface:

query = AndQuery()
query.add(CategoryQuery("sys-apps", FullStringMatch()))
query.add(MetadataQuery("BSD", FullStringMatch()))
result = repository.selectPackages(query)

Much nicer, don't you think?

As said, implementing such a thing would be a huge amount of work, even
if just implemented as wrappers on top of the current interface (which
would prevent many efficiency improvements), but if you (or anyone else
for that matter) are truly interested in this, contact me off-list;
maybe I can find some of my old design ideas and (incomplete)
prototypes to give you a start.

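To make the "wrappers on top of the current interface" variant more
concrete, here is one possible sketch. It is not the design or the
prototypes mentioned above. The class names follow the query example,
but every signature and the matches() protocol are assumptions; in
particular, MetadataQuery here takes the variable name as an extra
first argument, which the query example does not show. FakeDbapi and
Repository are hypothetical stand-ins.

```python
class FakeDbapi:
    """Hypothetical stand-in mimicking cp_all(), cp_list() and aux_get()."""

    def __init__(self, tree):
        # tree maps package name -> {package-version: {variable: value}}
        self._tree = tree

    def cp_all(self):
        return sorted(self._tree)

    def cp_list(self, cp):
        return sorted(self._tree[cp])

    def aux_get(self, cpv, keys):
        for versions in self._tree.values():
            if cpv in versions:
                return [versions[cpv][key] for key in keys]
        raise KeyError(cpv)


class FullStringMatch:
    """Matcher that accepts only exact string equality."""

    def __call__(self, pattern, value):
        return pattern == value


class CategoryQuery:
    def __init__(self, category, matcher):
        self.category = category
        self.matcher = matcher

    def matches(self, dbapi, cp, cpv):
        # The category is the part of the package name before the slash.
        return self.matcher(self.category, cp.split("/", 1)[0])


class MetadataQuery:
    def __init__(self, variable, value, matcher):
        self.variable = variable
        self.value = value
        self.matcher = matcher

    def matches(self, dbapi, cp, cpv):
        return self.matcher(self.value, dbapi.aux_get(cpv, [self.variable])[0])


class AndQuery:
    def __init__(self):
        self.parts = []

    def add(self, query):
        self.parts.append(query)

    def matches(self, dbapi, cp, cpv):
        # all() short-circuits, so cheap queries should be added first.
        return all(part.matches(dbapi, cp, cpv) for part in self.parts)


class Repository:
    def __init__(self, dbapi):
        self.dbapi = dbapi

    def selectPackages(self, query):
        # Still walks every package: a wrapper cannot skip whole categories.
        return [
            cpv
            for cp in self.dbapi.cp_all()
            for cpv in self.dbapi.cp_list(cp)
            if query.matches(self.dbapi, cp, cpv)
        ]
```

The callers get the nicer interface, but selectPackages() still
iterates over the whole tree, which is the efficiency limit of the
wrapper approach noted above.
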
Marius