On Tue, Mar 14, 2006 at 04:52:14PM +0200, tvali wrote:
> > You're talking about the cache, take a look at the cache subsystem and
> > write a mysql module for it. This will never become a default though (we
> > would get killed if portage starts to depend on mysql).
>
> I think that it should not become default as mysql module, but if it
> is working, it should become default as "portable" sql module.
>
> # emerge sqlite pysqlite
>
> I haven't used sqlite, but it seems to be small and usable. I think
> that it should start with it.
>
> I think that portage should *support* sql by default, but of course it
> should not be default before it's clear that many people like it and
> use it. What is imho more important is how to make one usable
> interface, which would cover both fs and sql portage db's so that
> development didn't go into two products.
|
See the restrictions framework I've started-
http://gentooexperimental.org/~ferringb/blog/archives/2005-07.html#e2005-07-13T01_21_42.txt
http://gentooexperimental.org/~ferring/bzr/pkgcore/dev-notes/framework/restrictions
|
Short version is that converting to sql internally sucks badly, since every
file-based backend would then have to parse (ad hoc) sql statements. Using
sql directly in portage requires encapsulating the sql code so that rdbms
syntax differences (REPLACE comes to mind) can be worked around...
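
A minimal sketch of that encapsulation, in Python with the stdlib sqlite3 module. The Dialect, SQLiteDialect, MySQLDialect and SQLCache names and the metadata table are hypothetical. This is not portage's sql_template API. The only point is that backend code asks a dialect object for its insert-or-replace statement and placeholder style, rather than hardcoding MySQL's REPLACE.

```python
import sqlite3


class Dialect:
    """Hypothetical per-RDBMS SQL generator (not portage's actual API)."""
    placeholder = "?"   # DB-API paramstyle marker
    upsert_verb = None  # subclasses supply their insert-or-replace syntax

    def upsert(self, table, columns):
        marks = ", ".join([self.placeholder] * len(columns))
        return "%s INTO %s (%s) VALUES (%s)" % (
            self.upsert_verb, table, ", ".join(columns), marks)


class SQLiteDialect(Dialect):
    placeholder = "?"                  # sqlite3 uses the qmark paramstyle
    upsert_verb = "INSERT OR REPLACE"


class MySQLDialect(Dialect):
    placeholder = "%s"                 # MySQLdb uses the format paramstyle
    upsert_verb = "REPLACE"            # MySQL extension, not standard SQL


class SQLCache:
    """Hypothetical cache backend that only builds SQL via its Dialect."""

    def __init__(self, conn, dialect):
        self.conn, self.dialect = conn, dialect

    def __setitem__(self, cpv, depend):
        cur = self.conn.cursor()
        cur.execute(self.dialect.upsert("metadata", ["cpv", "depend"]),
                    (cpv, depend))

    def __getitem__(self, cpv):
        cur = self.conn.cursor()
        cur.execute("SELECT depend FROM metadata WHERE cpv = %s"
                    % self.dialect.placeholder, (cpv,))
        row = cur.fetchone()
        if row is None:
            raise KeyError(cpv)
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (cpv TEXT PRIMARY KEY, depend TEXT)")
cache = SQLCache(conn, SQLiteDialect())
cache["dev-util/bsdiff-4.3"] = "app-arch/bzip2"                # made-up value
cache["dev-util/bsdiff-4.3"] = "app-arch/bzip2 sys-libs/zlib"  # replaces the row
print(cache["dev-util/bsdiff-4.3"])  # app-arch/bzip2 sys-libs/zlib
print(MySQLDialect().upsert("metadata", ["cpv", "depend"]))
# REPLACE INTO metadata (cpv, depend) VALUES (%s, %s)
```

Only the sqlite path is exercised here. Running the same SQLCache against MySQL would also need a MySQLdb connection, and that is not shown.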
|
Re: rdbms being faster than an on-disk file db... it's only faster in
certain cases. With properly designed/coded backends, an RDBMS is _only_
faster than a local file db when it's returning N records.
|
As to why adding rdbms into stable is a bad idea right now, the
problem is in querying; you _could_ add a sql backend (pretty easy,
2.1 ships with a sql_template and sqlite backend from my earlier
work), but it'll actually be slower. Portage does cache lookups
individually; want the data for all bsdiff versions? portage does
this-
|
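import portage
portdb = portage.portdb  # portage's global portdbapi instance
keys = []
for cpv in portdb.cp_list("dev-util/bsdiff"):  # every bsdiff version
    keys.append(portdb.aux_get(cpv, ["DEPEND"]))  # one cache lookup each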
|
Each lookup is a separate call- there is no way to leverage rdbms
speed for N-record returns if the calling api is (effectively)
single-row queries.
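
A rough Python/sqlite3 sketch that makes the N-calls-versus-one-query point concrete. It uses a hypothetical metadata table that stores the category/package (cp) next to each version. The sketch counts queries rather than timing them. With an in-process sqlite the per-call overhead is small, but against a client/server rdbms every aux_get-style call is a separate round trip. CountingDB, aux_get_one and depends_for_package are illustrative names, not portage functions.

```python
import sqlite3


class CountingDB:
    """Wraps a sqlite3 connection and counts the queries issued through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def aux_get_one(db, cpv):
    # current calling pattern: one single-row query per version
    rows = db.query("SELECT depend FROM metadata WHERE cpv = ?", (cpv,))
    return rows[0][0]


def depends_for_package(db, cp):
    # list-oriented pattern: one query returns every version's row
    rows = db.query("SELECT cpv, depend FROM metadata WHERE cp = ?", (cp,))
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (cpv TEXT PRIMARY KEY, cp TEXT, depend TEXT)")
versions = ["dev-util/bsdiff-4.2", "dev-util/bsdiff-4.3"]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)",
                 [(v, "dev-util/bsdiff", "app-arch/bzip2") for v in versions])

db = CountingDB(conn)
per_row = [aux_get_one(db, v) for v in versions]
print(db.queries)  # 2: one query per version, so N queries for N versions

db.queries = 0
batch = depends_for_package(db, "dev-util/bsdiff")
print(db.queries)  # 1: a single query regardless of N
```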
|
To fully leverage an rdbms backend, you need to restructure portage
calls so that they deal in lists instead of individual elements- e.g.,
under the rewrite
|
repository.match(atom("dev-util/bsdiff"))
|
Via that (and the restriction framework it uses) the api calls are
designed so that rdbms can shine; instead of N calls, the
repository/cache backend can convert the restrictions into a sql
statement and run _one_ search.
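
A sketch of what converting restrictions into a sql statement can look like. The classes here (AttrEquals, AndRestriction) and the packages table are hypothetical stand-ins, not pkgcore's actual restriction API. Each restriction can be handled in one of two ways:

- A file-based backend evaluates it directly against a package.
- A sql backend compiles it into a WHERE clause plus parameters, then runs one SELECT for the whole atom.

Either way, no backend ever has to parse sql.

```python
import sqlite3

COLUMNS = ("category", "package", "version")  # hypothetical table layout


class AttrEquals:
    """Hypothetical restriction: a package attribute equals a value."""

    def __init__(self, attr, value):
        if attr not in COLUMNS:  # whitelist: attr is interpolated into SQL
            raise ValueError("unknown attribute: %r" % (attr,))
        self.attr, self.value = attr, value

    def match(self, pkg):
        # file-based backend: evaluate the restriction in python
        return pkg[self.attr] == self.value

    def to_sql(self):
        # sql backend: a WHERE fragment plus its bound parameters
        return "%s = ?" % self.attr, [self.value]


class AndRestriction:
    """Hypothetical conjunction of restrictions."""

    def __init__(self, *restrictions):
        self.restrictions = restrictions

    def match(self, pkg):
        return all(r.match(pkg) for r in self.restrictions)

    def to_sql(self):
        if not self.restrictions:
            return "1 = 1", []  # empty conjunction matches everything
        parts, params = [], []
        for r in self.restrictions:
            sql, p = r.to_sql()
            parts.append("(%s)" % sql)
            params.extend(p)
        return " AND ".join(parts), params


def sql_match(conn, restriction):
    # one SELECT for the whole restriction tree instead of N lookups
    where, params = restriction.to_sql()
    sql = "SELECT %s FROM packages WHERE %s" % (", ".join(COLUMNS), where)
    return sorted(conn.execute(sql, params).fetchall())  # plain string sort


rows = [("dev-util", "bsdiff", "4.2"), ("dev-util", "bsdiff", "4.3"),
        ("app-arch", "bzip2", "1.0.3")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (category TEXT, package TEXT, version TEXT)")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)

# in this sketch, roughly what atom("dev-util/bsdiff") boils down to
bsdiff = AndRestriction(AttrEquals("category", "dev-util"),
                        AttrEquals("package", "bsdiff"))
print(sql_match(conn, bsdiff))
# [('dev-util', 'bsdiff', '4.2'), ('dev-util', 'bsdiff', '4.3')]
pkgs = [dict(zip(COLUMNS, row)) for row in rows]
print([p["version"] for p in pkgs if bsdiff.match(p)])  # ['4.2', '4.3']
```

Versions are sorted as plain strings here. A real implementation would need portage's version comparison.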
|
Finally... rdbms still has problems. If the repository isn't 'frozen'
(e.g., it can regen its metadata, as all portage trees in stable
currently can), you cannot rely on the cache backend for anything
beyond random access lookups.
|
Why?
|
Cache holds dev-util/bsdiff-4.2 and dev-util/bsdiff-4.3, but not
dev-util/bsdiff-4.4 (its metadata hasn't been generated yet). If you
hand the search off to the cache backend, it'll return just those two,
when it should return all 3.
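
A toy Python model of that failure. Plain dicts stand in for the ebuild tree and the metadata cache (both hypothetical, as is regen_metadata). A real cache would also validate entries, e.g. against ebuild mtimes, and this skips that.

- Handing the whole search to the cache misses 4.4.
- Letting the tree decide which versions exist, and using the cache only for per-version lookups with a regen on a miss, returns all three.

```python
# Hypothetical data: the tree on disk has three bsdiff ebuilds, but the
# metadata cache was populated before 4.4 was added.
tree = {"dev-util/bsdiff": ["4.2", "4.3", "4.4"]}
cache = {("dev-util/bsdiff", "4.2"): "app-arch/bzip2",
         ("dev-util/bsdiff", "4.3"): "app-arch/bzip2"}


def regen_metadata(cp, version):
    # hypothetical stand-in for sourcing the ebuild to regenerate metadata
    return "app-arch/bzip2"


def match_from_cache(cache, cp):
    # what handing the whole search to the cache backend amounts to
    return sorted(v for (c, v) in cache if c == cp)


def match_from_tree(tree, cache, cp):
    # the tree decides which versions exist; the cache is only used for
    # per-version (random access) lookups, regenerating on a miss
    found = {}
    for version in tree.get(cp, []):
        key = (cp, version)
        if key not in cache:
            cache[key] = regen_metadata(cp, version)
        found[version] = cache[key]
    return sorted(found)


print(match_from_cache(cache, "dev-util/bsdiff"))       # ['4.2', '4.3']
print(match_from_tree(tree, cache, "dev-util/bsdiff"))  # ['4.2', '4.3', '4.4']
```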
|
~harring |