Gentoo Archives: gentoo-portage-dev

From: Brian Harring <ferringb@×××××.com>
To: gentoo-portage-dev@l.g.o
Subject: sql based cache [was Re: [gentoo-portage-dev] Few things, which imho would make portage better]
Date: Wed, 15 Mar 2006 00:30:05
Message-Id: 20060315002938.GD10744@nightcrawler.had1.or.comcast.net
In Reply to: Re: [gentoo-portage-dev] Few things, which imho would make portage better by tvali
1 On Tue, Mar 14, 2006 at 04:52:14PM +0200, tvali wrote:
2 > > You're talking about the cache, take a look at the cache subsystem and
3 > > write a mysql module for it. This will never become a default though (we
4 > > would get killed if portage starts to depend on mysql).
5 >
6 > I think that it should not become default as mysql module, but if it
7 > is working, it should become default as "portable" sql module.
8 >
9 > # emerge sqlite pysqlite
10 >
11 > I havent used sqlite, but it seems to be small and usable. I think
12 > that it should start with it.
13 >
14 > I think that portage should *support* sql by default, but of course it
15 > should not be default before it's clear that many people like it and
16 > use it. What is imho more important is how to make one usable
17 > interface, which would cover both fs and sql portage db's so that
18 > development didnt go into two products.
19
20 See the restrictions framework I've started-
21 http://gentooexperimental.org/~ferringb/blog/archives/2005-07.html#e2005-07-13T01_21_42.txt
22 http://gentooexperimental.org/~ferring/bzr/pkgcore/dev-notes/framework/restrictions
23
24 Short version is that converting to sql internally sucks badly since
25 you'll have to parse (ad hoc) sql statements for any file based
26 backend. Using sql directly in portage requires encapsulating the sql
27 code so that rdbms syntax differences (replace comes to mind) can be
28 worked around...
29
30 Re: rdbms being faster than an on disk file db... it's only faster in
31 certain cases.
32 With properly designed/coded backends, an RDBMS is _only_ faster than a
33 local file db when it's returning N records.
34
35 As to why adding rdbms into stable is a bad idea right now, the
36 problem is in querying; you _could_ add a sql backend (pretty easy,
37 2.1 ships with a sql_template and sqlite backend from my earlier
38 work), but it'll actually be slower. Portage does cache lookups
39 individually; want the data for all bsdiff versions? portage does
40 thus-
41
42 keys=[]
43 for x in portdb.cp_list("dev-util/bsdiff"):
44     keys.append(portdb.aux_get(x, ["DEPEND"]))
45
46 Each lookup is a separate call- there is no way to leverage rdbms
47 speed for N record return if the calling api is (effectively) single
48 row queries.
49
50 To fully leverage a rdbms backend, portage calls need to be restructured
51 so that they deal in lists instead of individual elements- for example,
52 under the rewrite
53
54 repository.match(atom("dev-util/bsdiff"))
55
56 Via that (and the restriction framework it uses) the api calls are
57 designed so that rdbms can shine; instead of N calls, the
58 repository/cache backend can convert the restrictions into a sql
59 statement and run _one_ search.
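
A rough sketch of the difference, not actual portage or pkgcore code. The
table layout, column names, and both helper functions are made up for
illustration, and it uses python's sqlite3 module against an in-memory db.
The first helper mimics the one-row-per-call pattern, the second does the
whole package in a single statement.

```python
import sqlite3

# Hypothetical schema for illustration only; not portage's real cache layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, cp TEXT, depend TEXT)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [
        ("dev-util/bsdiff-4.2", "dev-util/bsdiff", "app-arch/bzip2"),
        ("dev-util/bsdiff-4.3", "dev-util/bsdiff", "app-arch/bzip2"),
        ("dev-util/xdelta-1.1.3", "dev-util/xdelta", "sys-libs/zlib"),
    ],
)


def lookup_one_by_one(cpvs):
    """Single-row style: one query per version, so N round trips."""
    out = []
    for cpv in cpvs:
        row = conn.execute(
            "SELECT depend FROM cache WHERE cpv = ?", (cpv,)
        ).fetchone()
        out.append(row[0])
    return out


def lookup_batched(cp):
    """List style: the package restriction becomes one WHERE clause."""
    rows = conn.execute(
        "SELECT cpv, depend FROM cache WHERE cp = ? ORDER BY cpv", (cp,)
    )
    return rows.fetchall()


one_by_one = lookup_one_by_one(["dev-util/bsdiff-4.2", "dev-util/bsdiff-4.3"])
batched = lookup_batched("dev-util/bsdiff")
```

Both return the same data; the second form is the only one where the db
gets a chance to do the filtering itself.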
60
61 Finally...rdbms still has problems. If the repository isn't 'frozen'
62 (eg, it can regen its metadata, as all portage trees in stable
63 currently can) you cannot rely on the cache backend aside from doing
64 random access lookups in it.
65
66 Why?
67
68 Cache holds dev-util/bsdiff-4.2 and dev-util/bsdiff-4.3, but not
69 dev-util/bsdiff-4.4 . If you hand off to the cache backend, it'll
70 return just those two, when it should return all 3.
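
The same scenario as a toy sketch. The function name and the 'frozen' flag
are hypothetical, and the two sets just stand in for the ebuilds in the
tree and the entries in the cache:

```python
def match_versions(on_disk, cached, frozen):
    """Return the versions a package match should report.

    on_disk: versions that have ebuilds in the tree
    cached:  versions that currently have a cache entry
    frozen:  True if the tree can never be newer than its cache
    """
    if frozen:
        # Cache is authoritative, so the backend can answer the search itself.
        return sorted(cached)
    # Cache may lag behind the tree: the tree decides what exists, and the
    # cache is only good for per-version (random access) lookups.
    return sorted(on_disk)


on_disk = {"dev-util/bsdiff-4.2", "dev-util/bsdiff-4.3", "dev-util/bsdiff-4.4"}
cached = {"dev-util/bsdiff-4.2", "dev-util/bsdiff-4.3"}

# What a cache-only search would silently drop.
missing = sorted(on_disk - cached)
correct = match_versions(on_disk, cached, frozen=False)
```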
71
72 ~harring