Gentoo Archives: gentoo-portage-dev

From:	Brian Harring <ferringb@×××××.com>
To:	gentoo-portage-dev@l.g.o
Subject:	sql based cache [was Re: [gentoo-portage-dev] Few things, which imho would make portage better]
Date:	Wed, 15 Mar 2006 00:30:05
Message-Id:	`20060315002938.GD10744@nightcrawler.had1.or.comcast.net`
In Reply to:	Re: [gentoo-portage-dev] Few things, which imho would make portage better by tvali

1	On Tue, Mar 14, 2006 at 04:52:14PM +0200, tvali wrote:
2	> > You're talking about the cache, take a look at the cache subsystem and
3	> > write a mysql module for it. This will never become a default though (we
4	> > would get killed if portage starts to depend on mysql).
5	>
6	> I think that it should not become default as mysql module, but if it
7	> is working, it should become default as "portable" sql module.
8	>
9	> # emerge sqlite pysqlite
10	>
11	> I havent used sqlite, but it seems to be small and usable. I think
12	> that it should start with it.
13	>
14	> I think that portage should support sql by default, but of course it
15	> should not be default before it's clear that many people like it and
16	> use it. What is imho more important is how to make one usable
17	> interface, which would cover both fs and sql portage db's so that
18	> development didnt go into two products.
19
20	See the restrictions framework I've started-
21	http://gentooexperimental.org/~ferringb/blog/archives/2005-07.html#e2005-07-13T01_21_42.txt
22	http://gentooexperimental.org/~ferring/bzr/pkgcore/dev-notes/framework/restrictions
23
24	Short version is that converting to sql internally sucks badly since
25	you'll have to parse (ad hoc) sql statements for any file based
26	backend. Using sql directly in portage requires encapsulating the sql
27	code so that rdbms syntax differences (replace comes to mind) can be
28	worked around...
29
30	Re: rdbms being faster than an on-disk file db... it's only faster in
31	certain cases.
32	With properly designed/coded backends, an RDBMS is _only_ faster than
33	a local file db when it's returning N records.
34
35	As to why adding rdbms into stable is a bad idea right now, the
36	problem is in querying; you _could_ add a sql backend (pretty easy,
37	2.1 ships with a sql_template and sqlite backend from my earlier
38	work), but it'll actually be slower. Portage does cache lookups
39	individually; want the data for all bsdiff versions? portage does
40	thus-
41
42	keys = []
43	for cpv in portdb.cp_list("dev-util/bsdiff"):  # every version of the package
44	    keys.append(portdb.aux_get(cpv, ["DEPEND"]))  # one cache lookup per version
45
46	Each lookup is a separate call; there is no way to leverage rdbms
47	speed for N record return if the calling api is (effectively) single
48	row queries.
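
Rough sketch of the difference, using sqlite3 from the python stdlib. The table layout, the sample rows and both function names are made up for illustration; this is not portage's actual cache schema or api.

```python
import sqlite3

# Hypothetical schema for illustration only; not portage's real cache layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, cp TEXT, depend TEXT)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [
        ("dev-util/bsdiff-4.2", "dev-util/bsdiff", ""),
        ("dev-util/bsdiff-4.3", "dev-util/bsdiff", "app-arch/bzip2"),
        ("dev-util/cvs-1.12", "dev-util/cvs", "sys-libs/zlib"),
    ],
)


def lookup_one_by_one(conn, cpvs):
    """Single-row pattern: one query per package version."""
    results = []
    for cpv in cpvs:
        row = conn.execute(
            "SELECT depend FROM cache WHERE cpv = ?", (cpv,)
        ).fetchone()
        results.append(row[0] if row else None)
    return results


def lookup_batch(conn, cp):
    """List pattern: one query that returns all N rows for a package."""
    rows = conn.execute(
        "SELECT cpv, depend FROM cache WHERE cp = ? ORDER BY cpv", (cp,)
    )
    return dict(rows.fetchall())
```

The first function is the shape of the current calling pattern (N round trips for N versions); only the second gives the rdbms a chance to do the work in one pass.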
49
50	To fully leverage an rdbms backend, portage calls need restructuring
51	so that they deal in lists instead of individual elements; for example,
52	under the rewrite:
53
54	repository.match(atom("dev-util/bsdiff"))
55
56	Via that (and the restriction framework it uses) the api calls are
57	designed so that rdbms can shine; instead of N calls, the
58	repository/cache backend can convert the restrictions into a sql
59	statement and run _one_ search.
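
To make the "restrictions into one sql statement" idea concrete, here is a minimal sketch. StrMatch, AndRestriction, to_sql, match and the column names are all hypothetical stand-ins, not the actual restriction classes from the rewrite; the point is only that a tree of restrictions can be walked once and turned into a single parameterized WHERE clause.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class StrMatch:
    """Hypothetical restriction: the named column must equal value."""
    column: str
    value: str


@dataclass
class AndRestriction:
    """Hypothetical restriction: every child restriction must hold."""
    children: list


# Column names are interpolated into the sql, so only known ones are allowed.
ALLOWED_COLUMNS = {"category", "package", "version"}


def to_sql(restriction):
    """Return (where_clause, params) for a restriction tree."""
    if isinstance(restriction, StrMatch):
        if restriction.column not in ALLOWED_COLUMNS:
            raise ValueError("unknown column: %s" % restriction.column)
        return "%s = ?" % restriction.column, [restriction.value]
    if isinstance(restriction, AndRestriction):
        clauses, params = [], []
        for child in restriction.children:
            clause, child_params = to_sql(child)
            clauses.append("(%s)" % clause)
            params.extend(child_params)
        # An empty AND matches everything.
        return (" AND ".join(clauses) or "1"), params
    raise TypeError("unsupported restriction: %r" % (restriction,))


def match(conn, restriction):
    """Run _one_ search for the whole restriction tree."""
    where, params = to_sql(restriction)
    query = (
        "SELECT category, package, version FROM cache WHERE "
        + where
        + " ORDER BY version"
    )
    return conn.execute(query, params).fetchall()


# Throwaway in-memory table (made-up layout) to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (category TEXT, package TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [
        ("dev-util", "bsdiff", "4.2"),
        ("dev-util", "bsdiff", "4.3"),
        ("dev-util", "cvs", "1.12"),
    ],
)
bsdiff = AndRestriction([StrMatch("category", "dev-util"),
                         StrMatch("package", "bsdiff")])
```

A file based backend can take the same restriction objects and evaluate them in python instead, which is why the calling code never needs to see sql.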
60
61	Finally...rdbms still has problems. If the repository isn't 'frozen'
62	(eg, it can regen its metadata, as all portage trees in stable
63	currently can) you cannot rely on the cache backend aside from doing
64	random access lookups in it.
65
66	Why?
67
68	Cache holds dev-util/bsdiff-4.2 and dev-util/bsdiff-4.3, but not
69	dev-util/bsdiff-4.4 . If you hand off to the cache backend, it'll
70	return just those two, when it should return all 3.
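
The same failure in a few lines of python. Plain dicts and lists stand in for the cache and the tree here, and regen is a hypothetical callback that rebuilds metadata for one package version; none of this is real portage code.

```python
def versions_from_cache_only(cache):
    """Trust the cache for the listing: silently misses uncached versions."""
    return sorted(cache)


def versions_from_tree(tree_versions, cache, regen):
    """List versions from the tree; use the cache for random access only."""
    result = {}
    for cpv in tree_versions:
        if cpv not in cache:
            # Not cached yet: regenerate the metadata and store it.
            cache[cpv] = regen(cpv)
        result[cpv] = cache[cpv]
    return result


# What is actually in the (unfrozen) tree.
tree = ["dev-util/bsdiff-4.2", "dev-util/bsdiff-4.3", "dev-util/bsdiff-4.4"]
# What the cache currently knows about; the metadata values are placeholders.
cache = {
    "dev-util/bsdiff-4.2": {"DEPEND": ""},
    "dev-util/bsdiff-4.3": {"DEPEND": ""},
}

stale_listing = versions_from_cache_only(cache)
full_listing = versions_from_tree(tree, cache, lambda cpv: {"DEPEND": ""})
```

stale_listing holds two versions while full_listing holds all three, which is why an unfrozen repository can only use the cache for per-key lookups.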
71
72	~harring
