Gentoo Archives: gentoo-performance

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-performance@l.g.o
Subject: Re: [gentoo-performance] Re: portage performance
Date: Tue, 03 Aug 2004 08:49:30
Message-Id: 200408031049.26281.pauldv@gentoo.org
In Reply to: Re: [gentoo-performance] Re: portage performance by Brian Harring
1 On Tuesday 27 July 2004 00:57, Brian Harring wrote:
2 > > The basic problem in searching is actually that it isn't implemented
3 > > smartly
4 > > in current portage. I have working (emerge -s like) code that is
5 > > blazingly
6 > > fast as it does not actually open all ebuilds.
7 >
8 > Searching works off of the cache for the most part; if a cache entry is
9 > stale, it's updated (e.g. the ebuild is opened and sourced).
10 > Unless you're not checking the cache and updating it as you proceed,
11 > your implementation ought to suffer the same limitation.
12
13 Basically it does a directory glob to select candidates. Each candidate is
14 then checked to see whether it is a real package. If it is, it is a valid
15 result and is returned.
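
A minimal sketch of that glob-then-verify approach, in Python. This is not the actual working code: the function names are hypothetical, the tree is assumed to be laid out as category/package/ directories, and "real package" is assumed to mean "the directory contains at least one .ebuild file". No ebuild is opened at any point.

```python
import glob
import os


def find_candidates(tree_root, term):
    """Glob for category/package directories whose package name contains term."""
    pattern = os.path.join(tree_root, "*", "*" + glob.escape(term) + "*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def is_real_package(pkg_dir):
    """Assumption: a directory is a package if it holds at least one .ebuild."""
    return any(name.endswith(".ebuild") for name in os.listdir(pkg_dir))


def search(tree_root, term):
    """Return category/package names matching term, without opening any ebuild."""
    results = []
    for pkg_dir in find_candidates(tree_root, term):
        if is_real_package(pkg_dir):
            category = os.path.basename(os.path.dirname(pkg_dir))
            results.append(category + "/" + os.path.basename(pkg_dir))
    return results
```

The only filesystem work is one glob plus one directory listing per candidate, which is why a name search of this kind needs neither the cache nor the ebuild contents.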
16
17 > There are 2 things that need to be done (in my books at least) to step
18 > up the speed of a description search-
19 > A) sql based cache backend, whether sqlite or mysql. Either that, or
20 > extend the flat cache to store the descriptions in a central index.
21 > B) alter the description search algorithm so that instead of stepping through
22 > each entry getting the description, we just state "give me all packages
23 > that have a description matching blar", and leave it up to the backend
24 > to decide what is the most efficient way to search. With flat cache,
25 > we'd still have to go file by file; w/ a sql variant, it could take
26 > advantage of the appropriate syntax.
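
Point B can be sketched as a single search call that each backend implements in its own way. This is only an illustration: the class names, the method name, and the table layout are made up for the example and are not the existing portage cache API.

```python
import sqlite3


class FlatCache:
    """Hypothetical flat cache: entries can only be scanned one by one."""

    def __init__(self, entries):
        self.entries = dict(entries)  # package name -> description

    def search_description(self, text):
        needle = text.lower()
        return sorted(
            pkg for pkg, desc in self.entries.items() if needle in desc.lower()
        )


class SqliteCache:
    """Hypothetical SQL cache: the database does the matching."""

    def __init__(self, entries):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (package TEXT PRIMARY KEY, description TEXT)"
        )
        self.db.executemany(
            "INSERT INTO cache VALUES (?, ?)", list(dict(entries).items())
        )

    def search_description(self, text):
        # Escape LIKE wildcards so the search text is matched literally.
        escaped = (
            text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        rows = self.db.execute(
            "SELECT package FROM cache "
            "WHERE description LIKE ? ESCAPE '\\' ORDER BY package",
            ("%" + escaped + "%",),
        )
        return [row[0] for row in rows]
```

The caller only ever asks for "all packages whose description matches", so swapping the flat backend for the SQL one would not change the search code itself.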
27
28 Probably some kind of caching or tool (like makewhatis) is the way to go. An
29 option would be to use grep first to limit the number of candidate packages
30 that get examined for real (grep is a lot cheaper than parsing).
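
Roughly, the grep-first idea could look like the Python sketch below. It assumes a simplified "parser" that only reads a literal DESCRIPTION="..." line; real ebuilds are bash scripts and may need sourcing, so parse_description here is a stand-in for the expensive step, and all function names are hypothetical.

```python
import re

# Simplified stand-in for real ebuild parsing (assumption: literal assignment).
DESCRIPTION_RE = re.compile(r'^DESCRIPTION="([^"]*)"', re.MULTILINE)


def cheap_match(path, needle):
    """Grep-like pass: raw case-insensitive substring test, no parsing."""
    with open(path, "rb") as f:
        return needle.lower().encode() in f.read().lower()


def parse_description(path):
    """Expensive pass (simplified): extract the DESCRIPTION value."""
    with open(path, encoding="utf-8", errors="replace") as f:
        match = DESCRIPTION_RE.search(f.read())
    return match.group(1) if match else None


def search_descriptions(paths, needle):
    """Parse only the files that survive the cheap substring filter."""
    hits = []
    for path in paths:
        if not cheap_match(path, needle):
            continue  # cannot match, so skip the expensive parse
        desc = parse_description(path)
        if desc is not None and needle.lower() in desc.lower():
            hits.append(path)
    return hits
```

The cheap pass can produce false positives (the word may appear in a comment), which is why the survivors are still examined for real; it never produces false negatives as long as the description text appears literally in the file.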
31
32 > Since there is code for a sql based cache backend, B has been bounced
33 > around in #gentoo-portage a bit. Prior to it actually happening I
34 > would think the sql db code would need to be cleaned up/QA'd/etc.
35 >
36 > Course, there still is the issue of verifying that the cache entry
37 > isn't stale... :)
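
One simple way to check for staleness, shown only as an assumption and not as a description of what portage does, is to store the ebuild's modification time alongside the cached value and regenerate the entry when the two differ. The names and the dict-based cache are hypothetical.

```python
import os


def is_stale(ebuild_path, cached_mtime):
    """An entry is stale if the ebuild's mtime differs from the recorded one."""
    return int(os.stat(ebuild_path).st_mtime) != cached_mtime


def get_description(ebuild_path, cache, regenerate):
    """Return the cached description, regenerating it only when stale.

    cache maps ebuild path -> {"mtime": int, "description": str};
    regenerate is the expensive function that opens and reads the ebuild.
    """
    entry = cache.get(ebuild_path)
    if entry is None or is_stale(ebuild_path, entry["mtime"]):
        entry = {
            "mtime": int(os.stat(ebuild_path).st_mtime),
            "description": regenerate(ebuild_path),
        }
        cache[ebuild_path] = entry
    return entry["description"]
```

Even in this form the check costs one stat call per entry, so a description search over the whole tree still touches every ebuild's metadata, just not its contents.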
38
39 For now I don't have any persistent caching in my working code (except
40 where it uses old code for accessing current ebuilds), to keep it simple. It
41 is actually already quite fast.
42
43 > Err, eh? If the tree is corrupted, and sync'd against a
44 > good/non-corrupted tree, it ought to be reverted to a sane state.
45
46 Exactly
47
48 Paul
49
50 --
51 Paul de Vrieze
52 Gentoo Developer
53 Mail: pauldv@g.o
54 Homepage: http://www.devrieze.net