Gentoo Archives: gentoo-portage-dev

From: Emma Strubell <emma.strubell@×××××.com>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Re: search functionality in emerge
Date: Tue, 02 Dec 2008 02:23:08
Message-Id: 5a8c638a0812011823x3fc3c3eesc0aa73566d6bc838@mail.gmail.com
In Reply to: Re: [gentoo-portage-dev] Re: search functionality in emerge by Tambet
Yes, yes, I know, you're right :]

And thanks a bunch for the outline! About the compression, I agree that it
would be a good idea, but I don't know how to implement it. Not that it
would be difficult... I'm guessing there's a gzip module for Python that
would make it pretty straightforward? I think I'm getting ahead of myself,
though. I haven't even implemented the suffix tree yet!
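Python does ship a `gzip` module in its standard library. Here is a minimal sketch of compressing an in-memory index. It assumes Python 3.2+ (for `gzip.compress`/`gzip.decompress`) and an index stored as a JSON-serialized dict mapping package names to descriptions; the sample data is made up for illustration:

```python
import gzip
import json

# Made-up index: package name -> description, repetitive like a real tree.
index = {"app-misc/pkg%d" % i: "An example package description" for i in range(1000)}

raw = json.dumps(index).encode("utf-8")
packed = gzip.compress(raw)  # in-memory gzip stream

# Round trip: decompress and parse back into an equal dict.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
print(len(raw), len(packed), restored == index)
```

For an on-disk index, `gzip.open(path, "wt", encoding="utf-8")` gives a text-mode file object that `json.dump` can write to directly.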

Emma

On Mon, Dec 1, 2008 at 7:20 PM, Tambet <qtvali@×××××.com> wrote:

> 2008/12/2 Emma Strubell <emma.strubell@×××××.com>
>
>> True, true. Like I said, I don't really use overlays, so excuse my
>> ignorance.
>>
>
> Do you know the order of doing things?
>
> Rules of Optimization:
>
> - Rule 1: Don't do it.
> - Rule 2 (for experts only): Don't do it yet.
>
> What this actually means: functionality comes first. Readability comes
> next. Optimization comes last. Unless you are creating a fancy 3D engine for
> a kung fu game.
>
> If you are going to exclude overlays, you are removing functionality, and
> indeed absolutely has-to-be-there functionality, because no one would
> intuitively expect a search function to search only one subset of packages,
> however reasonable that subset might be. So you can't, just can't, add this
> package to the portage base; you could only write it as yet another
> external search package for portage.
>
> I looked at this code a bit:
> Portage's "__init__.py" contains the comment "# search functionality".
> After this comment, there is a nice and simple search class.
> It also contains the method "def action_sync(...)", which contains the
> synchronization stuff.
>
> Now, the search class is initialized by setting up 3 databases: porttree,
> bintree and vartree, whatever those are. Those are kept in self._dbs, and
> porttree is in self._portdb.
>
> It contains some more methods:
> _findname(...) returns the result of self._portdb.findname(...) called
> with the same parameters, or None if it does not exist.
> Other methods do similar things: each one maps to one or another method.
> execute does the real search...
> Now, "for package in self.portdb.cp_all()" is the important part: it
> currently loops over the whole portage tree, and all kinds of matching are
> done inside that loop.
> self.portdb obviously points to porttree.py (unless it points to a fake
> tree).
> cp_all takes all porttrees and does a simple file search inside them. This
> method is where an optional index search should go.
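A minimal sketch of the kind of matching such a loop performs. It assumes a case-insensitive substring match on the package name only; `search_names` is a hypothetical helper, not emerge's actual matching logic, and the list passed in stands in for the output of cp_all():

```python
import re


def search_names(cp_list, term):
    """Return the "category/package" entries whose package name contains term.

    Matching is case-insensitive and applies only to the part after the
    slash; the term is escaped so characters like "+" match literally.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    matches = []
    for cp in cp_list:
        name = cp.split("/", 1)[1]  # drop the category
        if pattern.search(name):
            matches.append(cp)
    return matches
```

Every call walks the whole list, which is exactly the cost an index is meant to avoid.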
>
> self.porttrees = [self.porttree_root] + \
>     [os.path.realpath(t) for t in self.mysettings["PORTDIR_OVERLAY"].split()]
>
> So self.porttrees contains a list of trees: the first one is the main tree
> root, the others are overlays.
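A simplified sketch of the file search cp_all performs over such a list of trees. It assumes every tree uses the `category/package/` directory layout and that the list of categories is passed in; `simple_cp_all` is a hypothetical name, and real portage does more checking than this:

```python
import os


def simple_cp_all(porttrees, categories):
    """Return sorted "category/package" names found in any of the trees.

    porttrees[0] is the main tree and porttrees[1:] are overlays; a package
    present in several trees is listed once.
    """
    found = set()
    for root in porttrees:
        for cat in categories:
            cat_dir = os.path.join(root, cat)
            if not os.path.isdir(cat_dir):
                continue  # this tree has no such category
            for pkg in os.listdir(cat_dir):
                if os.path.isdir(os.path.join(cat_dir, pkg)):
                    found.add(cat + "/" + pkg)
    return sorted(found)
```

Passing `porttrees[1:]` instead of `porttrees` restricts the scan to overlays.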
>
> Now, having overlay search too will not make what you have to do any
> harder.
>
> You have to create a method def cp_index(self), which returns a dictionary
> with package names as keys. In the "for oroot..." loop, use
> "self.porttrees[1:]" instead of "self.porttrees", so that only overlays
> are searched, and replace d = {} with d = self.cp_index(). If the index is
> not there, the old version is used (so you need an internal porttrees
> variable that contains either all trees or all except the first).
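A sketch of the cp_index idea described above. It assumes the index is a dict keyed by "category/package" that covers only the main tree, and that a `scan_tree` callable lists the packages found in one tree. `IndexedTree`, `scan_tree` and `_index` are hypothetical names, not portage's API:

```python
class IndexedTree:
    """Hypothetical stand-in for a portdb with an optional package index."""

    def __init__(self, porttrees, scan_tree, index=None):
        self.porttrees = porttrees   # [main_tree_root, overlay1, overlay2, ...]
        self._scan_tree = scan_tree  # callable: tree root -> iterable of "cat/pkg"
        self._index = index          # dict keyed by "cat/pkg" for the main tree, or None

    def cp_index(self):
        """Return a fresh dict of indexed packages, or None if there is no index."""
        if self._index is None:
            return None
        return dict(self._index)  # copy, so cp_all can add overlay entries safely

    def cp_all(self):
        """Return sorted "cat/pkg" names from the index plus any scanned trees."""
        d = self.cp_index()
        if d is None:
            d = {}
            trees = self.porttrees      # no index: scan every tree, as before
        else:
            trees = self.porttrees[1:]  # index covers the main tree: overlays only
        for root in trees:
            for cp in self._scan_tree(root):
                d[cp] = None
        return sorted(d)
```

Choosing `trees` once, up front, is the "internal porttrees variable" that holds either all trees or all except the first.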
>
> Other methods used by search are xmatch and aux_get: the first is called
> several times, and the second is used to get the description. You have to
> cache the results of those specific queries and make these methods use
> your cache; as you can see, those parts of portage are already able to
> handle overlays. So, again, put your code at the beginning of those
> functions: create index_xmatch and index_aux_get methods, then make xmatch
> and aux_get call them and return their results unless those are None (or
> some other value, in case None is already a legal result). If they return
> None, the old code runs and does its job. If the index has not been
> created, the result is None. In the index_* methods, just check whether
> the query is one you can answer, and if it is, answer it.
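A sketch of the "ask the index first, else run the old code" pattern for aux_get, assuming the cache maps `(cpv, key)` pairs to metadata values. It uses a private sentinel object instead of None, which covers the case where None is already a legal result. `IndexedPortdb`, `_MISSING` and `_slow_aux_get` are hypothetical names; xmatch would be wrapped the same way:

```python
_MISSING = object()  # sentinel meaning "the index cannot answer this query"


class IndexedPortdb:
    """Hypothetical sketch: consult a cache first, fall back to the old code path."""

    def __init__(self, index=None):
        # index maps (cpv, key) -> value, e.g. ("a/b-1", "DESCRIPTION") -> "..."
        self._index = index

    def index_aux_get(self, cpv, keys):
        """Return cached values for every key, or _MISSING if any is not cached."""
        if self._index is None:
            return _MISSING  # index not created
        try:
            return [self._index[(cpv, key)] for key in keys]
        except KeyError:
            return _MISSING  # not a query the index can answer

    def aux_get(self, cpv, keys):
        result = self.index_aux_get(cpv, keys)
        if result is not _MISSING:
            return result
        return self._slow_aux_get(cpv, keys)

    def _slow_aux_get(self, cpv, keys):
        # Placeholder for the original, uncached lookup.
        return ["<slow:%s>" % key for key in keys]
```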
>
> Obviously, the simplest way to create your index is to delete the index,
> then use those same methods to query for all the necessary information.
> The fastest way would be to update the index directly during sync, which
> you could do later.
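A sketch of building the index from scratch with the existing query methods. It assumes they are passed in as plain callables; `cp_all`, `best_version` and `aux_get` here are stand-ins, and the result shape `{cp: (cpv, description)}` is an assumption:

```python
def build_index(cp_all, best_version, aux_get):
    """Rebuild a {cp: (cpv, description)} index using existing query functions.

    cp_all() -> iterable of "cat/pkg"; best_version(cp) -> "cat/pkg-ver" or
    None; aux_get(cpv, keys) -> list of values. All three are stand-ins for
    the real portdb methods.
    """
    index = {}
    for cp in cp_all():
        cpv = best_version(cp)
        if cpv is None:
            continue  # no version to describe
        description = aux_get(cpv, ["DESCRIPTION"])[0]
        index[cp] = (cpv, description)
    return index
```

Because the old index is deleted first, every query here takes the original uncached path, so the new index cannot be built from stale cached answers.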
>
> Please also add commands to turn the index on and off (turning it off
> should also delete it to save disk space). The default should be off until
> it's fast, small and reliable. Also note that if the index is kept on the
> hard drive, it might be faster if it's compressed (gz, for example):
> reading the smaller file and decompressing it takes less time, though more
> processing power, than reading the uncompressed file in full.
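A sketch of the on/off commands, assuming the index is a JSON-serializable dict kept in a single gzip-compressed file. The function names and the file path are hypothetical. Turning the index off deletes the file, as suggested above:

```python
import gzip
import json
import os


def enable_index(path, build):
    """Build the index with the callable `build` and store it gzip-compressed."""
    index = build()
    tmp = path + ".tmp"
    with gzip.open(tmp, "wt", encoding="utf-8") as f:
        json.dump(index, f)
    os.replace(tmp, path)  # atomic rename on POSIX (same directory, same filesystem)
    return index


def disable_index(path):
    """Turn the index off by deleting it, freeing the disk space."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # already off


def index_enabled(path):
    return os.path.exists(path)
```

Writing to a temporary file and renaming it means a search running during a rebuild sees either the old index or the new one, never a partial file.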
>
> Good luck!
>
>>> Emma Strubell wrote:
>>> > 2) does anyone really need to search an overlay anyway?
>>>
>>> Of course. Take large (semi-)official overlays like sunrise. They can
>>> easily be seen as a second portage tree.
>>>
>>> On Mon, Dec 1, 2008 at 5:17 PM, René 'Necoro' Neumann <lists@××××××.eu> wrote: