Gentoo Archives: gentoo-portage-dev

From: tvali <qtvali@×××××.com>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] search functionality in emerge
Date: Sun, 23 Nov 2008 20:00:31
Message-Id: cea53e3c0811231200u5b91aa8dqee4435ccdb6b8663@mail.gmail.com
In Reply to: Re: [gentoo-portage-dev] search functionality in emerge by Emma Strubell
1 Yes if it would be a low-level implementation to portage, speeding up it's
2 native code for searching and using indexes, then it would make everything
3 faster, including emerge (because emerge does search first for package
4 relations). I have actually wanted to do it myself several years ago, so
5 reacting here to have my ideas discussed, too.
6
7 Douglas Anderson 16:46 reply is about locks and I think that it would need
8 to rethink portages locking methods - what, when and why it locks. This is
9 probably quite hard task by itself. Anyway, as portage actually lets user
10 make two emerges at the same time, locks might be OK, too.
11
12 I think that the best thing would be bottom-up refactoring - first to make a
13 list of lowest-level functions, which have to do with reading data from
14 portage tree or writing into it; then making indexer class, which will be
15 used by all of those low-level functions.
16
17 To have it OOP, it should be implemented in such way:
18
19 - Low-level portage tree handler does everything with portage tree, no
20 function in portage uses it directly.
21 - Tree handler has several needed and several optional methods - so that
22 implementing new handler would be easy, but creating things like native
23 regex search would be possible.
24 - One could implement a new tree handler with SQLite or other interface
25 instead of filesystem and do other tricks through this interface; for
26 example, boost it.
27
28 So, nice way to go would be:
29
30 1. Implementing portage tree handler and it's proxy, which uses current
31 portage tree in non-indexed way and simply gives methods for the same kind
32 of access, as currently implemented one.
33 2. Refactoring portage to rely only on portage tree handler and use
34 direct portage tree nowhere. To test if it is so, portage tree should be
35 moved to another directory, about which only this handler knows, and check
36 if portage works well. Indexing all places, where portage uses it's tree
37 handler (by std. comment, for example) and making clear, which methods would
38 contain all boostable code of it.
39 3. Implementing those methods in proxy, which could simulate fast regex
40 search and other stuff using simplest possible interface of portage tree
41 handler (smth. like four methods add, remove, get, list). Proxy should be
42 able to use handler's methods if they are implemented.
43 4. Refactoring portage to use advanced methods in proxy.
44 5. Now, having taken all code together into one place and looking this
45 nice and readable code, real optimizations could be discussed here, for
46 example. Ideally, i think, portage would have such tree handlers:
47 - Filesystem handler - fast searches over current portage tree
48 structure.
49 - SQL handler - rewrite of tree functions into SQL queries.
50 - Network-based handler - maybe sometimes it would nice to have
51 portage tree only in one machine of cluster, for example if I
52 want to have
53 100 really small computers with fast connection with mother-computer and
54 portage tree is too big to be wholly copied to all of them.
55 - Memory-buffered handler with daemon, which is actually proxy to some
56 other handler - daemon, which reads whole tree (from filesystem
57 or SQL) into
58 memory on boot or first use, creates really fast index (because
59 now it does
60 matter to have better indexing) and optionally deletes some [less needed]
61 parts of it's index from memory when it's becoming full and behaves as
62 really simple proxy if it stays full. This should be implemented after
63 critical parts of filesystem or SQL handler.
64
65 2008/11/23 Emma Strubell <emma.strubell@×××××.com>
66
67 > Wow, that's extremely helpful!! I happen to particularly enjoy tries, so
68 > the suffix trie sounds like a great idea. The trie class example is really
69 > helpful too, because this will be my first time programming in Python, and
70 > it's a bit easier to figure out what's going on syntax-wise in that simple
71 > trie class than in the middle of the portage source code!
72 >
73 > Seriously, thanks again :]
74 >
75 >
76 > On Sun, Nov 23, 2008 at 11:56 AM, Lucian Poston <lucianposton@×××××.com>wrote:
77 >
78 >> > Thanks for the replies! I know there are a couple programs out there
79 >> that
80 >> > basically already do what I'm looking to do... Unfortunately I wasn't
81 >> aware
82 >> > of these pre-existing utilities until after I submitted my project
83 >> proposal
84 >> > to my professor. So, I'm looking to implement a better search myself.
85 >> > Preferably by editing the existing portage code, not writing a separate
86 >> > program. So if anyone can offer any help regarding the actual
87 >> implementation
88 >> > of search in portage, I would greatly appreciate it!
89 >>
90 >> Most of the search implementation is in
91 >> /usr/lib/portage/pym/_emerge/__init__.py in class search. The class's
92 >> execute() method simply iterates over all packages (and descriptions
93 >> and package sets) and matches against the searchkey. You might need
94 >> to look into pym/portage/dbapi/porttree.py for portdbapi as well.
95 >>
96 >> If you intend to index and support fast regex lookup, then you need to
97 >> do some fancy indexing, which I'm not terribly familiar with. You
98 >> could follow in the steps of eix[1] or other indexed search utilities
99 >> and design some sort of index layout, which is easier than the
100 >> following thought. You might consider implementing a suffix trie or
101 >> similar that has sublinear regexp lookup and marshalling the structure
102 >> for the index. I couldn't find a python implementation for something
103 >> like this, but here is a general trie class[2] that you might start
104 >> with if you go that route. There is a perl module[3],
105 >> Tie::Hash::Regex, that does that, but properly implementing that in
106 >> python would be a chore. :)
107 >>
108 >> That project sounds interesting and fun. Good luck!
109 >>
110 >> Lucian Poston
111 >>
112 >> [1] https://projects.gentooexperimental.org/eix/wiki/IndexFileLayout
113 >> [2]
114 >> http://www.koders.com/python/fid7B6BC1651A9E8BBA547552FE3F039479A4DECC45.aspx
115 >> [3]
116 >> http://search.cpan.org/~davecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm<http://search.cpan.org/%7Edavecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm>
117 >>
118 >>
119 >
120
121
122 --
123 tvali
124
125 Kuskilt foorumist: http://www.cooltests.com - kui inglise keelt oskad.
126 Muide, üle 120 oled väga tark, üle 140 oled geenius, mingi 170 oled ju mingi
127 täica pea nagu prügikast...