1 |
Yes if it would be a low-level implementation to portage, speeding up it's |
2 |
native code for searching and using indexes, then it would make everything |
3 |
faster, including emerge (because emerge does search first for package |
4 |
relations). I have actually wanted to do it myself several years ago, so |
5 |
reacting here to have my ideas discussed, too. |
6 |
|
7 |
Douglas Anderson 16:46 reply is about locks and I think that it would need |
8 |
to rethink portages locking methods - what, when and why it locks. This is |
9 |
probably quite hard task by itself. Anyway, as portage actually lets user |
10 |
make two emerges at the same time, locks might be OK, too. |
11 |
|
12 |
I think that the best thing would be bottom-up refactoring - first to make a |
13 |
list of lowest-level functions, which have to do with reading data from |
14 |
portage tree or writing into it; then making indexer class, which will be |
15 |
used by all of those low-level functions. |
16 |
|
17 |
To have it OOP, it should be implemented in such way: |
18 |
|
19 |
- Low-level portage tree handler does everything with portage tree, no |
20 |
function in portage uses it directly. |
21 |
- Tree handler has several needed and several optional methods - so that |
22 |
implementing new handler would be easy, but creating things like native |
23 |
regex search would be possible. |
24 |
- One could implement a new tree handler with SQLite or other interface |
25 |
instead of filesystem and do other tricks through this interface; for |
26 |
example, boost it. |
27 |
|
28 |
So, nice way to go would be: |
29 |
|
30 |
1. Implementing portage tree handler and it's proxy, which uses current |
31 |
portage tree in non-indexed way and simply gives methods for the same kind |
32 |
of access, as currently implemented one. |
33 |
2. Refactoring portage to rely only on portage tree handler and use |
34 |
direct portage tree nowhere. To test if it is so, portage tree should be |
35 |
moved to another directory, about which only this handler knows, and check |
36 |
if portage works well. Indexing all places, where portage uses it's tree |
37 |
handler (by std. comment, for example) and making clear, which methods would |
38 |
contain all boostable code of it. |
39 |
3. Implementing those methods in proxy, which could simulate fast regex |
40 |
search and other stuff using simplest possible interface of portage tree |
41 |
handler (smth. like four methods add, remove, get, list). Proxy should be |
42 |
able to use handler's methods if they are implemented. |
43 |
4. Refactoring portage to use advanced methods in proxy. |
44 |
5. Now, having taken all code together into one place and looking this |
45 |
nice and readable code, real optimizations could be discussed here, for |
46 |
example. Ideally, i think, portage would have such tree handlers: |
47 |
- Filesystem handler - fast searches over current portage tree |
48 |
structure. |
49 |
- SQL handler - rewrite of tree functions into SQL queries. |
50 |
- Network-based handler - maybe sometimes it would nice to have |
51 |
portage tree only in one machine of cluster, for example if I |
52 |
want to have |
53 |
100 really small computers with fast connection with mother-computer and |
54 |
portage tree is too big to be wholly copied to all of them. |
55 |
- Memory-buffered handler with daemon, which is actually proxy to some |
56 |
other handler - daemon, which reads whole tree (from filesystem |
57 |
or SQL) into |
58 |
memory on boot or first use, creates really fast index (because |
59 |
now it does |
60 |
matter to have better indexing) and optionally deletes some [less needed] |
61 |
parts of it's index from memory when it's becoming full and behaves as |
62 |
really simple proxy if it stays full. This should be implemented after |
63 |
critical parts of filesystem or SQL handler. |
64 |
|
65 |
2008/11/23 Emma Strubell <emma.strubell@×××××.com> |
66 |
|
67 |
> Wow, that's extremely helpful!! I happen to particularly enjoy tries, so |
68 |
> the suffix trie sounds like a great idea. The trie class example is really |
69 |
> helpful too, because this will be my first time programming in Python, and |
70 |
> it's a bit easier to figure out what's going on syntax-wise in that simple |
71 |
> trie class than in the middle of the portage source code! |
72 |
> |
73 |
> Seriously, thanks again :] |
74 |
> |
75 |
> |
76 |
> On Sun, Nov 23, 2008 at 11:56 AM, Lucian Poston <lucianposton@×××××.com>wrote: |
77 |
> |
78 |
>> > Thanks for the replies! I know there are a couple programs out there |
79 |
>> that |
80 |
>> > basically already do what I'm looking to do... Unfortunately I wasn't |
81 |
>> aware |
82 |
>> > of these pre-existing utilities until after I submitted my project |
83 |
>> proposal |
84 |
>> > to my professor. So, I'm looking to implement a better search myself. |
85 |
>> > Preferably by editing the existing portage code, not writing a separate |
86 |
>> > program. So if anyone can offer any help regarding the actual |
87 |
>> implementation |
88 |
>> > of search in portage, I would greatly appreciate it! |
89 |
>> |
90 |
>> Most of the search implementation is in |
91 |
>> /usr/lib/portage/pym/_emerge/__init__.py in class search. The class's |
92 |
>> execute() method simply iterates over all packages (and descriptions |
93 |
>> and package sets) and matches against the searchkey. You might need |
94 |
>> to look into pym/portage/dbapi/porttree.py for portdbapi as well. |
95 |
>> |
96 |
>> If you intend to index and support fast regex lookup, then you need to |
97 |
>> do some fancy indexing, which I'm not terribly familiar with. You |
98 |
>> could follow in the steps of eix[1] or other indexed search utilities |
99 |
>> and design some sort of index layout, which is easier than the |
100 |
>> following thought. You might consider implementing a suffix trie or |
101 |
>> similar that has sublinear regexp lookup and marshalling the structure |
102 |
>> for the index. I couldn't find a python implementation for something |
103 |
>> like this, but here is a general trie class[2] that you might start |
104 |
>> with if you go that route. There is a perl module[3], |
105 |
>> Tie::Hash::Regex, that does that, but properly implementing that in |
106 |
>> python would be a chore. :) |
107 |
>> |
108 |
>> That project sounds interesting and fun. Good luck! |
109 |
>> |
110 |
>> Lucian Poston |
111 |
>> |
112 |
>> [1] https://projects.gentooexperimental.org/eix/wiki/IndexFileLayout |
113 |
>> [2] |
114 |
>> http://www.koders.com/python/fid7B6BC1651A9E8BBA547552FE3F039479A4DECC45.aspx |
115 |
>> [3] |
116 |
>> http://search.cpan.org/~davecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm<http://search.cpan.org/%7Edavecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm> |
117 |
>> |
118 |
>> |
119 |
> |
120 |
|
121 |
|
122 |
-- |
123 |
tvali |
124 |
|
125 |
Kuskilt foorumist: http://www.cooltests.com - kui inglise keelt oskad. |
126 |
Muide, üle 120 oled väga tark, üle 140 oled geenius, mingi 170 oled ju mingi |
127 |
täica pea nagu prügikast... |