>> > The BIG hitch in portage is the database strategy... it's file
>> > system based. Basically it's thousands of small text files... you
>> > want to update the database? Open, read, close over and over
>> > again...
>> >
>> > It sucks.
> Yup. Overhead's too large in today's filesystems.
>
>> > Portage is crying for an SQL database backend... mysql, sqlite,
>> > mssql... anything would be nice.
>> >
>> > Tell us more about your BSD ports; it sounds interesting...
>>
>> I believe FreeBSD uses an INDEX file in /usr/ports/INDEX, and then
>> compiles that file into a Berkeley DB file or something in
>> /usr/ports/INDEX.db.
>>
>> I'm not really very fond of FreeBSD ports these days, actually.
>> It has the feel of something hackish, in desperate need of a
>> good bottom-up redesign.
>>
>> I'm starting to think that Gentoo's Portage is superior in almost
>> all ways, but I do think this search speed thing needs to be
>> dealt with.
> Definitely.
>
>> I'm using esearch now, which is nice, but rebuilding the database
>> is a royal pain in the rear, and the database isn't kept in sync
>> between emerge runs.
>>
>> If the esearch database could be updated without having to rebuild
>> the entire thing (or at least without having to look at the
>> filesystem to rebuild the entire thing) after every emerge
>> operation, then I think we'd be doing well.
> Hardly. Have you ever done a qpkg -q? I can drink enough coffee to
> severely upset my nervous system in the time it takes to finish.
> For some reason, most of that time is spent in a sed call (one that
> looks like it could be done with tr, and also like it's handling an
> insane amount of data).
>
> I think ocaml is a good suggestion: pretty fast, and it's harder to
> make mistakes in. But as also mentioned, the real problem is that
> portage is very filesystem-based right now, which is also what makes
> updating with rsync simple and bandwidth-efficient. Replacing
> something like that would take a lot of work, so some database
> building is essential. A relational db makes a fair bit of sense for
> some of this, though a dependency on postgres or mysql doesn't;
> something smaller that can be made self-contained, like sqlite,
> does. How that database gets updated (dependency updates, world
> dependency changes) then needs real thought, because without proper
> incremental/hashed updates it might turn out to be only as useful as
> esearch, and rather more annoying, since it's crucial base-system
> software we're talking about. And anything else that can be done
> smarter is good.
>
> I'm tempted to give ocaml, and possibly even my random ideas on the
> matter, a shot as a toy project. Ugh, too much to do, though.
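The quoted idea of a small self-contained database (sqlite) kept current by incremental updates, instead of esearch-style full rebuilds, might look roughly like the Python sketch below. It is hypothetical, not code from Portage or esearch: the table layout, the function names and the assumption that each ebuild has a single-line `DESCRIPTION="..."` assignment are all made up for illustration. Change detection uses each file's mtime and size, so unchanged ebuilds are only stat()ed, never opened.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical: picks up a single-line DESCRIPTION="..." assignment.
DESC_RE = re.compile(r'^DESCRIPTION="(.*)"[ \t]*$', re.MULTILINE)


def open_cache(db_path=":memory:"):
    """Open (or create) the cache: one row per ebuild file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ebuilds ("
        " path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER,"
        " description TEXT)"
    )
    return conn


def refresh_cache(conn, portdir):
    """Update the cache incrementally; return how many rows were rewritten.

    Unchanged ebuilds are only stat()ed; new or modified ones are
    re-read, and rows for ebuilds that disappeared are deleted.
    """
    known = {row[0]: (row[1], row[2]) for row in
             conn.execute("SELECT path, mtime_ns, size FROM ebuilds")}
    seen = set()
    rewritten = 0
    # Assumed tree layout: <portdir>/<category>/<package>/<name>.ebuild
    for ebuild in Path(portdir).glob("*/*/*.ebuild"):
        path = ebuild.relative_to(portdir).as_posix()
        st = ebuild.stat()
        seen.add(path)
        if known.get(path) == (st.st_mtime_ns, st.st_size):
            continue  # unchanged since the last refresh: do not open it
        text = ebuild.read_text(encoding="utf-8", errors="replace")
        match = DESC_RE.search(text)
        conn.execute(
            "INSERT OR REPLACE INTO ebuilds VALUES (?, ?, ?, ?)",
            (path, st.st_mtime_ns, st.st_size,
             match.group(1) if match else None),
        )
        rewritten += 1
    # Drop rows for ebuilds removed from the tree (e.g. by an rsync).
    conn.executemany("DELETE FROM ebuilds WHERE path = ?",
                     [(p,) for p in known.keys() - seen])
    conn.commit()
    return rewritten


def search_descriptions(conn, needle):
    """Substring search over cached descriptions (LIKE is ASCII case-insensitive)."""
    return [row[0] for row in conn.execute(
        "SELECT path FROM ebuilds WHERE description LIKE ? ORDER BY path",
        (f"%{needle}%",))]
```

A content hash (the "hashed" variant) would also catch edits that keep both size and mtime unchanged, at the cost of reading every file on each refresh.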

The basic problem in searching is actually that it isn't implemented smartly
in current portage. I have working code (like emerge -s) that is blazingly
fast because it does not actually open all the ebuilds. Searching
descriptions is impossible to do fast without some kind of cache. I don't
think creating a reliable cache for that is going to be a priority, but it
is certainly possible ;-).
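The reason a name search can avoid opening ebuilds is that the package name is already a directory name in the tree. A hypothetical Python sketch of that idea (not Paul's actual code): it assumes the `<portdir>/<category>/<package>/` layout and takes the category list from `profiles/categories`, one name per line, so the only file it opens is that list.

```python
import os
import re


def read_categories(portdir):
    """Category names from profiles/categories, one per line."""
    path = os.path.join(portdir, "profiles", "categories")
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]


def search_names(portdir, pattern):
    """Return sorted "category/package" names whose package part matches.

    Only directory listings are read; no ebuild is opened or sourced.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for cat in read_categories(portdir):
        cat_dir = os.path.join(portdir, cat)
        if not os.path.isdir(cat_dir):
            continue  # category listed but absent from this tree
        with os.scandir(cat_dir) as entries:
            for entry in entries:
                # Package directories only; skips files such as metadata.xml.
                if entry.is_dir() and regex.search(entry.name):
                    hits.append(f"{cat}/{entry.name}")
    return sorted(hits)
```

The same trick cannot help description searches, since descriptions live inside the ebuilds; that is where a cache becomes necessary.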

As for rsync, the number of files is too large, and I would like to reduce
it, but I don't see databases as a good replacement. We need something
that works in such a way that even a corrupted tree ends up in a good state
after updating.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage: http://www.devrieze.net


--
gentoo-performance@g.o mailing list |