>> > The BIG hitch in portage is the database strategy....it's file
>> > system based. Basically it's thousands of small text files... you
>> > want to update
>> > the database?.... open, read, close over and over again....
>> >
>> > It sucks.
> Yup. Overhead's too large in today's filesystems.
>
>> > Portage is crying for an sql database backend... mysql, sqlite,
>> > mmsql...
>> > anything would be nice.
>> >
>> > Tell us more about your bsd ports, it sounds interesting...
>>
>> I believe FreeBSD uses an INDEX file in /usr/ports/INDEX, and then
>> compiles that file into a berkeley DB file or something in
>> /usr/ports/INDEX.db.
>>
>> I'm not really very fond of FreeBSD ports these days, actually.
>> It has the feel of something hackish, in desperate need of a
>> good bottom-up redesign.
>>
>> I'm starting to think that Gentoo's Portage is superior in almost
>> all ways, but I do think this search speed thing needs to be
>> dealt with.
> Definitely.
>
>> I'm using esearch now, which is nice, but rebuilding the database
>> is a royal pain in the rear, and the database isn't kept in sync
>> between emerge runs.
>>
>> If the esearch database could be updated without having to rebuild
>> the entire thing (or at least without having to look at the
>> filesystem to rebuild the entire thing) after every emerge
>> operation then I think we'd be doing well.
> Hardly. Have you ever done a qpkg -q? I can drink enough coffee to
> severely upset my nervous system in the time it takes to finish.
> For some reason, most of it is in a sed (that looks like it could
> be done in a tr, and also like it's handling an insane amount of
> data).
>
> I think ocaml is a good suggestion - pretty fast, harder to make
> mistakes. But as also mentioned, the real problem is that portage
> is very filesystem based right now, which makes updating with rsync
> simple and bandwidth-efficient. Replacing something like that would
> take a lot of work, so some database building is essential (a
> relational db makes a fair bit of sense for some of this, though a
> dependency on postgres or mysql doesn't; something smaller that can
> be made self-contained, like sqlite, would fit better), and updating
> those databases (dependency updates, world dependency changes)
> should then be really thought about, 'cos without proper
> incremental/hashed update stuff it might turn out to be as useful
> as esearch, and rather more annoying as it's rather crucial
> base-system software we're talking about. And anything else one can
> do smarter is good.
>
> I'm tempted to give ocaml and possibly even my random ideas on the
> matter a shot as a toy project. Ugh, too much to do, though.
The basic problem with searching is that it isn't implemented smartly in
current portage. I have working (emerge -s like) code that is blazingly
fast because it does not actually open every ebuild. Description searching,
however, cannot be made fast without some kind of cache. I don't think
creating a reliable cache for that is going to be a priority, but it is
certainly possible ;-).
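
To illustrate the idea of a name search that never opens an ebuild, here is
a minimal sketch. It is not the code mentioned above; it only assumes the
usual tree layout of category/package/package-version.ebuild, and the
function name search_names is made up for this example. It answers the
query from directory listings alone, so no ebuild is ever read.

```python
import os
import re
import sys


def search_names(tree_root, pattern):
    """Return sorted 'category/package' names whose package name matches
    the regular expression `pattern` (case-insensitive).

    Only directory listings are used; no ebuild file is opened.
    Assumes the layout tree_root/category/package/*.ebuild.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for category in sorted(os.listdir(tree_root)):
        cat_dir = os.path.join(tree_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for package in sorted(os.listdir(cat_dir)):
            if not regex.search(package):
                continue
            pkg_dir = os.path.join(cat_dir, package)
            if not os.path.isdir(pkg_dir):
                continue
            # Treat a directory as a package only if it holds an ebuild.
            # This skips things like profiles/ without opening any file.
            if any(name.endswith(".ebuild") for name in os.listdir(pkg_dir)):
                matches.append(f"{category}/{package}")
    return matches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python search_names.py <tree root> <pattern>
    for name in search_names(sys.argv[1], sys.argv[2]):
        print(name)
```

A description search cannot be done this way, since the description lives
inside each ebuild; that is the part that needs a cache.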
As for rsync, the number of files is too big, and I would like to reduce it,
but I don't see databases being a good replacement. We need something that
works in such a way that even a corrupted tree ends up in a good state
after updating.
Paul
--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage: http://www.devrieze.net
--
gentoo-performance@g.o mailing list