Gentoo Archives: gentoo-performance

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-performance@l.g.o
Subject: Re: [gentoo-performance] Re: portage performance
Date: Mon, 26 Jul 2004 09:19:16
Message-Id: 1501.
In Reply to: Re: [gentoo-performance] Re: portage performance by Bart Alewijnse
>> > The BIG hitch in portage is the database's file
>> > system based. Basicly it's thousands of small text files... you
>> > want to update
>> > the database?.... open, read, close over and over again....
>> >
>> > It sucks.
> Yup. Overhead's too large in today's filesystems.
>
>> > Portage is crying for an sql database backend... mysql, sqllite,
>> > mmsql...
>> > anything would be nice.
>> >
>> > Tell us more about your bsd ports, it sounds interesting...
>>
>> I believe FreeBSD uses an INDEX file in /usr/ports/INDEX, and then
>> compiles that file into a berkeley DB file or something in
>> /usr/ports/INDEX.db.
>>
>> I'm not really very fond of FreeBSD ports these days, actually.
>> It has the feel of something hackish, in desperate need of a
>> good bottom-up redesign.
>>
>> I'm starting to think that Gentoo's Portage is superior in almost
>> all ways, but I do think this search speed thing needs to be
>> dealt with.
> Definately.
>
>> I'm using esearch now, which is nice, but rebuilding the database
>> is a royal pain in the rear, and the database isn't kept in sync
>> between emerge runs.
>>
>> If the esearch database could be updated without having to rebuild
>> the entire thing (or at least without having to look at the
>> filesystem to rebuild the entire thing) after every emerge
>> operation then I think we'd be doing well.
> Hardly. Have you ever done a qpkg -q? I can drink enough coffee to
> severely upset my nervous system in the time it takes to finish.
> For some reason, most of it is in a sed (that looks like it could
> be done in a tr, and also like it's handling an insane amount of
> data).
>
> I think ocaml is a good suggestion - pretty fast, harder to make
> mistakes. But as also mentioned, the real problem is that portage
> is very filesystem based right now, which makes updating with rsync
> simple and bandwidth-efficient. Replacing something like that would
> take a lot of work, so some database building (relational db makes
> a fair bit of sense for some of this, though dependencies on
> postgres or mysql isn't. something smaller that can be made self
> contained like sqlite) is essential, and updating those
> (dependencies updates, world dependencies changes)
> should then be really thought about, 'cos without proper
> incremental/hashed update stuff it might turn out to be as useful
> as esearch, and rather more annoying as it's rather crucial
> base-system software we're talking about. And anything else one can
> do smarter is good.
>
> I'm tempted to give ocaml and possibly even my random ideas on the
> matter a shot as a toy project. Ugh, too much to do, though.

The basic problem with searching is that it isn't implemented smartly
in current portage. I have working (emerge -s like) code that is
blazingly fast because it does not actually open all the ebuilds.
Description searching is impossible to do fast without some kind of
cache. I don't think creating a reliable cache for that is going to be
a priority, but it is certainly possible ;-).

As for rsync, the number of files is too large, and I would like to
reduce it, but I don't see databases being a good replacement. We need
something that works in such a way that even a corrupted tree ends up
in a good state after updating.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage:
--
gentoo-performance@g.o mailing list
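
The code Paul mentions is not included in this message. As a rough sketch of the idea only, a name search can be answered from directory listings alone, because the tree is laid out as category/package/ directories. The function name search_names, the category filter, and the demo tree are assumptions made for illustration, not portage's API:

```python
import os
import re
import tempfile


def search_names(tree_root, pattern):
    """Return sorted 'category/package' names whose package name matches.

    Only directory listings are read; no ebuild file is ever opened.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    with os.scandir(tree_root) as categories:
        for category in categories:
            # Assumption: category directories contain a hyphen
            # (app-editors, sys-apps, ...) or are named "virtual";
            # other top-level directories such as profiles are skipped.
            if not category.is_dir():
                continue
            if "-" not in category.name and category.name != "virtual":
                continue
            with os.scandir(category.path) as packages:
                for package in packages:
                    if package.is_dir() and regex.search(package.name):
                        matches.append(f"{category.name}/{package.name}")
    return sorted(matches)


if __name__ == "__main__":
    # Build a tiny fake tree so the sketch runs without a real portage tree.
    with tempfile.TemporaryDirectory() as root:
        for path in ("app-editors/vim", "app-editors/gvim",
                     "sys-apps/portage", "profiles/default"):
            os.makedirs(os.path.join(root, path))
        for name in search_names(root, "vim"):
            print(name)
```

A description search cannot take this shortcut, since the description lives inside each ebuild. That is why it needs a cache to be fast.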


Subject                                            Author
Re: [gentoo-performance] Re: portage performance   Colin Kingsley <ckingsley@×××××.com>
Re: [gentoo-performance] Re: portage performance   Brian Harring <ferringb@g.o>
Re: [gentoo-performance] Re: portage performance   Bart Alewijnse <scarfboy@×××××.com>