Gentoo Archives: gentoo-performance

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-performance@l.g.o
Subject: Re: [gentoo-performance] Re: portage performance
Date: Mon, 26 Jul 2004 09:19:16
Message-Id: 1501.221.136.16.201.1090833508.squirrel@221.136.16.201
In Reply to: Re: [gentoo-performance] Re: portage performance by Bart Alewijnse
1 >> > The BIG hitch in portage is the database strategy....it's file
2 >> > system based. Basically it's thousands of small text files... you
3 >> > want to update
4 >> > the database?.... open, read, close over and over again....
5 >> >
6 >> > It sucks.
7 > Yup. Overhead's too large in today's filesystems.
8 >
9 >> > Portage is crying for an sql database backend... mysql, sqlite,
10 >> > mmsql...
11 >> > anything would be nice.
12 >> >
13 >> > Tell us more about your bsd ports, it sounds interesting...
14 >>
15 >> I believe FreeBSD uses an INDEX file in /usr/ports/INDEX, and then
16 >> compiles that file into a berkeley DB file or something in
17 >> /usr/ports/INDEX.db.
18 >>
19 >> I'm not really very fond of FreeBSD ports these days, actually.
20 >> It has the feel of something hackish, in desperate need of a
21 >> good bottom-up redesign.
22 >>
23 >> I'm starting to think that Gentoo's Portage is superior in almost
24 >> all ways, but I do think this search speed thing needs to be
25 >> dealt with.
26 > Definitely.
27 >
28 >> I'm using esearch now, which is nice, but rebuilding the database
29 >> is a royal pain in the rear, and the database isn't kept in sync
30 >> between emerge runs.
31 >>
32 >> If the esearch database could be updated without having to rebuild
33 >> the entire thing (or at least without having to look at the
34 >> filesystem to rebuild the entire thing) after every emerge
35 >> operation then I think we'd be doing well.
36 > Hardly. Have you ever done a qpkg -q? I can drink enough coffee to
37 > severely upset my nervous system in the time it takes to finish.
38 > For some reason, most of it is in a sed (that looks like it could
39 > be done in a tr, and also like it's handling an insane amount of
40 > data).
41 >
42 > I think ocaml is a good suggestion - pretty fast, harder to make
43 > mistakes. But as also mentioned, the real problem is that portage
44 > is very filesystem based right now, which makes updating with rsync
45 > simple and bandwidth-efficient. Replacing something like that would
46 > take a lot of work, so some database building (a relational db makes
47 > a fair bit of sense for some of this, though depending on
48 > postgres or mysql doesn't; something smaller that can be made
49 > self-contained, like sqlite) is essential, and updating those
50 > (dependencies updates, world dependencies changes)
51 > should then be really thought about, 'cos without proper
52 > incremental/hashed update stuff it might turn out to be as useful
53 > as esearch, and rather more annoying as it's rather crucial
54 > base-system software we're talking about. And anything else one can
55 > do smarter is good.
56 >
57 > I'm tempted to give ocaml and possibly even my random ideas on the
58 > matter a shot as a toy project. Ugh, too much to do, though.
59
60 The basic problem with searching is that current portage does not implement
61 it smartly. I have working (emerge -s like) code that is blazingly fast
62 because it does not open every ebuild. Description searching, however,
63 cannot be made fast without some kind of cache. I don't think creating a
64 reliable cache for that is going to be a priority, but it is certainly
65 possible ;-).
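
To illustrate why a name search can avoid opening ebuilds: the sketch below is not the code mentioned above, only a minimal illustration that assumes the usual tree layout of <root>/<category>/<package>/ directories. The function name search_names is hypothetical. It answers the query from directory listings alone, so no ebuild file is ever read.

```python
import os
import re


def search_names(tree_root, pattern):
    """Return sorted 'category/package' names whose package name matches.

    Only directory entries are listed; no ebuild file is opened.
    Assumes a <root>/<category>/<package>/ layout (an assumption of
    this sketch, not a statement about portage internals).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    with os.scandir(tree_root) as categories:
        for category in categories:
            # Skip plain files and hidden directories at the top level.
            if not category.is_dir() or category.name.startswith("."):
                continue
            with os.scandir(category.path) as packages:
                for package in packages:
                    if package.is_dir() and regex.search(package.name):
                        matches.append(f"{category.name}/{package.name}")
    return sorted(matches)
```

A call such as search_names("/usr/portage", "vim") would list every package directory whose name contains "vim". A real tool would also have to skip non-category directories such as profiles, which this sketch does not attempt.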
66
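
On the description cache: one possible shape, sketched under stated assumptions, is a small SQLite table keyed by ebuild path that is refreshed incrementally, re-reading only files whose modification time has changed. The table layout, the function names refresh_cache and search_descriptions, and the regular expression used to pull out DESCRIPTION are all illustrative choices, not anything portage provides.

```python
import os
import re
import sqlite3

# Assumes the description is a single double-quoted line in the ebuild.
DESCRIPTION_RE = re.compile(r'^DESCRIPTION="([^"]*)"', re.MULTILINE)


def refresh_cache(conn, tree_root):
    """Re-read only ebuilds whose mtime differs from the cached value.

    Returns the number of ebuilds that had to be opened and parsed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ebuilds "
        "(path TEXT PRIMARY KEY, mtime_ns INTEGER, description TEXT)"
    )
    cached = dict(conn.execute("SELECT path, mtime_ns FROM ebuilds"))
    seen = set()
    reparsed = 0
    for dirpath, _dirs, files in os.walk(tree_root):
        for name in files:
            if not name.endswith(".ebuild"):
                continue
            path = os.path.join(dirpath, name)
            seen.add(path)
            mtime_ns = os.stat(path).st_mtime_ns
            if cached.get(path) == mtime_ns:
                continue  # unchanged since the last run: skip the read
            with open(path, encoding="utf-8", errors="replace") as handle:
                found = DESCRIPTION_RE.search(handle.read())
            conn.execute(
                "INSERT OR REPLACE INTO ebuilds VALUES (?, ?, ?)",
                (path, mtime_ns, found.group(1) if found else ""),
            )
            reparsed += 1
    # Drop cache rows for ebuilds that no longer exist in the tree.
    conn.executemany(
        "DELETE FROM ebuilds WHERE path = ?",
        [(p,) for p in cached if p not in seen],
    )
    conn.commit()
    return reparsed


def search_descriptions(conn, term):
    """Return cached ebuild paths whose description contains term."""
    rows = conn.execute(
        "SELECT path FROM ebuilds WHERE description LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

The mtime comparison is what makes the refresh incremental, but it is also the weak point for reliability: a file changed without its timestamp changing would be missed, so a stricter cache would compare content hashes instead.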
67 As for rsync, the number of files is too large, and I would like to reduce
68 it, but I don't see databases as a good replacement. We need something that
69 works in such a way that even a corrupted tree ends up in a good state
70 after updating.
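
The self-repair requirement can be illustrated with a per-file check. The sketch below is only an illustration, assuming a hypothetical manifest that maps each relative path to its expected SHA-256 digest; it is not how rsync or portage works internally. Because every file is verified on its own, a damaged or missing file is simply flagged for refetching, and the rest of the tree stays usable.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_refetch(tree_root, manifest):
    """List relative paths that are missing or do not match the manifest.

    manifest is a dict of relative path -> expected SHA-256 hex digest
    (a hypothetical format chosen for this sketch).
    """
    stale = []
    for relpath, expected in sorted(manifest.items()):
        path = os.path.join(tree_root, relpath)
        if not os.path.isfile(path) or file_digest(path) != expected:
            stale.append(relpath)
    return stale
```

An updater built this way would fetch exactly the paths returned by files_to_refetch, so a corrupted tree converges to a good state after one update. A single database file offers no equivalent granularity unless it carries its own integrity and repair mechanism.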
71
72 Paul
73
74 --
75 Paul de Vrieze
76 Gentoo Developer
77 Mail: pauldv@g.o
78 Homepage: http://www.devrieze.net
79
80
81 --
82 gentoo-performance@g.o mailing list