Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] Treewide metadata.xml
Date: Fri, 27 May 2005 12:20:48
Message-Id: 20050527122138.GA1837@exodus.wit.org
In Reply to: Re: [gentoo-dev] [RFC] Treewide metadata.xml by Danny van Dyk
1 On Fri, May 27, 2005 at 01:47:37PM +0200, Danny van Dyk wrote:
2 > Hi Brian
3 > > What's the gain, aside from implication of collapsing it into a
4 > > single file? Honestly my only use for metadata.xml is looking up who
5 > > I get to poke about fixing broken ebuilds...
6 > The gain is:
7 > ... that you portage people could use it for emerge -s instead of using
8 > a DESCRIPTION-cache.
9
10 'you portage people' ? :)
11
12 > ... we don't need to find the metadata.xml file before parsing it.
13
14 Portage's emerge -s doesn't use metadata.xml. Guessing you meant
15 emerge -S (--searchdesc), but that doesn't use metadata.xml either.
16
17 So, a few implications in what you mean/are after then.
18 1) This global description cache would have to be duplicated, and
19 recreated on cvs->rsync runs. Why? Unless you're padding extra bytes
20 in the description cache, updates _will_ kill performance.
21 Personally, I'm not much for it because there is a minimal window for
22 cvs->rsync infra-side to get its thing done, and this will jack up
23 the runtime.
24
25 2) You're still doing entry by entry. Y'all are assuming having this
26 data shoved into one file is going to make it quicker for reads (in
27 reality, you're still reading 19000+ records, just your solution is
28 out of a single file). This may be quicker due to syscall overhead,
29 but I posit the drawbacks aren't worth it.
30
31 3) This complicates the hell out of cache updates, and still suffers
32 the same issues eix/esearch suffer- namely that it's not sensitive to
33 cache updates. If we make it sensitive to cache updates, you're
34 looking at regen runtimes going through the roof (see #1 comment on
35 updates). This is regardless of if it's a duplication approach or
36 description is stored in its own db outside of the normal flat_list
37 cache files.
38
39 4) This proposal breaks the cache up into separate chunks. That's
40 the cache backend's decision frankly, and _cannot_ be imposed onto the
41 cache backend implementation from above.
42
43 I moved eclass data into the cache backend in cvs head explicitly
44 for the purpose of allowing the cache to be effectively standalone,
45 and able to be bound to a remote tree. You force this change from
46 above, it breaks the cache design (pure and simple), and ultimately
47 isn't what you're after (see below).
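
To illustrate the padding remark in point 1, here is a minimal sketch. It
assumes a hypothetical fixed slot width (RECORD_SIZE) and a made-up file
layout; none of this is portage code. With padded slots, a changed
description can be overwritten in place. With packed, newline-separated
records, a change in length means rewriting the file.

```python
import io

RECORD_SIZE = 64  # hypothetical slot width in bytes; an assumption for this sketch


def write_padded(f, index, text):
    """Overwrite one fixed-width slot in place; the other slots are untouched."""
    data = text.encode("utf-8")
    if len(data) > RECORD_SIZE:
        raise ValueError("description does not fit in its slot")
    f.seek(index * RECORD_SIZE)
    f.write(data.ljust(RECORD_SIZE, b"\0"))


def read_padded(f, index):
    """Read one slot back and strip the NUL padding."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0").decode("utf-8")


def update_packed(lines, index, text):
    """Packed records: return the full new file body, since it must be rewritten."""
    lines = list(lines)
    lines[index] = text
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    f = io.BytesIO()
    for i, desc in enumerate(["first", "second", "third"]):
        write_padded(f, i, desc)
    write_padded(f, 1, "a longer replacement description")
    print(read_padded(f, 1))
```

The cost of the padded layout is the wasted bytes per slot, plus a hard
limit on description length.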
48
49
50 Frankly, any comments that this is going to make things faster are
51 ignoring the existing code. Why is emerge -S so damned slow?
52
53 Better question, why is it that a mysql cache backend _still_ is so
54 damned slow on emerge -S? That should be hella fast compared to
55 opening 19000 files, right?
56
57 Because the current stable cache design allows *only* for individual
58 record lookups. In other words, even with an rdbms implementation, it
59 goes record by record. What is needed is a way to hand off to the
60 cache "hey you, give me all cpv's that have metadata that matches this
61 criteria".
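
The difference between the two access patterns can be sketched with
sqlite3 standing in for the rdbms. The table layout, the function names,
and the sample cpvs are assumptions made up for this illustration; this
is not the portage cache schema or API.

```python
import sqlite3


def build_cache(rows):
    """Create an in-memory table of (cpv, description) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO cache VALUES (?, ?)", rows)
    return conn


def search_record_by_record(conn, cpvs, needle):
    """One query per cpv; the caller does the matching itself."""
    hits = []
    for cpv in cpvs:
        row = conn.execute(
            "SELECT description FROM cache WHERE cpv = ?", (cpv,)
        ).fetchone()
        if row is not None and needle.lower() in row[0].lower():
            hits.append(cpv)
    return hits


def search_in_backend(conn, needle):
    """A single query; the database does the matching and returns only hits."""
    cur = conn.execute(
        "SELECT cpv FROM cache "
        "WHERE instr(lower(description), lower(?)) > 0 ORDER BY cpv",
        (needle,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    rows = [
        ("app-misc/foo-1.0", "A hypothetical text editor"),
        ("app-misc/bar-2.1", "A hypothetical package manager"),
        ("app-misc/baz-0.3", "Another hypothetical EDITOR"),
    ]
    conn = build_cache(rows)
    print(search_record_by_record(conn, [r[0] for r in rows], "editor"))
    print(search_in_backend(conn, "editor"))
```

Both functions find the same packages. The first issues one query per
record no matter how fast the database is; the second issues one query
in total.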
62
63 Move the lookup/searching into the cache backend, which is already
64 built into the cache refactoring I wrote for cvs head.
65
66 If you want to collapse all of the description data into some faster
67 lookup, fine, do so _strictly_ within that cache backend, and modify
68 that class so that it has an appropriate get_matches lookup that's
69 able to do a specific metadata lookup faster.
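
A rough sketch of what that could look like follows. The class names and
the get_matches(key, needle) signature are assumptions for illustration,
not the actual classes in cvs head. The base class answers get_matches by
walking every record. The subclass keeps its own description index and
overrides get_matches for that one key, without callers seeing any
difference.

```python
class CacheBackend:
    """Hypothetical base backend: per-record access plus a generic search."""

    def __init__(self):
        self._records = {}

    def __setitem__(self, cpv, metadata):
        self._records[cpv] = dict(metadata)

    def __getitem__(self, cpv):
        return self._records[cpv]

    def get_matches(self, key, needle):
        """Fallback: scan every record and test the requested key."""
        needle = needle.lower()
        return sorted(
            cpv
            for cpv, md in self._records.items()
            if needle in md.get(key, "").lower()
        )


class DescriptionIndexedBackend(CacheBackend):
    """Hypothetical backend that keeps descriptions in a separate fast index."""

    def __init__(self):
        super().__init__()
        self._desc = {}  # cpv -> lowercased DESCRIPTION, updated on every write

    def __setitem__(self, cpv, metadata):
        super().__setitem__(cpv, metadata)
        self._desc[cpv] = metadata.get("DESCRIPTION", "").lower()

    def get_matches(self, key, needle):
        if key != "DESCRIPTION":
            return super().get_matches(key, needle)
        needle = needle.lower()
        return sorted(cpv for cpv, desc in self._desc.items() if needle in desc)


if __name__ == "__main__":
    cache = DescriptionIndexedBackend()
    cache["app-misc/foo-1.0"] = {"DESCRIPTION": "A hypothetical editor", "SLOT": "0"}
    cache["app-misc/bar-2.1"] = {"DESCRIPTION": "A hypothetical shell", "SLOT": "2"}
    print(cache.get_matches("DESCRIPTION", "editor"))
```

Because the index is updated inside __setitem__, it stays in step with
cache updates, and how the data is laid out remains the backend's own
decision.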
70
71 People are free to disagree mind you, but this talk of speed gains
72 frankly seems to be missing the boat on how our cache actually works,
73 let alone the issues with it.
74
75 Collapsing all metadata down into a single file, yeah that would be
76 nifty from the standpoint of less files/wasted space on fs's.
77 Centralized DESCRIPTION cache implemented in xml? Eh...
78 ~brian