Gentoo Archives: gentoo-dev

From:	Brian Harring <ferringb@g.o>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] [RFC] Treewide metadata.xml
Date:	Fri, 27 May 2005 12:20:48
Message-Id:	`20050527122138.GA1837@exodus.wit.org`
In Reply to:	Re: [gentoo-dev] [RFC] Treewide metadata.xml by Danny van Dyk

1	On Fri, May 27, 2005 at 01:47:37PM +0200, Danny van Dyk wrote:
2	> Hi Brian
3	> > What's the gain, aside from implication of collapsing it into a
4	> > single file? Honestly my only use for metadata.xml is looking up who
5	> > I get to poke about fixing broken ebuilds...
6	> The gain is:
7	> ... that you portage people could use it for emerge -s instead of using
8	> a DESCRIPTION-cache.
9
10	'you portage people' ? :)
11
12	> ... we don't need to find the metadata.xml file before parsing it.
13
14	Portage's emerge -s doesn't use metadata.xml. Guessing you meant
15	emerge -S (--searchdesc), but that doesn't use metadata.xml either.
16
17	So, a few implications in what you mean/are after then.
18	1) This global description cache would have to be duplicated, and
19	recreated on cvs->rsync runs. Why? Unless you're padding extra bytes
20	in the description cache, updates _will_ kill performance.
21	Personally, I'm not much for it because there is a minimal window for
22	cvs->rsync infra-side to get its thing done, and this will jack up
23	the runtime.
24
25	2) You're still doing entry by entry. Y'all are assuming having this
26	data shoved into one file is going to make it quicker for reads (in
27	reality, you're still reading 19000+ records, just your solution is
28	out of a single file). This may be quicker due to syscall overhead,
29	but I posit the drawbacks aren't worth it.
30
31	3) This complicates the hell out of cache updates, and still suffers
32	the same issues eix/esearch suffer- namely that it's not sensitive to
33	cache updates. If we make it sensitive to cache updates, you're
34	looking at regen runtimes going through the roof (see #1 comment on
35	updates). This is regardless of if it's a duplication approach or
36	description is stored in its own db outside of the normal flat_list
37	cache files.
38
39	4) This proposal breaks the cache up into separate chunks. That's
40	the cache backend's decision frankly, and _cannot_ be imposed onto the
41	cache backend implementation from above.
42
43	I moved eclass data into the cache backend in cvs head explicitly
44	for the purpose of allowing the cache to be effectively standalone,
45	and able to be bound to a remote tree. You force this change from
46	above, it breaks the cache design (pure and simple), and ultimately
47	isn't what you're after (see below).
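
To illustrate the padding remark in point 1: a rough sketch, not anything taken from portage, of why fixed-width (padded) records make updates cheap. RECORD_SIZE and every function name here are hypothetical. With padded slots, one changed description is a seek and a write; without padding, a longer description shifts everything after it and the file has to be rewritten.

```python
import io

RECORD_SIZE = 64  # hypothetical slot width; real descriptions may need more


def pack(description):
    """Encode a description and pad it with NUL bytes to the slot width."""
    raw = description.encode("utf-8")
    if len(raw) > RECORD_SIZE:
        raise ValueError("description does not fit in a padded slot")
    return raw.ljust(RECORD_SIZE, b"\0")


def write_all(f, descriptions):
    """Full regeneration: rewrite every slot from scratch."""
    f.seek(0)
    f.truncate()
    for description in descriptions:
        f.write(pack(description))


def update_in_place(f, index, description):
    """Overwrite a single slot; no other record moves."""
    f.seek(index * RECORD_SIZE)
    f.write(pack(description))


def read_record(f, index):
    """Read one slot back and strip the padding."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0").decode("utf-8")
```

The cost is the wasted padding in every slot, plus a hard limit on description length.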
48
49
50	Frankly, any comments that this is going to make things faster are
51	ignoring the existing code. Why is emerge -S so damned slow?
52
53	Better question, why is it that a mysql cache backend _still_ is so
54	damned slow on emerge -S? That should be hella fast compared to
55	opening 19000 files, right?
56
57	Because the current stable cache design allows only for individual
58	record lookups. In other words, even with an rdbms implementation, it
59	goes record by record. What is needed is a way to hand off to the
60	cache "hey you, give me all cpv's that have metadata that matches this
61	criteria".
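
A minimal sketch of the difference, using sqlite3 from the Python standard library as a stand-in for the mysql backend. The table layout and function names are assumptions made for illustration, not portage's actual schema or API. The first search mirrors the record-by-record access pattern described above; the second hands the whole criterion to the database in one query.

```python
import sqlite3


def make_cache(rows):
    """Build an in-memory cache of (cpv, description) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (cpv TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", rows)
    return conn


def search_record_by_record(conn, cpvs, needle):
    """One query per cpv; the filtering happens on the caller's side."""
    hits = []
    for cpv in cpvs:
        row = conn.execute(
            "SELECT description FROM metadata WHERE cpv = ?", (cpv,)
        ).fetchone()
        if row is not None and needle.lower() in row[0].lower():
            hits.append(cpv)
    return sorted(hits)


def search_in_backend(conn, needle):
    """One query total; the database does the matching.

    Note: '%' and '_' inside needle act as LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT cpv FROM metadata WHERE description LIKE ? ORDER BY cpv",
        ("%" + needle + "%",),
    )
    return [cpv for (cpv,) in rows]
```

Both functions return the same cpvs for plain ASCII search terms; the first issues one query per package, the second issues a single query regardless of tree size.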
62
63	Move the lookup/searching into the cache backend, which is already
64	built into the cache refactoring I wrote for cvs head.
65
66	If you want to collapse all of the description data into some faster
67	lookup, fine, do so _strictly_ within that cache backend, and modify
68	that class so that it has an appropriate get_matches lookup that's
69	able to do a specific metadata lookup faster.
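
Roughly what that could look like, as a sketch only. The get_matches name comes from the text above; the signature, both class names, and the dict-based storage are assumptions and are not the cvs head code. The base class gives a generic fallback that walks every record, and the subclass keeps a collapsed description index that stays private to the backend.

```python
class DictCache:
    """Toy backend: one metadata dict per cpv, looked up individually."""

    def __init__(self, records):
        self._records = dict(records)

    def __getitem__(self, cpv):
        return self._records[cpv]

    def keys(self):
        return self._records.keys()

    def get_matches(self, key, substring):
        # Generic fallback: fetch each record in turn and test it.
        needle = substring.lower()
        return sorted(
            cpv for cpv in self.keys()
            if needle in self[cpv].get(key, "").lower()
        )


class DescriptionIndexedCache(DictCache):
    """Same interface, plus a collapsed DESCRIPTION index kept internally."""

    def __init__(self, records):
        super().__init__(records)
        # All descriptions in one structure; callers never see it.
        self._desc_index = {
            cpv: rec.get("DESCRIPTION", "").lower()
            for cpv, rec in self._records.items()
        }

    def get_matches(self, key, substring):
        if key != "DESCRIPTION":
            return super().get_matches(key, substring)
        needle = substring.lower()
        return sorted(
            cpv for cpv, desc in self._desc_index.items() if needle in desc
        )
```

Callers only ever invoke get_matches, so a backend can swap in whatever index it likes (or none at all) without anything above the cache layer changing.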
70
71	People are free to disagree mind you, but this talk of speed gains
72	frankly seems to be missing the boat on how our cache actually works,
73	let alone the issues with it.
74
75	Collapsing all metadata down into a single file, yeah that would be
76	nifty from the standpoint of less files/wasted space on fs's.
77	Centralized DESCRIPTION cache implemented in xml? Eh...
78	~brian
