On Fri, May 27, 2005 at 01:47:37PM +0200, Danny van Dyk wrote:
> Hi Brian
> > What's the gain, aside from implication of collapsing it into a
> > single file? Honestly my only use for metadata.xml is looking up who
> > I get to poke about fixing broken ebuilds...
> The gain is:
> ... that you portage people could use it for emerge -s instead of using
> a DESCRIPTION-cache.
|
'you portage people' ? :)
|
> ... we don't need to find the metadata.xml file before parsing it.
|
Portage's emerge -s doesn't use metadata.xml. Guessing you meant
emerge -S (--searchdesc), but that doesn't use metadata.xml either.
|
So, a few implications of what you mean/are after, then.
1) This global description cache would have to be duplicated, and
recreated on cvs->rsync runs. Why? Unless you're padding extra bytes
in the description cache (so a changed entry can be rewritten in
place), updates _will_ kill performance.
Personally, I'm not much for it because there is a minimal window for
the cvs->rsync infra side to get its thing done, and this will jack up
the runtime.
|
2) You're still doing entry by entry. Y'all are assuming that having
this data shoved into one file is going to make it quicker for reads;
in reality, you're still reading 19000+ records, just out of a single
file. This may be quicker due to reduced syscall overhead, but I posit
the drawbacks aren't worth it.
|
3) This complicates the hell out of cache updates, and still suffers
the same issue eix/esearch suffer, namely that it's not sensitive to
cache updates. If we make it sensitive to cache updates, you're
looking at regen runtimes going through the roof (see #1 comment on
updates). This holds regardless of whether it's a duplication approach
or the description is stored in its own db outside of the normal
flat_list cache files.
|
4) This proposal breaks the cache up into separate chunks. That's
the cache backend's decision frankly, and _cannot_ be imposed onto the
cache backend implementation from above.
|
I moved eclass data into the cache backend in cvs head explicitly
for the purpose of allowing the cache to be effectively standalone,
and able to be bound to a remote tree. If you force this change from
above, it breaks the cache design (pure and simple), and ultimately
isn't what you're after (see below).
|
Frankly, any comments that this is going to make things faster are
ignoring the existing code. Why is emerge -S so damned slow?
|
Better question, why is it that a mysql cache backend _still_ is so
damned slow on emerge -S? That should be hella fast compared to
opening 19000 files, right?
|
Because the current stable cache design allows *only* for individual
record lookups. In other words, even with an rdbms implementation, it
goes record by record. What is needed is a way to hand off to the
cache "hey you, give me all cpv's that have metadata that matches these
criteria".
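
To make the record-by-record point concrete, here is a rough sketch.
It is not portage code: sqlite3 stands in for the mysql backend, and
the table layout, the sample cpv's and the names build_cache,
search_record_by_record and search_in_backend are all made up for
illustration.

```python
import sqlite3


def build_cache(records):
    """Create a throwaway in-memory cache table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO cache VALUES (?, ?)", records)
    return conn


def search_record_by_record(conn, cpvs, needle):
    """One lookup per cpv; the caller does the filtering itself."""
    matches = []
    for cpv in cpvs:
        row = conn.execute(
            "SELECT description FROM cache WHERE cpv = ?", (cpv,)
        ).fetchone()
        if row is not None and needle.lower() in row[0].lower():
            matches.append(cpv)
    return sorted(matches)


def search_in_backend(conn, needle):
    """Hand the criteria to the backend: a single query does the scan."""
    rows = conn.execute(
        "SELECT cpv FROM cache "
        "WHERE instr(lower(description), lower(?)) > 0 ORDER BY cpv",
        (needle,),
    )
    return [row[0] for row in rows]
```

Both functions return the same cpv's for plain ASCII search terms; the
first issues one query per record no matter how fast the db is, the
second issues exactly one.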
|
Move the lookup/searching into the cache backend; support for that is
already built into the cache refactoring I wrote for cvs head.
|
If you want to collapse all of the description data into some faster
lookup, fine, do so _strictly_ within that cache backend, and modify
that class so that it has an appropriate get_matches lookup that's
able to do a specific metadata lookup faster.
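
A minimal sketch of what that could look like, under loud assumptions:
only the name get_matches comes from the text above. Its signature
here (key, word) and its whole-word, case-insensitive semantics are
guesses, and DictCache / DescIndexedCache are toy classes, not the
cvs head code.

```python
import re


def _words(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class DictCache:
    """Toy backend: cpv -> metadata dict, held in memory."""

    def __init__(self):
        self._records = {}

    def __setitem__(self, cpv, metadata):
        self._records[cpv] = dict(metadata)

    def __getitem__(self, cpv):
        return self._records[cpv]

    def get_matches(self, key, word):
        """Generic fallback: scan every record for a whole-word match."""
        word = word.lower()
        return sorted(
            cpv for cpv, md in self._records.items()
            if word in _words(md.get(key, ""))
        )


class DescIndexedCache(DictCache):
    """Same interface, plus a private word index over DESCRIPTION."""

    def __init__(self):
        super().__init__()
        self._desc_index = {}  # lowercase word -> set of cpvs

    def __setitem__(self, cpv, metadata):
        # Drop stale index entries first so updates keep the index honest.
        if cpv in self._records:
            for word in _words(self._records[cpv].get("DESCRIPTION", "")):
                self._desc_index[word].discard(cpv)
        super().__setitem__(cpv, metadata)
        for word in _words(metadata.get("DESCRIPTION", "")):
            self._desc_index.setdefault(word, set()).add(cpv)

    def get_matches(self, key, word):
        # Fast path for DESCRIPTION only; other keys use the generic scan.
        if key == "DESCRIPTION":
            return sorted(self._desc_index.get(word.lower(), ()))
        return super().get_matches(key, word)
```

Callers only ever see get_matches; whether a backend answers from an
index, an SQL query or a dumb scan stays that backend's business.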
|
People are free to disagree mind you, but this talk of speed gains
frankly seems to be missing the boat on how our cache actually works,
let alone the issues with it.
|
Collapsing all metadata down into a single file, yeah that would be
nifty from the standpoint of fewer files/less wasted space on fs's.
Centralized DESCRIPTION cache implemented in xml? Eh...
~brian