On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:

> And here is why (I'm only looking at the non-degenerate case with valid
> metadata, ignoring overlays which some consider a corner case (I don't
> understand that argument, but that's another thing)):

Overlays tend to come without metadata. Just enabling the KDE overlay changed
the time for "emerge -upNDv world" from ~30 seconds cold cache to ~120
seconds. Running emerge --metadata gets the performance back to pretty much
the old levels.

> When the package manager looks at a package, it first reads the
> package's ebuild directory and gets the mtimes. It does the same for the
> cache entries and validates the caches (there is more stuff in here,
> like checking eclasses and so on).
Eclasses are negligible because you only have to look at them once for the
whole calculation. You can cache the mtime for the duration of your operation.
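A minimal sketch of that per-operation mtime cache, in Python. The function
names and the shape of the recorded data (eclass path mapped to the mtime
stored in the cache entry) are assumptions for illustration, not how any
particular package manager does it:

```python
import os
from functools import lru_cache
from typing import Mapping


@lru_cache(maxsize=None)
def eclass_mtime(path: str) -> int:
    """Return the eclass file's mtime; each path is stat()ed at most once per run."""
    return os.stat(path).st_mtime_ns


def cache_entry_valid(recorded: Mapping[str, int]) -> bool:
    """Compare the eclass mtimes recorded in a cache entry with the files on disk.

    `recorded` (hypothetical layout) maps eclass path -> mtime stored when the
    cache entry was generated.
    """
    return all(eclass_mtime(path) == mtime for path, mtime in recorded.items())
```

However many cache entries inherit the same eclass, only the first check
touches the filesystem; the rest are dictionary lookups.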

> Then the following happens based on the "solution" we choose:
> eapi-in-filename: the package manager starts from the highest version
> with a supported eapi (the others are nonexistent with the used glob).
> For that ebuild it reads the cache entry and decides whether or not it
> can be used.
In this case you amusingly do NOT want to cache the eapi in the cache, so you
can even defer sourcing the ebuild until you actually need the metadata.
(You don't want to cache it because you need to check the file mtime anyway,
and then you read the filename anyway. No need to look for it in another place
then :) )
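As a rough illustration of filtering on the filename alone, here is a Python
sketch. The naming scheme ("<name>-<version>.ebuild-<eapi>"), the set of
supported EAPIs and both function names are assumptions made for the example:

```python
import re
from typing import Iterable, List, Optional

# Assumed naming scheme: "<name>-<version>.ebuild-<eapi>".
_EAPI_SUFFIX = re.compile(r"\.ebuild-(?P<eapi>[A-Za-z0-9_+.-]+)$")

# Illustrative set; a real package manager would supply its own.
SUPPORTED_EAPIS = {"0", "1", "2"}


def eapi_from_filename(filename: str) -> Optional[str]:
    """Return the EAPI encoded in the filename, or None if there is none."""
    match = _EAPI_SUFFIX.search(filename)
    return match.group("eapi") if match else None


def usable_ebuilds(filenames: Iterable[str]) -> List[str]:
    """Keep the ebuilds whose filename EAPI is supported; no file is opened."""
    return [f for f in filenames if eapi_from_filename(f) in SUPPORTED_EAPIS]
```

The directory listing is needed for the mtime check anyway, so this filter
costs no extra I/O, and sourcing can wait until the metadata is really needed.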

> If not, it proceeds to the next version, if yes, it's done.
> eapi-in-ebuild: the package manager reads all cache entries and sorts
> out those with an EAPI it doesn't support. The rest gets ordered and the
> same procedure as above applies.
>
> So, one of the main differences is: "reading one cache file" (if running
> unstable you can assume you support the highest version, thus reading
> only one cache file) vs. "reading all cache files".
That assumes a dumb cache format.
Why don't we make the cache more efficient so you read one file per package /
category / ... ?
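A minimal sketch of what "one file per package" could look like, assuming a
JSON file that maps each version to its metadata. The file format, the field
names and the function names are hypothetical, not an existing cache layout:

```python
import json
import os


def write_package_index(index_path, entries):
    """Write all versions of one package into a single cache file.

    `entries` (hypothetical layout): {version: {"EAPI": ..., "DEPEND": ...}}.
    """
    tmp = index_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, sort_keys=True)
    os.replace(tmp, index_path)  # swap in place so readers never see a partial file


def read_package_index(index_path):
    """One open() yields the metadata for every version of the package."""
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)


def versions_with_supported_eapi(index, supported):
    """Filter versions by EAPI in memory; proper version ordering is out of scope."""
    return [v for v, meta in index.items() if meta.get("EAPI") in supported]
```

With such a layout the "one cache file vs. all cache files" distinction goes
away: both approaches read exactly one file per package.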

>
> I did some performance measurements based on that. I have 1507 installed
> packages with 5541 different versions/revisions.
>
> Reading from hot cache:
> 1507 files: ~50ms
> 5541 files: ~170ms
>
> Reading from cold cache:
> 1507 files: ~2.8s
> 5541 files: ~6s
And now you need to pull metadata for dependency calculation. How big is the
impact of that?
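For scale, the quoted timings can be turned into per-file costs and ratios.
The Python sketch below uses only the four numbers quoted above; the helper
names are made up for the example. It gives roughly 0.03 ms per file from hot
cache and between 1 and 2 ms per file from cold cache, with "all versions"
about 3.4 times slower than "one per package" hot and about 2.1 times slower
cold:

```python
# Quoted measurements: number of cache files -> seconds.
HOT = {1507: 0.050, 5541: 0.170}
COLD = {1507: 2.8, 5541: 6.0}


def per_file_ms(samples):
    """Average cost per cache file, in milliseconds."""
    return {n: 1000.0 * seconds / n for n, seconds in samples.items()}


def slowdown(samples, few=1507, many=5541):
    """How much longer reading all versions takes than one file per package."""
    return samples[many] / samples[few]


print(per_file_ms(HOT), per_file_ms(COLD))
print(round(slowdown(HOT), 2), round(slowdown(COLD), 2))
```

These figures cover reading the cache files only; they say nothing about the
later metadata pull for dependency calculation.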

>
> I made a lot of assumptions here (neglecting seek between ebuild-dir and
> metadata-dir, other processes using the drive, 80 ebuilds from overlays
> where the ebuild would have to be read, etc.). But estimating from the
> numbers above I'd say that an "emerge -uD world"/"paludis -i world" will
> be at least twice as slow, which I think is not acceptable.
I find that quite acceptable. As long as we're using such a bad layout the
performance is secondary.

To fix the performance you'd "only" have to guarantee that the repo is
unchanged (read-only), so you can add lots of simple caches/indexes - no need
to source any ebuild for metadata again, one cache file for eapi if you want
... I bet you find lots of small improvements that would yield. Much more
impressive than managing to avoid a few open() here and there ...
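One way such an index could be tied to an unchanged repo, as a Python sketch:
the index is reused while a repo-wide stamp file stays the same and rebuilt
otherwise. The stamp file, the index format and the function name are all
assumptions for illustration:

```python
import json


def load_or_build_index(repo_stamp_path, index_path, build):
    """Return the cached index if it matches the current repo stamp, else rebuild.

    `repo_stamp_path` is a hypothetical file that changes on every repo update;
    `build` is a callable producing the index data (e.g. ebuild -> EAPI).
    """
    with open(repo_stamp_path, encoding="utf-8") as fh:
        stamp = fh.read().strip()
    try:
        with open(index_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if isinstance(cached, dict) and cached.get("stamp") == stamp:
            return cached["data"]  # repo unchanged: a single file read
    except (OSError, ValueError):
        pass  # missing or corrupt index: fall through and rebuild
    data = build()
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump({"stamp": stamp, "data": data}, fh)
    return data
```

As long as the repo really is read-only between syncs, every later run costs
two small file reads instead of one stat() and open() per ebuild.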

> And I also don't understand your point of stating it's "bad design".
Bad design is like smelly feet. It's hard not to notice ...

> I mean: when coding you should "not optimize prematurely", but with
> eapi-in-ebuild it is against the other principle of "not pessimize
> prematurely" (Sutter/Alexandrescu: C++ Coding Standards).
If you quote that, try the full quote:

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."

In other words, we should not try to make that path faster when we can avoid
hitting it at all with a small design revision.