On Thursday, 28.05.2009, at 09:23 +0200, Patrick Lauer wrote:
> On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:
>
> > And here is why (I'm only looking at the non-degenerate case with valid
> > metadata, ignoring overlays which some consider a corner case (I don't
> > understand that argument, but that's another thing)):
>
> overlays tend to come without metadata. Just enabling the KDE overlay changed
> the time for "emerge -upNDv world" from ~30 seconds cold cache to ~120
> seconds. Running emerge --metadata gets the performance back to pretty much
> the old levels.
>
> > When the package manager looks at a package, it first reads the
> > package's ebuild directory and gets the mtimes. It does the same for the
> > cache entries and validates the caches (there is more stuff in here,
> > like checking eclasses and so on).
> Eclasses are negligible because you only have to look at them once for the
> whole calculation. You can cache the mtime for the duration of your operation.
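The point about eclasses can be sketched roughly as follows. This is only an
illustration, not how Portage or Paludis actually do it: `eclass_mtime` and
`eclasses_unchanged` are hypothetical helpers, and the memo is assumed to be
thrown away when the operation ends.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def eclass_mtime(path):
    """Return the mtime of an eclass file, stat()ing it at most once.

    Hypothetical helper: repeated lookups during one dependency
    calculation are answered from the in-memory memo.
    """
    return os.stat(path).st_mtime_ns


def eclasses_unchanged(recorded):
    """Check recorded {path: mtime_ns} pairs against the files on disk."""
    return all(eclass_mtime(path) == mtime for path, mtime in recorded.items())
```

Calling `eclass_mtime.cache_clear()` at the end of the operation would drop the
memo, so the next run looks at the files again.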
>
> > Then the following happens based on the "solution" we choose:
> > eapi-in-filename: the package manager starts from the highest version
> > with a supported eapi (the others are nonexistent as far as the glob is
> > concerned). For that ebuild it reads the cache entry and decides whether
> > or not it can be used.
> In this case you amusingly do NOT want to store the eapi in the cache, so you
> can even defer sourcing the ebuild until you actually need the metadata.
By "whether or not it can be used" I meant "keyword-like", surely not
eapi-like since you already know it at that point.
|
> (You don't want to cache it because you need to check the file mtime anyway,
> and then you read the filename anyway. No need to look for it in another place
> then :) )
> > If not, it proceeds to the next version, if yes, it's done.
> > eapi-in-ebuild: the package manager reads all cache entries and sorts
> > out those with an EAPI it doesn't support. The rest gets ordered and the
> > same procedure as above applies.
> >
> > So, one of the main differences is: "reading one cache file" (if running
> > unstable you can assume you support the highest version, thus reading
> > only one cache file) vs. "reading all cache files".
> That assumes a dumb cache format.
> Why don't we make the cache more efficient so you read one file per package /
> category / ... ?
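To make the "one cache file" vs. "all cache files" comparison concrete, here is
a rough sketch that only counts cache reads for a single package. Everything in
it is made up for illustration: the `Candidate` tuple, the `usable` flag
standing in for the keyword-like check, and the assumption that versions
compare as plain tuples.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    version: tuple   # e.g. (1, 2); real version comparison is more involved
    eapi: str
    usable: bool     # stands in for the keyword-like check on the cache entry


def pick_eapi_in_filename(candidates, supported):
    """EAPI is visible in the filename: filter first, then read newest-first."""
    visible = sorted((c for c in candidates if c.eapi in supported),
                     key=lambda c: c.version, reverse=True)
    reads = 0
    for c in visible:
        reads += 1               # read this version's cache entry
        if c.usable:
            return c, reads      # done, older versions are never touched
    return None, reads


def pick_eapi_in_ebuild(candidates, supported):
    """EAPI is only known from the cache entry: every entry is read once."""
    reads = len(candidates)
    known = sorted((c for c in candidates if c.eapi in supported),
                   key=lambda c: c.version, reverse=True)
    chosen = next((c for c in known if c.usable), None)
    return chosen, reads
```

Both functions pick the same version; they differ only in how many cache
entries have to be read to get there.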
>
> >
> > I did some performance measurements based on that. I have 1507 installed
> > packages with 5541 different versions/revisions.
> >
> > Reading from hot cache:
> > 1507 files: ~50ms
> > 5541 files: ~170ms
> >
> > Reading from cold cache:
> > 1507 files: ~2.8s
> > 5541 files: ~6s
> And now you need to pull metadata for dependency calculation. How big is the
> impact of that?
The 1507 files are the complete dep-tree cache entries for the highest
version, whereas the 5541 files are all the cache entries for all packages
in the dep-tree.
I did say that I simplified the case a lot, didn't I? :)
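For reference, the ratios implied by the quoted measurements can be worked out
directly. The snippet below just redoes that arithmetic on the approximate
figures from above; the dictionary layout and the `slowdown` helper are
assumptions made for the example.

```python
# (files read, seconds), taken from the approximate figures quoted above
hot = {"highest only": (1507, 0.050), "all versions": (5541, 0.170)}
cold = {"highest only": (1507, 2.8), "all versions": (5541, 6.0)}


def slowdown(measurements):
    """Time for reading all entries divided by time for one entry per package."""
    return measurements["all versions"][1] / measurements["highest only"][1]


file_ratio = hot["all versions"][0] / hot["highest only"][0]

print(f"files: {file_ratio:.1f}x")      # files: 3.7x
print(f"hot:   {slowdown(hot):.1f}x")   # hot:   3.4x
print(f"cold:  {slowdown(cold):.1f}x")  # cold:  2.1x
```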
|
>
> >
> > I made a lot of assumptions here (neglecting seeks between ebuild-dir and
> > metadata-dir, other processes using the drive, 80 ebuilds from overlays
> > where the ebuild would have to be read, etc.). But estimating from the
> > numbers above I'd say that an "emerge -uD world"/"paludis -i world" will
> > be at least twice as slow, which I think is not acceptable.
> I find that quite acceptable. As long as we're using such a bad layout the
> performance is secondary.
... and I don't :)
|
>
> To fix the performance you'd "only" have to guarantee that the repo is
> unchanged (read-only), so you can add lots of simple caches/indexes - no need
> to source any ebuild for metadata again, one cachefile for eapi if you want
> ... I bet you'd find lots of small improvements that this would yield. Much
> more impressive than managing to avoid a few open() here and there ...
>
>
> > And I also don't understand your point of stating it's "bad design".
> Bad design is like smelly feet. It's hard not to notice ...
>
> > I mean: when coding you should "not optimize prematurely", but
> > eapi-in-ebuild goes against the other principle of "not pessimize
> > prematurely" (Sutter/Alexandrescu: C++ Coding Standards).
> If you quote that, try the full quote:
>
> "We should forget about small efficiencies, say about 97% of the time:
> premature optimization is the root of all evil."
>
> In other words, we should not try to make that path faster when we can avoid
> hitting it at all with a small design revision.
>
For which you have still failed (after one year or so) to provide a nice,
cleanly written document.
|
--
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
Samba, PostgreSQL, CPP, Python, sysadmin, GLEP Editor
E-Mail : dev-zero@g.o
GnuPG FP : F327 283A E769 2E36 18D5 4DE2 1B05 6A63 AE9C 1E30