Gentoo Archives: gentoo-dev

From: Patrick Lauer <patrick@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Gentoo Council Reminder for May 28
Date: Thu, 28 May 2009 07:23:50
Message-Id: 200905280923.46297.patrick@gentoo.org
In Reply to: Re: [gentoo-dev] Gentoo Council Reminder for May 28 by "Tiziano Müller"
1 On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:
2
3 > And here is why (I'm only looking at the non-degenerated case with valid
4 > metadata, ignoring overlays which some consider a corner case (I don't
5 > understand that argument, but that's another thing)):
6
7 Overlays tend to come without metadata. Just enabling the KDE overlay changed
8 the time for "emerge -upNDv world" from ~30 seconds cold cache to ~120
9 seconds. Running emerge --metadata gets the performance back to pretty much
10 the old levels.
11
12 > When the package manager looks at a package, it first reads the
13 > package's ebuild directory and gets the mtimes. It does the same for the
14 > cache entries and validates the caches (there is more stuff in here,
15 > like checking eclasses and so on).
16 Eclasses are negligible because you only have to look at them once for the
17 whole calculation. You can cache the mtime for the duration of your operation.
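
A minimal sketch of the "look at each eclass once" idea, assuming a small per-operation cache object. The class name `MtimeCache` and the `stat_calls` counter are hypothetical and only there for illustration; this is not Portage's or Paludis's actual code.

```python
import os


class MtimeCache:
    """Remember file mtimes for the lifetime of one operation (hypothetical helper)."""

    def __init__(self):
        self._mtimes = {}
        self.stat_calls = 0  # counts real stat() calls, for illustration only

    def mtime(self, path):
        # Only hit the filesystem the first time a path is requested.
        if path not in self._mtimes:
            self.stat_calls += 1
            self._mtimes[path] = os.stat(path).st_mtime_ns
        return self._mtimes[path]
```

Every ebuild that inherits the same eclass then reuses the stored value, so the cost is one stat() per eclass rather than one per ebuild.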
18
19 > Then the following happens based on the "solution" we choose:
20 > eapi-in-filename: the package manager starts from the highest version
21 > with a supported eapi (the others are nonexistent with the used glob).
22 > For that ebuild it reads the cache entry and decides whether or not it
23 > can be used.
24 In this case you amusingly do NOT want to cache the eapi in the cache, so you
25 can even defer sourcing the ebuild until you actually need the metadata.
26 (You don't want to cache it because you need to check the file mtime anyway,
27 and then you read the filename anyway. No need to look for it in another place
28 then :) )
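
A rough sketch of reading the EAPI straight from the filename. It assumes a naming scheme where the EAPI is a suffix after ".ebuild" (for example "foo-1.0.ebuild-2") and that a plain ".ebuild" means EAPI 0; both the exact scheme and the function names are assumptions made for illustration.

```python
import re

# Assumed naming scheme: "<name-version>.ebuild" or "<name-version>.ebuild-<EAPI>".
_EBUILD_RE = re.compile(r"^(?P<pv>.+)\.ebuild(?:-(?P<eapi>[A-Za-z0-9+_.]+))?$")


def eapi_from_filename(filename):
    """Return the EAPI encoded in the filename, or None if it is not an ebuild."""
    match = _EBUILD_RE.match(filename)
    if match is None:
        return None
    # Assumption: no suffix means the ebuild uses EAPI 0.
    return match.group("eapi") or "0"


def supported_ebuilds(filenames, supported_eapis):
    """Keep only the ebuilds whose filename EAPI is in supported_eapis."""
    return [f for f in filenames if eapi_from_filename(f) in supported_eapis]
```

With something like this the directory listing alone is enough to discard unsupported versions, before any cache entry or ebuild is opened.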
29
30 > If not, it proceeds to the next version, if yes, it's done.
31 > eapi-in-ebuild: the package manager reads all cache entries and sorts
32 > out those with an EAPI it doesn't support. The rest gets ordered and the
33 > same procedure as above applies.
34 >
35 > So, one of the main differences is: "reading one cache file" (if running
36 > unstable you can assume you support the highest version, thus reading
37 > only one cache file) vs. "reading all cache files".
38 That assumes a dumb cache format.
39 Why don't we make the cache more efficient so you read one file per package /
40 category / ... ?
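
One way the "one file per package" layout could look, as a sketch only. The JSON format, the "<cache_dir>/<category>/<package>.json" path and the function names are all assumptions; no existing cache format is implied.

```python
import json
import os


def cache_path(cache_dir, category, package):
    """Hypothetical layout: one JSON file per package, holding all its versions."""
    return os.path.join(cache_dir, category, package + ".json")


def write_package_cache(cache_dir, category, package, versions):
    """versions maps a version string to that version's metadata dict."""
    path = cache_path(cache_dir, category, package)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(versions, handle, sort_keys=True)


def read_package_cache(cache_dir, category, package):
    """A single open() returns the metadata of every version of the package."""
    with open(cache_path(cache_dir, category, package), encoding="utf-8") as handle:
        return json.load(handle)
```

In this layout the number of files read scales with the number of packages, not with the number of versions, whichever way the EAPI is stored.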
41
42 >
43 > I did some performance measurements based on that. I have 1507 installed
44 > packages with 5541 different versions/revisions.
45 >
46 > Reading from hot cache:
47 > 1507 files: ~50ms
48 > 5541 files: ~170ms
49 >
50 > Reading from cold cache:
51 > 1507 files: ~2.8s
52 > 5541 files: ~6s
53 And now you need to pull metadata for dependency calculation. How big is the
54 impact of that?
55
56 >
57 > I made a lot of assumptions here (neglecting seek between ebuild-dir and
58 > metadata-dir, other processes using the drive, 80 ebuilds from overlays
59 > where the ebuild would have to be read, etc.). But estimating from the
60 > numbers above I'd say that a "emerge -uD world"/"paludis -i world" will
61 > be at least twice as slow, which I think is not acceptable.
62 I find that quite acceptable. As long as we're using such a bad layout the
63 performance is secondary.
64
65 To fix the performance you'd "only" have to guarantee that the repo is
66 unchanged (readonly), so you can add lots of simple caches/indexes - no need
67 to source any ebuild for metadata again, one cachefile for eapi if you want
68 ... I bet you'd find lots of small improvements that this would yield. Much more
69 impressive than managing to avoid a few open() here and there ...
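
A sketch of the single EAPI cache file for an unchanged repository. It assumes the repository exposes some stamp string that changes whenever its content changes (how that stamp is obtained is left open); the index is trusted only while the stamp matches. File format and function names are hypothetical.

```python
import json


def save_index(index_path, repo_stamp, eapi_by_cpv):
    """Store one EAPI index for the whole repository, tagged with its stamp."""
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump({"stamp": repo_stamp, "eapi": eapi_by_cpv}, handle)


def load_index(index_path, repo_stamp):
    """Return the EAPI index, or None if it is missing, broken or stale."""
    try:
        with open(index_path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        return None
    # The index is only valid for exactly the repository state it was built from.
    if not isinstance(data, dict) or data.get("stamp") != repo_stamp:
        return None
    return data.get("eapi")
```

A None result means the index has to be rebuilt once; after that, every run against the same repository state reads one file to learn all EAPIs.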
70
71
72 > And I also don't understand your point of stating it's "bad design".
73 Bad design is like smelly feet. It's hard not to notice ...
74
75 > I mean: when coding you should "not optimize prematurely", but with
76 > eapi-in-ebuild it is against the other principle of "not pessimize
77 > prematurely" (Sutter/Alexandrescu: C++ Coding Standards).
78 If you quote that, try the full quote:
79
80 "We should forget about small efficiencies, say about 97% of the time:
81 premature optimization is the root of all evil."
82
83 In other words, we should not try to make that path faster when we can avoid
84 hitting it at all with a small design revision.
