Gentoo Archives: gentoo-dev

From: "Tiziano Müller" <dev-zero@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Gentoo Council Reminder for May 28
Date: Thu, 28 May 2009 09:35:06
Message-Id: 1243503301.10450.83.camel@localhost
In Reply to: Re: [gentoo-dev] Gentoo Council Reminder for May 28 by Patrick Lauer
On Thursday, 28.05.2009, at 09:23 +0200, Patrick Lauer wrote:
> On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:
>
> > And here is why (I'm only looking at the non-degenerate case with valid
> > metadata, ignoring overlays, which some consider a corner case (I don't
> > understand that argument, but that's another thing)):
>
> Overlays tend to come without metadata. Just enabling the KDE overlay changed
> the time for "emerge -upNDv world" from ~30 seconds cold cache to ~120
> seconds. Running emerge --metadata gets the performance back to pretty much
> the old levels.
>
> > When the package manager looks at a package, it first reads the
> > package's ebuild directory and gets the mtimes. It does the same for the
> > cache entries and validates the caches (there is more stuff in here,
> > like checking eclasses and so on).
> Eclasses are negligible because you only have to look at them once for the
> whole calculation. You can cache the mtime for the duration of your operation.
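A minimal Python sketch of the validation step described above, with eclass mtimes memoized for the duration of one run as suggested. The cache-entry layout (an `_mtime_` value plus an `_eclasses_` name-to-mtime map) and both helper names are hypothetical, chosen for illustration; this is not claimed to be Portage's or Paludis' actual cache format or API.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def eclass_mtime(path: str) -> int:
    """Stat each eclass at most once per run; later lookups hit the memo."""
    return int(os.stat(path).st_mtime)


def cache_entry_valid(ebuild_path: str, entry: dict, eclass_dir: str) -> bool:
    """Return True if a cached metadata entry still matches what is on disk.

    Hypothetical entry layout: {"_mtime_": int, "_eclasses_": {name: mtime}}.
    """
    try:
        if int(os.stat(ebuild_path).st_mtime) != entry["_mtime_"]:
            return False  # the ebuild changed since the entry was generated
        for name, recorded in entry.get("_eclasses_", {}).items():
            path = os.path.join(eclass_dir, name + ".eclass")
            if eclass_mtime(path) != recorded:
                return False  # an inherited eclass changed
    except OSError:
        return False  # ebuild or eclass missing: treat the entry as stale
    return True
```

Because the memo lives for the whole process, a long-running tool would call `eclass_mtime.cache_clear()` between operations so that eclass edits made in the meantime are noticed.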
>
> > Then the following happens based on the "solution" we choose:
> > eapi-in-filename: the package manager starts from the highest version
> > with a supported EAPI (the others are never matched by the glob used).
> > For that ebuild it reads the cache entry and decides whether or not it
> > can be used.
> In this case you amusingly do NOT want to store the EAPI in the cache, so you
> can even defer sourcing the ebuild until you actually need the metadata.
By "whether or not it can be used" I meant usability in the keyword sense,
certainly not in the EAPI sense, since you already know the EAPI at that point.

> (You don't want to cache it because you need to check the file mtime anyway,
> and then you read the filename anyway. No need to look for it in another place
> then :) )
> > If not, it proceeds to the next version; if so, it's done.
> > eapi-in-ebuild: the package manager reads all cache entries and sorts
> > out those with an EAPI it doesn't support. The rest gets ordered and the
> > same procedure as above applies.
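The two selection strategies can be sketched side by side in Python, counting simulated cache reads. Everything here is hypothetical: the `SUPPORTED_EAPIS` set, the in-memory `Repo` stand-in for per-version cache files, the dotted-integer version ordering (real Gentoo version rules also handle suffixes and revisions) and the exact-match keyword test.

```python
from dataclasses import dataclass, field

# Hypothetical: the EAPIs this package manager understands.
SUPPORTED_EAPIS = {"0", "1", "2"}


@dataclass
class CacheEntry:
    eapi: str
    keywords: set = field(default_factory=set)


def vkey(version: str) -> tuple:
    # Simplified ordering: dotted integers only (no _rc, -r1, etc.).
    return tuple(int(part) for part in version.split("."))


class Repo:
    """In-memory stand-in for per-version cache files; counts reads."""

    def __init__(self, entries: dict):
        self._entries = entries  # version -> CacheEntry
        self.reads = 0

    def read_cache(self, version: str) -> CacheEntry:
        self.reads += 1
        return self._entries[version]


def pick_eapi_in_filename(repo: Repo, files: dict, arch: str):
    """files maps version -> EAPI parsed from the filename: no reads to filter."""
    candidates = sorted((v for v, e in files.items() if e in SUPPORTED_EAPIS),
                        key=vkey, reverse=True)
    for v in candidates:  # highest first; stop at the first usable one
        if arch in repo.read_cache(v).keywords:
            return v
    return None


def pick_eapi_in_ebuild(repo: Repo, versions: list, arch: str):
    """The EAPI is only known from the cache entry, so every entry is read first."""
    entries = {v: repo.read_cache(v) for v in versions}
    candidates = sorted((v for v, e in entries.items() if e.eapi in SUPPORTED_EAPIS),
                        key=vkey, reverse=True)
    for v in candidates:
        if arch in entries[v].keywords:
            return v
    return None


entries = {
    "1.0": CacheEntry("0", {"amd64"}),
    "1.1": CacheEntry("2", {"~amd64"}),
    "2.0": CacheEntry("3", {"~amd64"}),  # EAPI unknown to this package manager
}
by_name, by_content = Repo(entries), Repo(entries)
print(pick_eapi_in_filename(by_name, {v: e.eapi for v, e in entries.items()}, "~amd64"), by_name.reads)
# 1.1 1
print(pick_eapi_in_ebuild(by_content, list(entries), "~amd64"), by_content.reads)
# 1.1 3
```

Both variants pick the same version; the difference is only in how many cache entries have to be read to get there.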
> >
> > So, one of the main differences is "reading one cache file" (if running
> > unstable, you can assume you support the highest version, thus reading
> > only one cache file) vs. "reading all cache files".
> That assumes a dumb cache format.
> Why don't we make the cache more efficient so you read one file per package /
> category / ... ?
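One reading of the "one file per package" idea, as a hypothetical Python sketch: the metadata of all versions of a package goes into a single JSON file, so learning every version's EAPI costs one open() instead of one per version. The `<root>/<category>/<package>.json` layout, the JSON encoding and the package name `app-misc/foo` are assumptions for illustration; no package manager is claimed to use this format.

```python
import json
import tempfile
from pathlib import Path


def _cache_path(root: Path, cp: str) -> Path:
    # Hypothetical layout: <root>/<category>/<package>.json
    category, package = cp.split("/", 1)
    return root / category / f"{package}.json"


def write_package_cache(root: Path, cp: str, versions: dict) -> None:
    """Store metadata for every version of one package in a single file."""
    path = _cache_path(root, cp)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(versions, sort_keys=True))


def read_package_cache(root: Path, cp: str) -> dict:
    """A single open() yields the metadata, including EAPI, of all versions."""
    return json.loads(_cache_path(root, cp).read_text())


root = Path(tempfile.mkdtemp())
write_package_cache(root, "app-misc/foo", {"1.0": {"EAPI": "0"}, "1.1": {"EAPI": "2"}})
meta = read_package_cache(root, "app-misc/foo")
```

The trade-off is that any change to one version rewrites the whole package file, which is cheap for a generated, read-only cache but not for hand-edited data.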
>
> >
> > I did some performance measurements based on that. I have 1507 installed
> > packages with 5541 different versions/revisions.
> >
> > Reading from hot cache:
> > 1507 files: ~50ms
> > 5541 files: ~170ms
> >
> > Reading from cold cache:
> > 1507 files: ~2.8s
> > 5541 files: ~6s
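A quick Python check of the slowdown implied by the timings above. It only compares the cost of reading cache entries under the simplifying assumptions stated in the thread; it says nothing about the rest of a dependency calculation.

```python
def slowdown(highest_only_ms: float, all_versions_ms: float) -> float:
    """Factor by which reading all versions' entries is slower than one per package."""
    return all_versions_ms / highest_only_ms


# Timings from the measurements above, in milliseconds.
timings_ms = {"hot": (50, 170), "cold": (2800, 6000)}

for label, (highest_only, all_versions) in timings_ms.items():
    extra = all_versions - highest_only
    print(f"{label}: {slowdown(highest_only, all_versions):.2f}x ({extra} ms extra)")
# hot: 3.40x (120 ms extra)
# cold: 2.14x (3200 ms extra)

# For comparison, the ratio of files read:
print(f"files: {5541 / 1507:.2f}x")
# files: 3.68x
```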
> And now you need to pull metadata for dependency calculation. How big is the
> impact of that?
The 1507 files are the cache entries for the highest version of each package
in the complete dep-tree, whereas the 5541 files are the cache entries for all
versions of all packages in the dep-tree.
I did say that I simplified the case a lot, didn't I? :)

>
> >
> > I made a lot of assumptions here (neglecting seeks between ebuild-dir and
> > metadata-dir, other processes using the drive, 80 ebuilds from overlays
> > where the ebuild would have to be read, etc.). But estimating from the
> > numbers above, I'd say that an "emerge -uD world"/"paludis -i world" will
> > be at least twice as slow, which I think is not acceptable.
> I find that quite acceptable. As long as we're using such a bad layout, the
> performance is secondary.
... and I don't :)

>
> To fix the performance you'd "only" have to guarantee that the repo is
> unchanged (read-only), so you can add lots of simple caches/indexes - no need
> to source any ebuild for metadata again, one cache file for EAPI if you want
> ... I bet you'd find lots of small improvements that this would yield. Much
> more impressive than managing to avoid a few open() calls here and there ...
>
>
> > And I also don't understand why you call it "bad design".
> Bad design is like smelly feet. It's hard not to notice ...
>
> > I mean: when coding you should "not optimize prematurely", but
> > eapi-in-ebuild goes against the other principle, "don't pessimize
> > prematurely" (Sutter/Alexandrescu: C++ Coding Standards).
> If you quote that, use the full quote:
>
> "We should forget about small efficiencies, say about 97% of the time:
> premature optimization is the root of all evil."
>
> In other words, we should not try to make that path faster when we can avoid
> hitting it at all with a small design revision.
>
And after a year or so, you still haven't provided a nice, cleanly written
document for that.

--
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
Samba, PostgreSQL, CPP, Python, sysadmin, GLEP Editor
E-Mail : dev-zero@g.o
GnuPG FP : F327 283A E769 2E36 18D5 4DE2 1B05 6A63 AE9C 1E30
