Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-portage-dev@l.g.o, Chun-Yu Shei <cshei@××××××.com>
Subject: Re: [gentoo-portage-dev] Add caching to a few commonly used functions
Date: Sun, 28 Jun 2020 03:00:38
Message-Id: 47f241aa-977f-e044-6770-f9f314747f85@gentoo.org
In Reply to: [gentoo-portage-dev] Add caching to a few commonly used functions by Chun-Yu Shei
1 On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
2 > Hi,
3 >
4 > I was recently interested in whether portage could be sped up, since
5 > dependency resolution can sometimes take a while on slower machines.
6 > After generating some flame graphs with cProfile and vmprof, I found 3
7 > functions which seem to be called extremely frequently with the same
8 > arguments: catpkgsplit, use_reduce, and match_from_list. In the first
9 > two cases, it was simple to cache the results in dicts, while
10 > match_from_list was a bit trickier, since it seems to be a requirement
11 > that it return actual entries from the input "candidate_list". I also
12 > ran into some test failures if I did the caching after the
13 > mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
14 > function, so the caching is only done up to just before that point.
15 >
16 > The catpkgsplit change seems to definitely be safe, and I'm pretty sure
17 > the use_reduce one is too, since anything that could possibly change the
18 > result is hashed. I'm a bit less certain about the match_from_list one,
19 > although all tests are passing.
20 >
21 > With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
22 > speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup. "emerge
23 > -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
24 > (2.5% improvement). Since the upgrade case is far more common, this
25 > would really help in daily use, and it shaves about 30 seconds off
26 > the time you have to wait to get to the [Yes/No] prompt (from ~90s to
27 > 60s) on my old Sandy Bridge laptop when performing normal upgrades.
28 >
29 > Hopefully, at least some of these patches can be incorporated, and please
30 > let me know if any changes are necessary.
31 >
32 > Thanks,
33 > Chun-Yu
34
35 Using global variables for caches like these causes a form of memory
36 leak for use cases involving long-running processes that need to work
37 with many different repositories (and perhaps multiple versions of those
38 repositories).
39
40 There are at least a couple of different strategies that we can use to
41 avoid this form of memory leak:
42
43 1) Limit the scope of the caches so that they have some sort of garbage
44 collection life cycle. For example, it would be natural for the depgraph
45 class to have a local cache of use_reduce results, so that the cache can
46 be garbage collected along with the depgraph.
47
48 2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
49 can be avoided by constructing more _pkg_str instances, since
50 catpkgsplit is able to return early when its argument happens to be a
51 _pkg_str instance.
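
The two strategies above could be sketched roughly as follows. All names
here (`Resolver`, `PkgStr`, `split_cpv`, `reduce`) are hypothetical
stand-ins chosen for illustration, and the parsing is deliberately
simplified; this is not portage's depgraph, _pkg_str, catpkgsplit, or
use_reduce. The sketch only shows the shape of the idea: a cache owned by
an object is collected with that object, and a pre-parsed string lets the
split function return early.

```python
import gc
import weakref


class PkgStr(str):
    """Hypothetical pre-parsed package string; not portage's _pkg_str."""

    def __new__(cls, cpv):
        self = super().__new__(cls, cpv)
        category, _, rest = cpv.partition("/")
        package, _, version = rest.rpartition("-")
        self.cpv_split = (category, package, version)
        return self


def split_cpv(cpv):
    # Strategy 2: return early when the argument already carries its parse.
    if isinstance(cpv, PkgStr):
        return cpv.cpv_split
    return PkgStr(cpv).cpv_split


class Resolver:
    """Hypothetical owner of a per-instance cache (strategy 1)."""

    def __init__(self):
        self._reduce_cache = {}

    def reduce(self, depstring, enabled_flags):
        # Everything that can change the result is part of the key.
        key = (depstring, frozenset(enabled_flags))
        try:
            return self._reduce_cache[key]
        except KeyError:
            # Placeholder for an expensive computation.
            result = self._reduce_cache[key] = tuple(depstring.split())
            return result


resolver = Resolver()
resolver.reduce("dev-lang/python sys-libs/zlib", {"ssl"})
ref = weakref.ref(resolver)
del resolver
gc.collect()
print(ref() is None)  # True: the cache went away with its owner
```

Because the cache is an attribute of the `Resolver` instance, no separate
eviction policy is needed: dropping the last reference to the instance
releases the cached results with it.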
52 --
53 Thanks,
54 Zac

Replies

Subject Author
Re: [gentoo-portage-dev] Add caching to a few commonly used functions "Michał Górny" <mgorny@g.o>