Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: "Michał Górny" <mgorny@g.o>, gentoo-portage-dev@l.g.o, Zac Medico <zmedico@g.o>, Chun-Yu Shei <cshei@××××××.com>
Subject: Re: [gentoo-portage-dev] Add caching to a few commonly used functions
Date: Sun, 28 Jun 2020 03:42:39
Message-Id: 7d34b4c7-1425-bc12-e6c9-0d8bf4c51e17@gentoo.org
In Reply to: Re: [gentoo-portage-dev] Add caching to a few commonly used functions by "Michał Górny"
On 6/27/20 8:12 PM, Michał Górny wrote:
> On June 28, 2020 3:00:00 AM UTC, Zac Medico <zmedico@g.o> wrote:
>> On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>>> Hi,
>>>
>>> I was recently interested in whether portage could be sped up, since
>>> dependency resolution can sometimes take a while on slower machines.
>>> After generating some flame graphs with cProfile and vmprof, I found 3
>>> functions which seem to be called extremely frequently with the same
>>> arguments: catpkgsplit, use_reduce, and match_from_list. In the first
>>> two cases, it was simple to cache the results in dicts, while
>>> match_from_list was a bit trickier, since it seems to be a requirement
>>> that it return actual entries from the input "candidate_list". I also
>>> ran into some test failures if I did the caching after the
>>> mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
>>> function, so the caching is only done up to just before that point.
>>>
>>> The catpkgsplit change seems to definitely be safe, and I'm pretty sure
>>> the use_reduce one is too, since anything that could possibly change the
>>> result is hashed. I'm a bit less certain about the match_from_list one,
>>> although all tests are passing.
>>>
>>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup. "emerge
>>> -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
>>> (2.5% improvement). Since the upgrade case is far more common, this
>>> would really help in daily use, and it shaves about 30 seconds off
>>> the time you have to wait to get to the [Yes/No] prompt (from ~90s to
>>> 60s) on my old Sandy Bridge laptop when performing normal upgrades.
>>>
>>> Hopefully, at least some of these patches can be incorporated, and please
>>> let me know if any changes are necessary.
>>>
>>> Thanks,
>>> Chun-Yu
>>
>> Using global variables for caches like these causes a form of memory
>> leak for use cases involving long-running processes that need to work
>> with many different repositories (and perhaps multiple versions of those
>> repositories).
>>
>> There are at least a couple of different strategies that we can use to
>> avoid this form of memory leak:
>>
>> 1) Limit the scope of the caches so that they have some sort of garbage
>> collection life cycle. For example, it would be natural for the depgraph
>> class to have a local cache of use_reduce results, so that the cache can
>> be garbage collected along with the depgraph.
>>
>> 2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
>> can be avoided by constructing more _pkg_str instances, since
>> catpkgsplit is able to return early when its argument happens to be a
>> _pkg_str instance.
>
> I think the weak stuff from the standard library might also be helpful.
>
> --
> Best regards,
> Michał Górny
>

Hmm, maybe weak global caches are an option?
--
Thanks,
Zac
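
For illustration only, a weak global cache could be sketched roughly as
follows with weakref.WeakValueDictionary from the standard library. This is
not portage code: SplitResult, split_cached and the simplified parsing are
hypothetical stand-ins (the real catpkgsplit logic is more involved), and
the cached values have to be instances of a class that supports weak
references, which plain tuples do not.

```python
import gc
import weakref


class SplitResult:
    """Hypothetical stand-in for a parsed package string (not a portage class)."""

    def __init__(self, category, name, version):
        self.category = category
        self.name = name
        self.version = version


# Global cache that holds only weak references to its values, so an entry
# disappears once nothing else refers to the cached object.
_split_cache = weakref.WeakValueDictionary()


def split_cached(cpv):
    """Parse 'category/name-version', reusing a live cached result if any."""
    result = _split_cache.get(cpv)
    if result is None:
        # Deliberately simplified parsing, for the sake of the example only.
        category, _, rest = cpv.partition("/")
        name, _, version = rest.rpartition("-")
        result = SplitResult(category, name, version)
        _split_cache[cpv] = result
    return result


first = split_cached("sys-apps/portage-3.0.0")
second = split_cached("sys-apps/portage-3.0.0")
same_object = first is second  # cache hit while a strong reference exists
live_entries = len(_split_cache)

# Drop the strong references; the cache entry goes away with them.
del first, second
gc.collect()  # explicit collection, in case the interpreter does not free eagerly
entries_after_release = len(_split_cache)
```

The trade-off in a sketch like this is that a result is only reused while
some caller still holds it, so the hit rate depends on how long callers keep
their results alive.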
