On 6/27/20 8:12 PM, Michał Górny wrote:
> Dnia June 28, 2020 3:00:00 AM UTC, Zac Medico <zmedico@g.o> napisał(a):
>> On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>>> Hi,
>>>
>>> I was recently interested in whether portage could be sped up, since
>>> dependency resolution can sometimes take a while on slower machines.
>>> After generating some flame graphs with cProfile and vmprof, I found
>>> 3 functions which seem to be called extremely frequently with the
>>> same arguments: catpkgsplit, use_reduce, and match_from_list. In the
>>> first two cases, it was simple to cache the results in dicts, while
>>> match_from_list was a bit trickier, since it seems to be a
>>> requirement that it return actual entries from the input
>>> "candidate_list". I also ran into some test failures if I did the
>>> caching after the mydep.unevaluated_atom.use and mydep.repo checks
>>> towards the end of the function, so the caching is only done up to
>>> just before that point.
>>>
>>> The catpkgsplit change seems to definitely be safe, and I'm pretty
>>> sure the use_reduce one is too, since anything that could possibly
>>> change the result is hashed. I'm a bit less certain about the
>>> match_from_list one, although all tests are passing.
>>>
>>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup.
>>> "emerge -ep @world" is just a tiny bit faster, going from 18.69 to
>>> 18.22 sec (2.5% improvement). Since the upgrade case is far more
>>> common, this would really help in daily use, and it shaves about 30
>>> seconds off the time you have to wait to get to the [Yes/No] prompt
>>> (from ~90s to 60s) on my old Sandy Bridge laptop when performing
>>> normal upgrades.
>>>
>>> Hopefully, at least some of these patches can be incorporated, and
>>> please let me know if any changes are necessary.
>>>
>>> Thanks,
>>> Chun-Yu
>>
>> Using global variables for caches like these causes a form of memory
>> leak for use cases involving long-running processes that need to work
>> with many different repositories (and perhaps multiple versions of
>> those repositories).
>>
>> There are at least a couple of different strategies that we can use
>> to avoid this form of memory leak:
>>
>> 1) Limit the scope of the caches so that they have some sort of
>> garbage collection life cycle. For example, it would be natural for
>> the depgraph class to have a local cache of use_reduce results, so
>> that the cache can be garbage collected along with the depgraph.
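Strategy 1 could be sketched roughly as below; this `depgraph` is a bare stand-in for the real class, not portage's actual implementation:

```python
class depgraph:
    """Stand-in for portage's depgraph, illustrating instance-scoped caching."""

    def __init__(self):
        # The cache is an instance attribute, so it is garbage collected
        # together with the depgraph instead of growing for the lifetime
        # of a long-running process.
        self._use_reduce_cache = {}

    def _use_reduce(self, depstr):
        # Memoize per-depgraph; the parse here is a simplified stand-in.
        try:
            return self._use_reduce_cache[depstr]
        except KeyError:
            result = self._use_reduce_cache[depstr] = depstr.split()
            return result
```

Once the last reference to the depgraph goes away, every cached result goes with it, so no explicit invalidation is needed.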
>>
>> 2) Eliminate redundant calls. For example, redundant calls to
>> catpkgsplit can be avoided by constructing more _pkg_str instances,
>> since catpkgsplit is able to return early when its argument happens
>> to be a _pkg_str instance.
>
> I think the weakref module from the standard library might also be
> helpful.
>
> --
> Best regards,
> Michał Górny
>

Hmm, maybe weak global caches are an option?
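A weak global cache along those lines might be sketched with weakref.WeakValueDictionary: entries vanish once nothing else references the cached value, so the cache cannot pin stale repository data. The wrapper class is needed only because built-in lists are not weak-referenceable:

```python
import weakref

class ParsedDeps(list):
    """Weak-referenceable wrapper; plain lists cannot be weakly referenced."""

# Global in scope, but entries are dropped automatically once the last
# strong reference to a value disappears.
_weak_cache = weakref.WeakValueDictionary()

def weakly_cached_parse(depstr):
    result = _weak_cache.get(depstr)
    if result is None:
        # Simplified stand-in for the real parse on a cache miss.
        result = ParsedDeps(depstr.split())
        _weak_cache[depstr] = result
    return result
```

The trade-off is that a value referenced by nobody else is re-computed on the next call, so this helps most when callers hold on to results for the duration of a resolution.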
--
Thanks,
Zac