Gentoo Archives: gentoo-portage-dev

From: Alec Warner <antarus@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
Date: Thu, 09 Jul 2020 21:13:54
Message-Id: CAAr7Pr-=x1Z7Xt8K6RqN+UsXSuGi0tE5T+J+VcuTTHpw1uTpHg@mail.gmail.com
In Reply to: Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function by Chun-Yu Shei
On Thu, Jul 9, 2020 at 2:06 PM Chun-Yu Shei <cshei@××××××.com> wrote:

> Hmm, that's strange... it seems to have made it to the list archives:
> https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65
>
> (but it is entirely possible that I used "git send-email" incorrectly)
>

Ahhh it's visible there; I'll blame gMail ;)

-A


>
> On Thu, Jul 9, 2020 at 2:04 PM Alec Warner <antarus@g.o> wrote:
>
>>
>>
>> On Thu, Jul 9, 2020 at 12:03 AM Chun-Yu Shei <cshei@××××××.com> wrote:
>>
>>> Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and
>>> catpkgsplit. use_reduce was split into 2 functions, with the outer one
>>> converting lists/sets to tuples so they can be hashed and creating a
>>> copy of the returned list (since the caller seems to modify it
>>> sometimes). I tried to select cache sizes that minimized the increase in
>>> memory use while still providing about the same speedup as a cache of
>>> unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases
>>> from 44.32s -> 29.94s (a 48% speedup), while the maximum value of the
>>> RES column in htop increases from 280 MB -> 290 MB.
>>>
>>> "emerge -ep @world" time decreases slightly from 18.77s -> 17.93s, while
>>> the max observed RES value actually decreases from 228 MB -> 214 MB (similar
>>> values observed across a few before/after runs).
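The "48% speedup" is the ratio of old to new runtime. The short script below is plain arithmetic on the figures quoted above (nothing else is assumed) and expresses both runs as a speedup factor, the fraction of wall-clock time saved, and the change in peak RES.

```python
def compare_runs(before_s, after_s, before_mb, after_mb):
    """Return (speedup factor, fraction of runtime saved, RES change in MB)."""
    speedup = before_s / after_s        # 1.48 means "48% faster"
    saved = 1.0 - after_s / before_s    # share of wall-clock time removed
    return speedup, saved, after_mb - before_mb


# (seconds before, seconds after, MB before, MB after), as reported in the thread.
runs = {
    "emerge -uDvpU --with-bdeps=y @world": (44.32, 29.94, 280, 290),
    "emerge -ep @world": (18.77, 17.93, 228, 214),
}

for name, figures in runs.items():
    speedup, saved, delta_mb = compare_runs(*figures)
    print(f"{name}: {speedup:.2f}x faster, {saved:.1%} less time, RES {delta_mb:+d} MB")
```

For the update run this gives 1.48x, i.e. about 32.4% less wall-clock time for +10 MB; the -ep run is about 1.05x (4.5% less time) with 14 MB less peak RES.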
>>>
>>> Here are the cache hit stats, max observed RES memory, and runtime in
>>> seconds for various sizes in the update case. Caching for each
>>> function was tested independently (only 1 function with caching enabled
>>> at a time):
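One way to run this kind of "one cached function at a time" sweep is to wrap the uncached function in a fresh `functools.lru_cache(maxsize=...)` for each size, replay a workload, and read `cache_info()`. In this sketch both `split_pkg` (a toy stand-in for catpkgsplit) and the Zipf-like synthetic workload are hypothetical; the numbers in the thread came from real emerge runs.

```python
import random
from functools import lru_cache


def split_pkg(cpv):
    """Hypothetical, much-simplified stand-in for catpkgsplit.

    "dev-libs/lib5-1.5" -> ("dev-libs", "lib5", "1.5"); no revision
    or validation handling, unlike the real function.
    """
    category, _, rest = cpv.partition("/")
    name, _, version = rest.rpartition("-")
    return category, name, version


def measure(func, inputs, maxsize):
    """Wrap func in a brand-new LRU cache, replay inputs, return its CacheInfo."""
    cached = lru_cache(maxsize=maxsize)(func)
    for item in inputs:
        cached(item)
    return cached.cache_info()


# Synthetic, Zipf-like workload (assumption): a few packages are queried
# very often, most only rarely.
rng = random.Random(0)
names = [f"dev-libs/lib{i}-1.{i}" for i in range(3000)]
weights = [1.0 / (rank + 1) for rank in range(len(names))]
workload = rng.choices(names, weights=weights, k=50_000)

for size in (None, 1000, 100):
    info = measure(split_pkg, workload, size)
    print(f"maxsize={size}: {info}")
```

Because `lru_cache(...)(func)` builds a new wrapper each time, every size starts from an empty cache, mirroring the independent runs described above.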
>>>
>>> catpkgsplit:
>>> CacheInfo(hits=1222233, misses=21419, maxsize=None, currsize=21419)
>>> 270 MB
>>> 39.217s
>>>
>>> CacheInfo(hits=1218900, misses=24905, maxsize=10000, currsize=10000)
>>> 271 MB
>>> 39.112s
>>>
>>> CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000)
>>> 271 MB
>>> 39.217s
>>>
>>> CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500)
>>> 269 MB
>>> 39.438s
>>>
>>> CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000)
>>> 271 MB
>>> 39.348s
>>>
>>> CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100)
>>> 271 MB
>>> 39.487s
>>>
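Converting the catpkgsplit `CacheInfo` results above into hit rates makes the size trade-off easier to read. The figures are copied from the results; `CacheInfo` is redefined locally as a namedtuple with the same fields that `lru_cache` reports, since functools does not document its own type as public API.

```python
from collections import namedtuple

# Same fields as the CacheInfo that functools.lru_cache reports.
CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

# (cache stats, max RES in MB, runtime in seconds) for catpkgsplit, copied from above.
CATPKGSPLIT_RUNS = [
    (CacheInfo(hits=1222233, misses=21419, maxsize=None, currsize=21419), 270, 39.217),
    (CacheInfo(hits=1218900, misses=24905, maxsize=10000, currsize=10000), 271, 39.112),
    (CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000), 271, 39.217),
    (CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500), 269, 39.438),
    (CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000), 271, 39.348),
    (CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100), 271, 39.487),
]


def hit_rate(info):
    """Fraction of calls answered from the cache."""
    return info.hits / (info.hits + info.misses)


for info, res_mb, seconds in CATPKGSPLIT_RUNS:
    size = "unbounded" if info.maxsize is None else info.maxsize
    print(f"maxsize={size}: hit rate {hit_rate(info):.2%}, {res_mb} MB, {seconds:.3f}s")
```

The hit rate falls from roughly 98.3% (unbounded) to roughly 92.4% at maxsize=100, while memory barely moves and the runtimes stay within about 0.4 s of each other, which suggests a small cache already captures most of catpkgsplit's benefit.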
>>>
>>> use_reduce:
>>> CacheInfo(hits=45326, misses=18660, maxsize=None, currsize=18561)
>>> 407 MB
>>> 35.77s
>>>
>>> CacheInfo(hits=45186, misses=18800, maxsize=10000, currsize=10000)
>>> 353 MB
>>> 35.52s
>>>
>>> CacheInfo(hits=44977, misses=19009, maxsize=5000, currsize=5000)
>>> 335 MB
>>> 35.31s
>>>
>>> CacheInfo(hits=44691, misses=19295, maxsize=2500, currsize=2500)
>>> 318 MB
>>> 35.85s
>>>
>>> CacheInfo(hits=44178, misses=19808, maxsize=1000, currsize=1000)
>>> 301 MB
>>> 36.39s
>>>
>>> CacheInfo(hits=41211, misses=22775, maxsize=100, currsize=100)
>>> 299 MB
>>> 37.175s
>>>
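For use_reduce the memory cost is what limits the cache size. This sketch uses only the use_reduce figures above and estimates the extra RES per 1000 additional cached entries, plus the seconds saved, both relative to the maxsize=100 run. Each RES value is a single observed maximum, so treat the per-entry estimates as rough.

```python
# (maxsize, hits, misses, currsize, max RES in MB, runtime in s) for use_reduce, copied from above.
USE_REDUCE_RUNS = [
    (None, 45326, 18660, 18561, 407, 35.77),
    (10000, 45186, 18800, 10000, 353, 35.52),
    (5000, 44977, 19009, 5000, 335, 35.31),
    (2500, 44691, 19295, 2500, 318, 35.85),
    (1000, 44178, 19808, 1000, 301, 36.39),
    (100, 41211, 22775, 100, 299, 37.175),
]

# Use the smallest cache (maxsize=100) as the reference point.
_, _, _, BASE_ENTRIES, BASE_MB, BASE_SECONDS = USE_REDUCE_RUNS[-1]


def mb_per_1000_entries(entries, res_mb):
    """Extra RES (MB) per 1000 extra cached entries, relative to the reference run."""
    return (res_mb - BASE_MB) / (entries - BASE_ENTRIES) * 1000


for maxsize, hits, misses, entries, res_mb, seconds in USE_REDUCE_RUNS[:-1]:
    label = "unbounded" if maxsize is None else maxsize
    saved = BASE_SECONDS - seconds
    print(f"maxsize={label}: ~{mb_per_1000_entries(entries, res_mb):.1f} MB per 1000 entries, "
          f"{saved:.2f}s faster than maxsize=100")
```

Every run made the same 63,986 use_reduce calls (hits plus misses), so the runs are directly comparable. The estimates land between roughly 2 and 8 MB per 1000 entries, a spread wide enough that the RES readings are probably noisy.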
>>>
>>> I didn't bother collecting detailed stats for vercmp, since the
>>> inputs/outputs are quite small and don't cause much memory increase.
>>> Please let me know if there are any other suggestions/improvements (and
>>> thanks Sid for the lru_cache suggestion!).
>>>
>>
>> I don't see a patch attached; can you link to it?
>>
>> -A
>>
>>
>>>
>>> Thanks,
>>> Chun-Yu