Gentoo Archives: gentoo-user

From: Andreas Fink <finkandreas@×××.de>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Jobs and load-average
Date: Thu, 16 Feb 2023 15:17:37
Message-Id: 20230216161715.55bea246@anfink-laptop
In Reply to: Re: [gentoo-user] Jobs and load-average by Rich Freeman
On Thu, 16 Feb 2023 09:24:08 -0500
Rich Freeman <rich0@g.o> wrote:

> On Thu, Feb 16, 2023 at 8:39 AM Peter Humphrey <peter@××××××××××××.uk> wrote:
> >
> > I've just looked at 'man make', from which it's clear that -j = --jobs, and
> > that both those and --load-average are passed to /usr/bin/make, presumably
> > untouched unless portage itself has identically named variables. So I wonder
> > how feasible it might be for make to incorporate its own checks to ensure that
> > the load average is not exceeded. I am not a programmer (not for at least 35
> > years, anyway), so I have to leave any such suggestion to the experts.
> >
>
> Well, if we just want to have a fun discussion here are my thoughts.
> However, the complexity vs usefulness outside of Gentoo is such that I
> don't see it happening.
>
> For the most typical use case - a developer building the same thing
> over and over (which isn't Gentoo), then make could cache info on
> resources consumed, and use that to make more educated decisions about
> how many tasks to launch. That wouldn't help us at all, but it would
> help the typical make user. However, the typical make user can just
> tune things in other ways.
>
> It isn't going to be possible for make to estimate build complexity in
> any practical way. Halting problem aside maybe you could build in
> some smarts looking at the program being executed and its arguments,
> but it would be a big mess.
>
> Something make could do is tune the damping a bit. It could gradually
> increase the number of jobs it runs and watch the load average, and
> gradually scale it up appropriately, and gradually scale down if CPU
> is the issue, or rapidly scale down if swap is the issue. If swapping
> is detected it could even suspend most of the tasks it has spawned and
> then gradually continue them as other tasks finish to recover from
> this condition. However, this isn't going to work as well if portage
> is itself spawning parallel instances of make - they'd have to talk to
> each other or portage would somehow need to supervise things.
>
> A way of thinking about it is that when you have portage spawning
> multiple instances of make, that is a bit like adding gain to the
> --load-average MAKEOPTS. So each instance of make independently looks
> at load average and takes action. So you have an output (compilers
> that create load), then you sample that load with a time-weighted
> average, and then you apply gain to this average, and then use that as
> feedback. That's basically a recipe for out of control oscillation.
> You need to add damping and get rid of the gain.
>
> Disclaimer: I'm not an engineer and I suspect a real engineer would be
> able to add a bit more insight.
>
> Really though the issue is that this is the sort of thing that only
> impacts Gentoo and so nobody else is likely to solve this problem for
> us.
>

Given all your explanation and my annoyance a couple of years ago, I
hacked a little helper that sits between make and the spawned build
jobs. What annoyed me is that chromium would compile for hours and then
fail because it needed more memory than was available, and that would
fail the whole build.
One possible solution is to reduce the number of build jobs to e.g. -j1
for chromium, but that is stupid, because 99% of the time -j16 would
work just fine.

So I hacked a bit around and came up with a little helper & watcher.
The helper limits the spawning of new jobs to SOME_LIMIT, or fewer when
the load is too high (e.g. when I am doing other work on the PC that is
not under emerge's control). The watcher kills memory-hungry build jobs
once memory usage exceeds 90%, tells the helper to stop spawning new
jobs, waits until the helper reports that no more build jobs are
running, and then respawns the memory-hungry build job (i.e. that job
essentially runs as if -j1 had been specified).
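
Roughly, the watcher does something like this (just a Python sketch of
the idea, not my actual code; psutil, the flag file and the process
names are placeholders):

#!/usr/bin/env python3
# Watcher sketch: kill the biggest build job when memory gets tight,
# pause the helper via a flag file, wait for the rest to drain, then
# rerun the killed job on its own.
import os, subprocess, time, psutil

PAUSE_FLAG = "/tmp/buildwatch.pause"          # hypothetical handshake file
MEM_LIMIT = 90.0                              # percent of RAM in use
COMPILER_NAMES = {"cc1", "cc1plus", "rustc"}  # what counts as a build job here

def build_jobs():
    for p in psutil.process_iter(["name", "memory_info", "cmdline"]):
        if p.info["name"] in COMPILER_NAMES:
            yield p

while True:
    if psutil.virtual_memory().percent > MEM_LIMIT:
        jobs = sorted(build_jobs(),
                      key=lambda p: p.info["memory_info"].rss, reverse=True)
        if jobs:
            victim = jobs[0]
            cmdline = victim.info["cmdline"]
            open(PAUSE_FLAG, "w").close()   # tell the helper: stop spawning
            victim.kill()                   # free the memory right now
            while list(build_jobs()):       # wait for the other jobs to finish
                time.sleep(1)
            subprocess.run(cmdline)         # rerun the victim alone, i.e. "-j1"
            os.remove(PAUSE_FLAG)           # let the helper continue
    time.sleep(1)

The real interaction between helper and watcher is a bit more involved,
but that's the general idea.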

This way I can mix emerge --jobs=HIGH_NUMBER and make
-jOTHER_HIGH_NUMBER without it affecting the system, because the total
number of actual build jobs is controlled by the helper and never goes
beyond SOME_LIMIT, even if HIGH_NUMBER*OTHER_HIGH_NUMBER > SOME_LIMIT.

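The helper side is conceptually just a wrapper in front of the compiler
that grabs one of SOME_LIMIT system-wide slots before running the real
command. Again only a rough sketch to show the idea (the lock
directory, the limits and the flag file name are made up):

#!/usr/bin/env python3
# Helper sketch: installed as a wrapper in front of the real compiler,
# it blocks until a global slot is free and the load and the watcher's
# pause flag allow spawning, then runs the real command.
import fcntl, os, subprocess, sys, time

SLOT_DIR = "/tmp/buildslots"          # shared by every emerge/make instance
SOME_LIMIT = 16                       # global cap on concurrent build jobs
MAX_LOAD = 16.0                       # don't spawn when the box is busy
PAUSE_FLAG = "/tmp/buildwatch.pause"  # set by the watcher sketch above

os.makedirs(SLOT_DIR, exist_ok=True)

def acquire_slot():
    # Block until one of the SOME_LIMIT slot files can be flock()ed.
    while True:
        if not os.path.exists(PAUSE_FLAG) and os.getloadavg()[0] < MAX_LOAD:
            for i in range(SOME_LIMIT):
                fd = os.open(os.path.join(SLOT_DIR, str(i)),
                             os.O_CREAT | os.O_RDWR)
                try:
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    return fd         # slot is freed when fd is closed
                except BlockingIOError:
                    os.close(fd)
        time.sleep(0.5)

fd = acquire_slot()
rc = subprocess.call(sys.argv[1:])    # run the real compiler command
os.close(fd)                          # give the slot back
sys.exit(rc)

Because the slots live on the filesystem and not inside any single make
instance, HIGH_NUMBER and OTHER_HIGH_NUMBER only decide how many
wrappers can be waiting for a slot, not how many compilers actually run.
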
I never published this anywhere, but if there's interest I can probably
upload it somewhere; I had the feeling that it's quite hacky and not
worth publishing. I was also never sure whether I'd break emerge in
some way, because it hooks in at a very low level, but it has now been
running for more than a year without a single emerge failure caused by
this hijacking.