Gentoo Archives: gentoo-user

From: Andreas Fink <finkandreas@×××.de>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Jobs and load-average
Date: Thu, 16 Feb 2023 15:17:37
Message-Id: 20230216161715.55bea246@anfink-laptop
In Reply to: Re: [gentoo-user] Jobs and load-average by Rich Freeman
On Thu, 16 Feb 2023 09:24:08 -0500
Rich Freeman <rich0@g.o> wrote:

> On Thu, Feb 16, 2023 at 8:39 AM Peter Humphrey <peter@××××××××××××.uk> wrote:
> >
> > I've just looked at 'man make', from which it's clear that -j = --jobs, and
> > that both those and --load-average are passed to /usr/bin/make, presumably
> > untouched unless portage itself has identically named variables. So I wonder
> > how feasible it might be for make to incorporate its own checks to ensure that
> > the load average is not exceeded. I am not a programmer (not for at least 35
> > years, anyway), so I have to leave any such suggestion to the experts.
> >
>
> Well, if we just want to have a fun discussion here are my thoughts.
> However, the complexity vs usefulness outside of Gentoo is such that I
> don't see it happening.
>
> For the most typical use case - a developer building the same thing
> over and over (which isn't Gentoo), then make could cache info on
> resources consumed, and use that to make more educated decisions about
> how many tasks to launch. That wouldn't help us at all, but it would
> help the typical make user. However, the typical make user can just
> tune things in other ways.
>
> It isn't going to be possible for make to estimate build complexity in
> any practical way. Halting problem aside maybe you could build in
> some smarts looking at the program being executed and its arguments,
> but it would be a big mess.
>
> Something make could do is tune the damping a bit. It could gradually
> increase the number of jobs it runs and watch the load average, and
> gradually scale it up appropriately, and gradually scale down if CPU
> is the issue, or rapidly scale down if swap is the issue. If swapping
> is detected it could even suspend most of the tasks it has spawned and
> then gradually continue them as other tasks finish to recover from
> this condition. However, this isn't going to work as well if portage
> is itself spawning parallel instances of make - they'd have to talk to
> each other or portage would somehow need to supervise things.
>
> A way of thinking about it is that when you have portage spawning
> multiple instances of make, that is a bit like adding gain to the
> --load-average MAKEOPTS. So each instance of make independently looks
> at load average and takes action. So you have an output (compilers
> that create load), then you sample that load with a time-weighted
> average, and then you apply gain to this average, and then use that as
> feedback. That's basically a recipe for out of control oscillation.
> You need to add damping and get rid of the gain.
>
> Disclaimer: I'm not an engineer and I suspect a real engineer would be
> able to add a bit more insight.
>
> Really though the issue is that this is the sort of thing that only
> impacts Gentoo and so nobody else is likely to solve this problem for
> us.
>

Given all your explanation and my annoyance a couple of years ago, I
hacked a little helper that sits between make and the spawned build
jobs. What annoyed me is that chromium would compile for hours and then
fail because it needed more memory than was available, and that would
fail the whole build.
One possible solution is to reduce the number of build jobs to e.g. -j1
for chromium, but that is stupid, because 99% of the time -j16 would
work just fine.

So I hacked a bit around and came up with a little helper & watcher.
The helper limits the spawning of new jobs to SOME_LIMIT, or fewer when
the load is too high (e.g. when I am doing other work on the PC that is
not under emerge's control). The watcher kills memory-hungry build jobs
once memory usage exceeds 90%, tells the helper to stop spawning new
jobs, waits until the helper reports that no more build jobs are
running, and then respawns the memory-hungry build job (i.e. that job
essentially runs as if -j1 had been specified).
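
Roughly, the watcher does something like this (just a Python sketch of
the idea, not my actual code; psutil, the flag file and the process
names are placeholders):

#!/usr/bin/env python3
# Watcher sketch: kill the biggest build job when memory gets tight,
# pause the helper via a flag file, wait for the rest to drain, then
# rerun the killed job on its own.
import os, subprocess, time, psutil

PAUSE_FLAG = "/tmp/buildwatch.pause"          # hypothetical handshake file
MEM_LIMIT = 90.0                              # percent of RAM in use
COMPILER_NAMES = {"cc1", "cc1plus", "rustc"}  # what counts as a build job here

def build_jobs():
    for p in psutil.process_iter(["name", "memory_info", "cmdline"]):
        if p.info["name"] in COMPILER_NAMES:
            yield p

while True:
    if psutil.virtual_memory().percent > MEM_LIMIT:
        jobs = sorted(build_jobs(),
                      key=lambda p: p.info["memory_info"].rss, reverse=True)
        if jobs:
            victim = jobs[0]
            cmdline = victim.info["cmdline"]
            open(PAUSE_FLAG, "w").close()   # tell the helper: stop spawning
            victim.kill()                   # free the memory right now
            while list(build_jobs()):       # wait for the other jobs to finish
                time.sleep(1)
            subprocess.run(cmdline)         # rerun the victim alone, i.e. "-j1"
            os.remove(PAUSE_FLAG)           # let the helper continue
    time.sleep(1)

The real interaction between helper and watcher is a bit more involved,
but that's the general idea.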

This way I can mix emerge --jobs=HIGH_NUMBER and make
-jOTHER_HIGH_NUMBER without it affecting the system, because the total
number of actual build jobs is controlled by the helper and never goes
beyond SOME_LIMIT, even if HIGH_NUMBER*OTHER_HIGH_NUMBER > SOME_LIMIT.

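The helper side is conceptually just a wrapper in front of the compiler
that grabs one of SOME_LIMIT system-wide slots before running the real
command. Again only a rough sketch to show the idea (the lock
directory, the limits and the flag file name are made up):

#!/usr/bin/env python3
# Helper sketch: installed as a wrapper in front of the real compiler,
# it blocks until a global slot is free and the load and the watcher's
# pause flag allow spawning, then runs the real command.
import fcntl, os, subprocess, sys, time

SLOT_DIR = "/tmp/buildslots"          # shared by every emerge/make instance
SOME_LIMIT = 16                       # global cap on concurrent build jobs
MAX_LOAD = 16.0                       # don't spawn when the box is busy
PAUSE_FLAG = "/tmp/buildwatch.pause"  # set by the watcher sketch above

os.makedirs(SLOT_DIR, exist_ok=True)

def acquire_slot():
    # Block until one of the SOME_LIMIT slot files can be flock()ed.
    while True:
        if not os.path.exists(PAUSE_FLAG) and os.getloadavg()[0] < MAX_LOAD:
            for i in range(SOME_LIMIT):
                fd = os.open(os.path.join(SLOT_DIR, str(i)),
                             os.O_CREAT | os.O_RDWR)
                try:
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    return fd         # slot is freed when fd is closed
                except BlockingIOError:
                    os.close(fd)
        time.sleep(0.5)

fd = acquire_slot()
rc = subprocess.call(sys.argv[1:])    # run the real compiler command
os.close(fd)                          # give the slot back
sys.exit(rc)

Because the slots live on the filesystem and not inside any single make
instance, HIGH_NUMBER and OTHER_HIGH_NUMBER only decide how many
wrappers can be waiting for a slot, not how many compilers actually run.
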
I never published this anywhere, but if there's interest I can probably
upload it somewhere; I had the feeling that it's quite hacky and not
worth publishing. I was also never sure whether I'd break emerge in
some way, because it hooks in at a very low level, but it has now been
running for more than a year without a single emerge failure caused by
this hijacking.