On Thu, 16 Feb 2023 09:24:08 -0500
Rich Freeman <rich0@g.o> wrote:

> On Thu, Feb 16, 2023 at 8:39 AM Peter Humphrey <peter@××××××××××××.uk> wrote:
> >
> > I've just looked at 'man make', from which it's clear that -j = --jobs, and
> > that both those and --load-average are passed to /usr/bin/make, presumably
> > untouched unless portage itself has identically named variables. So I wonder
> > how feasible it might be for make to incorporate its own checks to ensure that
> > the load average is not exceeded. I am not a programmer (not for at least 35
> > years, anyway), so I have to leave any such suggestion to the experts.
> >
>
> Well, if we just want to have a fun discussion, here are my thoughts.
> However, the complexity vs usefulness outside of Gentoo is such that I
> don't see it happening.
>
> For the most typical use case - a developer building the same thing
> over and over (which isn't Gentoo) - make could cache info on
> resources consumed, and use that to make more educated decisions about
> how many tasks to launch. That wouldn't help us at all, but it would
> help the typical make user. However, the typical make user can just
> tune things in other ways.
>
> It isn't going to be possible for make to estimate build complexity in
> any practical way. Halting problem aside, maybe you could build in
> some smarts looking at the program being executed and its arguments,
> but it would be a big mess.
>
> Something make could do is tune the damping a bit. It could gradually
> increase the number of jobs it runs and watch the load average, and
> gradually scale it up appropriately, and gradually scale down if CPU
> is the issue, or rapidly scale down if swap is the issue. If swapping
> is detected, it could even suspend most of the tasks it has spawned and
> then gradually continue them as other tasks finish to recover from
> this condition. However, this isn't going to work as well if portage
> is itself spawning parallel instances of make - they'd have to talk to
> each other or portage would somehow need to supervise things.
>
> A way of thinking about it is that when you have portage spawning
> multiple instances of make, that is a bit like adding gain to the
> --load-average MAKEOPTS. Each instance of make independently looks
> at the load average and takes action. So you have an output (compilers
> that create load), then you sample that load with a time-weighted
> average, and then you apply gain to this average, and then use that as
> feedback. That's basically a recipe for out-of-control oscillation.
> You need to add damping and get rid of the gain.
>
> Disclaimer: I'm not an engineer and I suspect a real engineer would be
> able to add a bit more insight.
>
> Really though the issue is that this is the sort of thing that only
> impacts Gentoo and so nobody else is likely to solve this problem for
> us.
>
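The damping idea quoted above (ramp up gently, back off gently when CPU-bound, back off hard when swapping) resembles additive-increase/multiplicative-decrease control. Below is a minimal sketch of one control step, assuming the caller samples the load average and some swap-activity flag. `next_job_limit` and its thresholds are hypothetical; GNU make implements nothing like this.

```python
def next_job_limit(jobs: int, load_avg: float, cpus: int, swapping: bool,
                   min_jobs: int = 1, max_jobs: int = 64) -> int:
    """One step of a hypothetical damped job-limit controller.

    jobs     -- current job limit
    load_avg -- sampled load average (e.g. the 1-minute value)
    cpus     -- number of CPUs the load is compared against
    swapping -- True if swap activity was seen since the last step
    """
    if swapping:
        # Memory pressure: back off hard by halving the limit.
        return max(min_jobs, jobs // 2)
    if load_avg > cpus:
        # CPU oversubscribed: back off gently, one job per step.
        return max(min_jobs, jobs - 1)
    # Headroom left: ramp up gently, one job per step.
    return min(max_jobs, jobs + 1)


if __name__ == "__main__":
    # Made-up samples of (load average, swapping?) on a 16-CPU machine.
    samples = [(4.0, False), (10.0, False), (18.0, False), (20.0, True), (8.0, False)]
    jobs = 8
    for load, swap in samples:
        jobs = next_job_limit(jobs, load, cpus=16, swapping=swap)
        print(f"load={load:5.1f} swap={swap!s:5} -> jobs={jobs}")
```

Starting from 8 jobs, the limit moves 9, 10, 9, 4, 5 for those samples. The asymmetry (subtract 1 vs. halve) is the damping choice: CPU overload is cheap to ride out, swapping is not.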
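The "gain" argument can be illustrated with a toy simulation. Everything here is a simplifying assumption: the load average is modelled as an exponentially weighted moving average of the number of running jobs, and each make instance runs its full -j count while the shared load is below the -l limit and drops to a single job otherwise. Real make only refrains from starting *new* jobs. The function `simulate` is hypothetical.

```python
from __future__ import annotations


def simulate(instances: int, jobs_per_instance: int, load_limit: float,
             alpha: float = 0.1, steps: int = 40) -> tuple[list[int], list[float]]:
    """Toy model: `instances` independent `make -jJ -lL` runs on one machine.

    Returns the total number of running jobs and the load average per step.
    """
    load = 0.0
    totals: list[int] = []
    loads: list[float] = []
    for _ in range(steps):
        # Each instance reads the same shared load average and decides alone:
        # full -j below the limit, a single job at or above it (simplified).
        per_instance = jobs_per_instance if load < load_limit else 1
        total = instances * per_instance
        # Load average modelled as an exponentially weighted moving average
        # of the number of running jobs.
        load += alpha * (total - load)
        totals.append(total)
        loads.append(load)
    return totals, loads


if __name__ == "__main__":
    for n in (1, 4):
        totals, loads = simulate(n, jobs_per_instance=16, load_limit=16.0)
        print(f"{n} instance(s): running jobs {min(totals)}..{max(totals)}, "
              f"peak load {max(loads):.1f} (limit 16)")
```

With one instance the load creeps up to just under the limit and stays there. With four instances reacting to the same signal, the total flips between 64 and 4 jobs and the load repeatedly overshoots. The feedback is shared, so the effective gain grows with the number of instances.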
|
Given your explanation, and my own annoyance a couple of years ago, I
hacked together a little helper that sits between make and the build
jobs it spawns. What annoyed me was that chromium would compile for
hours and then fail because it needed more memory than was available,
and this would fail the whole build.
One possible solution is to reduce the number of build jobs for
chromium, e.g. to -j1, but that is silly, because 99% of the time -j16
works just fine.

So I hacked around a bit and came up with a little helper and watcher.
The helper limits the number of concurrently running build jobs to
SOME_LIMIT, and also holds back new jobs while the load is too high
(e.g. when I am doing other work on the PC that is not under emerge's
control). Once memory usage exceeds 90%, the watcher kills
memory-hungry build jobs, tells the helper to stop spawning new jobs,
waits until the helper reports that no more build jobs are running, and
then respawns the memory-hungry build job (i.e. that job effectively
runs as if -j1 was specified).
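Below is a single-process model of the helper/watcher interplay just described. It assumes jobs are plain names with a known share of RAM. The class and method names are hypothetical, and nothing here spawns, kills or monitors real processes; how the actual helper hooks into make is not described in this post.

```python
from __future__ import annotations

from collections import deque


class JobGate:
    """Hypothetical model of the helper (admission) and watcher (recovery)."""

    def __init__(self, limit: int, mem_threshold: float = 90.0) -> None:
        self.limit = limit                  # SOME_LIMIT
        self.mem_threshold = mem_threshold  # percent of RAM
        self.running: dict[str, float] = {} # job name -> memory use in % of RAM
        self.paused = False                 # helper told to stop spawning
        self.solo_queue: deque[tuple[str, float]] = deque()  # killed jobs

    def try_start(self, job: str, mem: float, load_too_high: bool = False) -> bool:
        """Helper: admit a job only if not paused, load is OK and under the cap."""
        if self.paused or load_too_high or len(self.running) >= self.limit:
            return False
        self.running[job] = mem
        return True

    def finish(self, job: str) -> None:
        self.running.pop(job, None)

    def watcher_step(self) -> None:
        # 1. Memory pressure with more than one job running: "kill" the
        #    hungriest job, remember it, and stop the helper from spawning.
        if len(self.running) > 1 and sum(self.running.values()) > self.mem_threshold:
            hungriest = max(self.running, key=self.running.get)
            self.solo_queue.append((hungriest, self.running.pop(hungriest)))
            self.paused = True
        # 2. Nothing running any more: respawn one killed job on its own (-j1).
        if not self.running and self.solo_queue:
            job, mem = self.solo_queue.popleft()
            self.running[job] = mem
        # 3. All solo reruns done: let the helper spawn normally again.
        elif self.paused and not self.running and not self.solo_queue:
            self.paused = False
```

A job is never killed when it is the only one running. Otherwise a job that alone exceeds the threshold would be killed and requeued forever.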
|
This way I can combine emerge --jobs=HIGH_NUMBER with make
-jOTHER_HIGH_NUMBER without overloading the system, because the total
number of actual build jobs is controlled by the helper and never goes
beyond SOME_LIMIT, even if HIGH_NUMBER*OTHER_HIGH_NUMBER > SOME_LIMIT.
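HIGH_NUMBER*OTHER_HIGH_NUMBER stops mattering because every job, whichever emerge or make instance launched it, has to get a slot from one shared pool. The sketch below uses threads as stand-ins for build jobs and a `threading.BoundedSemaphore` as a stand-in for the helper. The numbers are made up, and the real helper coordinates separate processes, which this does not attempt.

```python
import threading
import time

SOME_LIMIT = 8          # hypothetical global cap enforced by the helper
HIGH_NUMBER = 4         # e.g. emerge --jobs=4
OTHER_HIGH_NUMBER = 16  # e.g. MAKEOPTS="-j16"

slots = threading.BoundedSemaphore(SOME_LIMIT)
lock = threading.Lock()
running = 0
peak = 0


def build_job() -> None:
    """Stand-in for one compiler invocation."""
    global running, peak
    with slots:                  # wait for a free slot from the "helper"
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)         # pretend to compile
        with lock:
            running -= 1


threads = [threading.Thread(target=build_job)
           for _ in range(HIGH_NUMBER * OTHER_HIGH_NUMBER)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"requested {HIGH_NUMBER * OTHER_HIGH_NUMBER} jobs, "
      f"peak concurrency {peak} (cap {SOME_LIMIT})")
```

Raising HIGH_NUMBER or OTHER_HIGH_NUMBER only lengthens the queue in front of the semaphore. The peak never exceeds SOME_LIMIT.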
|
I never published this anywhere, but if there's interest, I can
probably upload it somewhere. I had the feeling that it's quite hacky
and not worth publishing. I was also never sure whether it might break
emerge in some way, because it's very low-level, but it has now been
running for more than a year without any emerge failure due to this
hijacking.