Gentoo Archives: gentoo-dev

From: Kai Krakow <hurikhan77+bgo@×××××.com>
To: gentoo-dev@l.g.o
Cc: Florian Schmaus <flow@g.o>
Subject: Re: [gentoo-dev] [PATCH] check-reqs.eclass: clamp MAKEOPTS for memory/RAM usage
Date: Thu, 06 Jan 2022 01:41:46
Message-Id: CAMthOuM5ipzAuDzOQnx_8u0mMcyyb0saX06pkzakiZSOD-hnsg@mail.gmail.com
In Reply to: Re: [gentoo-dev] [PATCH] check-reqs.eclass: clamp MAKEOPTS for memory/RAM usage by Sam James
1 On Wed, 5 Jan 2022 at 21:21, Sam James <sam@g.o> wrote:
2 >
3 >> On 5 Jan 2022, at 19:18, Kai Krakow <kai@××××××××.de> wrote:
4 >
5 >>> On Wed, 5 Jan 2022 at 19:22, Ulrich Mueller <ulm@g.o> wrote:
6 >
7 > [...]
8 >
9 >>> That applies to all parallel builds though, not only to ebuilds
10 >>> inheriting check-reqs.eclass. By tweaking MAKEOPTS, we're basically
11 >>> telling the user that the --jobs setting in their make.conf is wrong,
12 >>> in the first place.
13 >
14 >
15 >> Well, I'm using a safe combination of jobs and load-average, maybe the
16 >> documentation should be tweaked instead.
17 >
18 >
19 > I think "safe" is doing some heavy lifting here...
20
21 Well, it works "safely" for me at least, but you're right.
22
23 >> I'm using
24 >> [...]
25 >
26 >
27 >> The "--jobs" parameter is mostly a safe-guard against "make" or
28 >> "emerge" overshooting the system resources which would happen if
29 >> running unconstrained without "--load-average". The latter parameter
30 >> OTOH tunes the parallel building processes automatically to the
31 >> available resources. If the system starves of memory, thus starts to
32 >> swap, load will increase, and make will reduce the jobs. It works
33 >> pretty well.
34 >
35 >> I've chosen the emerge loadavg limit slightly higher so a heavy ebuild
36 >> won't starve emerge from running configure phases of parallel ebuilds.
37 >
38 >
39 > ... because it's quite hard for this logic to work correctly enough
40 > of the time without jobserver integration (https://bugs.gentoo.org/692576).
41
42 Oh there's a bug report about this... I already wondered: Wouldn't it
43 be better if it had a global jobserver? OTOH, there are so many build
44 systems out there which parallelize building, and many of them won't
45 use a make jobserver but roll their own solution. So it looks a bit
46 futile on that side. That's why I've chosen the loadavg-based
47 approach.
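
To make the loadavg-based approach concrete, here is a minimal sketch of the admission rule that GNU make documents for "--jobs" combined with "--load-average": at least one job always runs, and further jobs start only below both limits. The function name and the numbers in the test calls are hypothetical and are not the settings elided above.

```python
import os


def may_start_job(running, max_jobs, load_limit, load=None):
    """Decide whether one more build job may be started.

    running    -- number of jobs currently running
    max_jobs   -- hard cap, like the value given to --jobs
    load_limit -- soft cap, like the value given to --load-average
    load       -- 1-minute load average; read from the system if omitted
    """
    if running == 0:
        # Always keep one job going, otherwise the build would stall.
        return True
    if running >= max_jobs:
        # The hard cap guards against overshooting on a lightly loaded box.
        return False
    if load is None:
        load = os.getloadavg()[0]
    # Swapping and IO wait push the load up, which holds back new jobs.
    return load < load_limit
```

Any build system that asks the same question before spawning a job gets the same self-throttling, even without a shared jobserver.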
48
49 > But indeed, I'd say you're not the target audience for this (but I appreciate
50 > the input).
51
52 Maybe not, I'm usually building in tmpfs (except huge source archives
53 with huge build artifacts), that means, I usually have plenty of RAM,
54 at least enough so it doesn't become the limiting factor.
55
56 But then again, what is the target audience? This proposal looks like
57 it tries to predict the future, and that's probably never going to
58 work right. Looking at the Github issue linked initially in the
59 thread, it looks like I /might/ be the target audience for packages
60 like qtwebkit because I'm building in tmpfs. The loadavg limiter does
61 quite well here unless a second huge ebuild gets unpacked and built
62 in the tmpfs, at which point the system struggles to keep up and
63 starves from IO thrashing, only to OOM portage a few moments later.
64 That's of course not due to the build jobs themselves, it's purely a
65 memory limitation. But for that reason I have configuration to build
66 such packages outside of tmpfs: While they usually work fine when
67 building just that package alone, the build fails the very moment two
68 such packages are built in parallel.
69
70 Maybe portage needs a job server that dynamically bumps the job
71 counter up or down based on current memory usage? Or "make" itself
72 could be patched to take that into account? But that's probably the
73 whole idea of the loadavg limiter. So I'd propose to at least mention
74 that in the documentation and examples, as it seems to be little
75 known.
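
As a rough illustration of a job counter that follows current memory usage, the sketch below derives a job count from the MemAvailable field of Linux's /proc/meminfo. Both function names and the per-job figure passed in are assumptions for illustration; nothing like this exists in portage or make as far as this thread shows.

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    8388608 kB"
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def jobs_for_memory(avail_kib, per_job_kib, max_jobs):
    """Number of jobs that fit into the available memory, at least 1."""
    return max(1, min(max_jobs, avail_kib // per_job_kib))


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        avail = mem_available_kib(f.read())
    # Assumed estimate: 2 GiB per job, capped at 16 jobs.
    print(jobs_for_memory(avail, 2 * 1024 * 1024, 16))
```

A scheduler would have to re-evaluate this periodically, because the value is only a snapshot and says nothing about what already running jobs will allocate later.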
76
77 Then again, if we run on a memory-constrained system, it may be better
78 to parallelize ebuilds instead of build jobs, to make better use of
79 combining light and heavy ebuild phases in the same time period.
80
81 Also, I'm not sure if 2 GB per job is the full picture - no matter if
82 that number is correct or isn't... Because usually the link phase of
83 packages like Chrome is the real RAM burner even with sane "jobs"
84 parameters. I've seen people failing to install these packages because
85 they didn't turn on swap, and then during the link phase, the compiler
86 took so much memory that it either froze the system for half an hour,
87 or OOMed. And at that stage, there's usually just this single compiler
88 process running (and maybe some small ones which use almost no memory
89 relative to that). And that doesn't get better with modern compilers
90 doing all sorts of global optimization stuff like LTO.
91
92 So maybe something like this could work (excluding the link phase):
93
94 If potentially just one ebuild is running at a time (i.e. your
95 merge list has just one package), the effect of MAKEOPTS is quite
96 predictable. But if we potentially run more, we could carefully reduce
97 the number of jobs in MAKEOPTS before applying additional RAM
98 heuristics. And those heuristics probably should take the combination
99 of both emerge jobs and make jobs into account because the two
100 potentially multiply (unless bug 692576 is implemented).
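
The multiplication can be sketched as follows, assuming the 2 GB per job estimate from this thread and treating the emerge job count as a worst case where every parallel ebuild is compiling at once. The function name and its parameters are hypothetical, and the link phase is deliberately ignored.

```python
def clamp_make_jobs(make_jobs, emerge_jobs, ram_gib, gib_per_job=2):
    """Clamp the make job count so that emerge_jobs * make_jobs fits in RAM.

    make_jobs   -- jobs requested in MAKEOPTS
    emerge_jobs -- ebuilds that may be built in parallel
    ram_gib     -- RAM to budget for, in GiB
    gib_per_job -- assumed memory use of one compile job
    """
    # Total number of compile jobs the RAM budget can hold.
    budget = ram_gib // gib_per_job
    # Share the budget evenly between the parallel ebuilds.
    per_ebuild = budget // max(1, emerge_jobs)
    # Never go below one job, never exceed what the user asked for.
    return max(1, min(make_jobs, per_ebuild))
```

With 16 GiB of RAM, a request for 16 make jobs would be clamped to 8 for a single ebuild but to 2 when four ebuilds may run in parallel.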
101
102 Compiler and linker flags may also need to be taken into account.
103
104 And maybe portage should take care of optionally serializing huge
105 packages and never build/unpack them at the same time. This would be a
106 huge win for me so I would not have to manually configure things...
107 Something like PORTAGE_SERIALIZE_CONSTRAINED="1" to build at most one
108 package at a time that has some RAM/storage warning vars in the ebuild. But
109 that's probably a different topic as it doesn't exactly target the
110 problem discussed here - and I'm also aware of this problem unlike the
111 target audience.
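
A minimal sketch of the serialization idea, assuming each package carries a boolean "constrained" flag derived from its RAM/storage warning variables. The function, the flag, and the package names in the usage note are purely illustrative; portage has no such scheduler hook that this thread confirms.

```python
def next_startable(pending, running, max_parallel):
    """Pick packages to start now, allowing at most one constrained build.

    pending -- list of (name, constrained) tuples in merge order
    running -- list of (name, constrained) tuples currently building
    Returns the names of the packages that may be started.
    """
    slots = max_parallel - len(running)
    heavy_active = any(constrained for _, constrained in running)
    started = []
    for name, constrained in pending:
        if slots <= 0:
            break
        if constrained:
            if heavy_active:
                # A constrained build is already active: skip this one for
                # now and keep looking for light packages to fill the slot.
                continue
            heavy_active = True
        started.append(name)
        slots -= 1
    return started
```

For example, with pending packages qtwebkit (constrained), chrome (constrained) and zlib (light) and three free slots, this starts qtwebkit and zlib and defers chrome until qtwebkit has finished.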
112
113
114 Regards,
115 Kai