Gentoo Archives: gentoo-dev

From:	Kai Krakow <hurikhan77+bgo@×××××.com>
To:	gentoo-dev@l.g.o
Cc:	Florian Schmaus <flow@g.o>
Subject:	Re: [gentoo-dev] [PATCH] check-reqs.eclass: clamp MAKEOPTS for memory/RAM usage
Date:	Thu, 06 Jan 2022 01:41:46
Message-Id:	`CAMthOuM5ipzAuDzOQnx_8u0mMcyyb0saX06pkzakiZSOD-hnsg@mail.gmail.com`
In Reply to:	Re: [gentoo-dev] [PATCH] check-reqs.eclass: clamp MAKEOPTS for memory/RAM usage by Sam James

1	On Wed, 5 Jan 2022 at 21:21, Sam James <sam@g.o> wrote:
2	>
3	>> On 5 Jan 2022, at 19:18, Kai Krakow <kai@××××××××.de> wrote:
4	>
5	>>> On Wed, 5 Jan 2022 at 19:22, Ulrich Mueller <ulm@g.o> wrote:
6	>
7	> [...]
8	>
9	>>> That applies to all parallel builds though, not only to ebuilds
10	>>> inheriting check-reqs.eclass. By tweaking MAKEOPTS, we're basically
11	>>> telling the user that the --jobs setting in their make.conf is wrong,
12	>>> in the first place.
13	>
14	>
15	>> Well, I'm using a safe combination of jobs and load-average, maybe the
16	>> documentation should be tweaked instead.
17	>
18	>
19	> I think "safe" is doing some heavy lifting here...
20
21	Well, it works "safe" for me at least, but you're right.
22
23	>> I'm using
24	>> [...]
25	>
26	>
27	>> The "--jobs" parameter is mostly a safe-guard against "make" or
28	>> "emerge" overshooting the system resources which would happen if
29	>> running unconstrained without "--load-average". The latter parameter
30	>> OTOH tunes the parallel building processes automatically to the
31	>> available resources. If the system starves of memory, thus starts to
32	>> swap, load will increase, and make will reduce the jobs. It works
33	>> pretty well.
34	>
35	>> I've chosen the emerge loadavg limit slightly higher so a heavy ebuild
36	>> won't starve emerge from running configure phases of parallel ebuilds.
37	>
38	>
39	> ... because it's quite hard for this logic to work correctly enough
40	> of the time without jobserver integration (https://bugs.gentoo.org/692576).
41
42	Oh there's a bug report about this... I already wondered: Wouldn't it
43	be better if it had a global jobserver? OTOH, there are so many build
44	systems out there which parallelize building, and many of them won't
45	use a make jobserver but roll their own solution. So it looks a bit
46	futile on that side. That's why I've chosen the loadavg-based
47	approach.
48
49	> But indeed, I'd say you're not the target audience for this (but I appreciate
50	> the input).
51
52	Maybe not, I'm usually building in tmpfs (except huge source archives
53	with huge build artifacts), that means, I usually have plenty of RAM,
54	at least enough so it doesn't become the limiting factor.
55
56	But then again, what is the target audience? This proposal looks like
57	it tries to predict the future, and that's probably never going to
58	work right. Looking at the GitHub issue linked initially in the
59	thread, it looks like I /might/ be the target audience for packages
60	like qtwebkit because I'm building in tmpfs. The loadavg limiter does
61	quite well here unless a second huge ebuild gets unpacked and built
62	in the tmpfs, at which point the system struggles to keep up and
63	starves from IO thrashing, only to OOM portage a few moments later.
64	That's of course not due to the build jobs themselves, it's purely a
65	memory limitation. But for that reason I have a configuration to build
66	such packages outside of tmpfs: while they usually work fine when
67	building just one such package alone, the build fails the very moment
68	two such packages are built in parallel.
69
70	Maybe portage needs a job server that dynamically bumps the job
71	counter up or down based on current memory usage? Or "make" itself
72	could be patched to take that into account? But that's probably the
73	whole idea of the loadavg limiter. So I'd propose to at least mention
74	that in the documentation and examples; it seems to be little
75	known.
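
A rough sketch of what such a memory-aware job limit could compute, in
Python. The function name, the 2 GiB-per-job default (the figure being
debated in this thread, not a measured value) and the use of MemAvailable
from /proc/meminfo are assumptions for illustration only; neither portage
nor make does this today as far as this thread is concerned.

```python
import re


def jobs_for_memory(meminfo_text, requested_jobs, kib_per_job=2 * 1024 * 1024):
    """Clamp requested_jobs so that jobs * kib_per_job fits in MemAvailable.

    meminfo_text is the content of /proc/meminfo. kib_per_job is an assumed
    per-job estimate (2 GiB by default), not something the build reports.
    """
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        # Memory situation unknown: leave the user's setting alone.
        return requested_jobs
    available_kib = int(match.group(1))
    # Always allow at least one job, otherwise nothing would build at all.
    return max(1, min(requested_jobs, available_kib // kib_per_job))
```

It could be fed with open("/proc/meminfo").read() and the job count from
MAKEOPTS. Unlike the loadavg limiter, this only looks at memory at the
moment it is called, so it cannot react once the jobs are running.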
76
77	Then again, if we run on a memory-constrained system, it may be better
78	to parallelize ebuilds instead of build jobs to make better use of
79	combining light and heavy ebuild phases in the same time period.
80
81	Also, I'm not sure if 2 GB per job is the full picture - no matter if
82	that number is correct or isn't... Because usually the link phase of
83	packages like Chrome is the real RAM burner even with sane "jobs"
84	parameters. I've seen people failing to install these packages because
85	they didn't turn on swap, and then during the link phase, the compiler
86	took so much memory that it either froze the system for half an hour,
87	or OOMed. And at that stage, there's usually just this single compiler
88	process running (and maybe some small ones which use almost no memory
89	relative to that). And that doesn't get better with modern compilers
90	doing all sorts of global optimization stuff like LTO.
91
92	So maybe something like this could work (excluding the link phase):
93
94	If potentially just one ebuild is running at a time (i.e. your
95	merge list has just one package), the effect of MAKEOPTS is quite
96	predictable. But if we potentially run more, we could carefully reduce
97	the number of jobs in MAKEOPTS before applying additional RAM
98	heuristics. And those heuristics probably should take the combination
99	of both emerge jobs and make jobs into account because potentially
100	the two multiply (unless bug 692576 is implemented).
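
To make the multiplication concrete, here is a minimal Python sketch of
such a combined heuristic. The function and parameter names are made up
for illustration, and gib_per_job=2.0 is just the per-job figure under
discussion, not a verified number. The link phase is deliberately not
modelled.

```python
def clamp_make_jobs(make_jobs, emerge_jobs, ram_gib, gib_per_job=2.0, jobserver=False):
    """Return a make job count whose worst case fits into ram_gib.

    Without a shared jobserver, each of the emerge_jobs parallel ebuilds may
    start make_jobs compiler processes, so the worst case is the product.
    """
    # Number of compile jobs that fit into RAM overall (at least one).
    budget = max(1, int(ram_gib // gib_per_job))
    if not jobserver:
        # Split the budget between the ebuilds that may run concurrently.
        budget = max(1, budget // max(1, emerge_jobs))
    return min(make_jobs, budget)
```

With 32 GiB of RAM and -j16, a single ebuild keeps all 16 jobs, while four
parallel ebuilds would be reduced to 4 jobs each. With a global jobserver
the product no longer applies and the full 16 could stay.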
101
102	Compiler and link flags may also need to be taken into account.
103
104	And maybe portage should take care of optionally serializing huge
105	packages and never build/unpack them at the same time. This would be a
106	huge win for me, since I would not have to manually configure things...
107	Something like PORTAGE_SERIALIZE_CONSTRAINED="1" to build at most one
108	package that has some RAM/storage warning vars in the ebuild. But
109	that's probably a different topic as it doesn't exactly target the
110	problem discussed here - and I'm also aware of this problem unlike the
111	target audience.
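
For illustration, the serialization idea could look roughly like this in
Python. PORTAGE_SERIALIZE_CONSTRAINED is only the name proposed above and
does not exist; plan_batches, the "constrained" flag and the example
package list are equally hypothetical, and dependency ordering is ignored
entirely.

```python
def plan_batches(packages, max_parallel):
    """Group (name, constrained) pairs into batches of at most max_parallel
    packages, with at most one constrained package per batch.

    Dependency ordering is ignored; this only illustrates the constraint.
    """
    batches = []
    for name, constrained in packages:
        for batch in batches:
            has_constrained = any(flag for _, flag in batch)
            if len(batch) < max_parallel and not (constrained and has_constrained):
                batch.append((name, constrained))
                break
        else:
            # No existing batch can take this package: start a new one.
            batches.append([(name, constrained)])
    return [[name for name, _ in batch] for batch in batches]
```

Given two heavy and two light packages and max_parallel=2, each heavy
package ends up paired with a light one instead of with the other heavy
one, which also matches the point about combining light and heavy ebuilds.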
112
113
114	Regards,
115	Kai
