Gentoo Archives: gentoo-sparc

From: Keith M Wesolowski <wesolows@××××××××.org>
To: Leif Sawyer <lsawyer@×××.com>
Cc: gentoo-sparc@l.g.o, eradicator@g.o
Subject: Re: [gentoo-sparc] 2.6.7 Kernel stability (was: RE: [gentoo-sparc] ALSA support on sparc)
Date: Fri, 03 Dec 2004 19:32:18
In Reply to: RE: [gentoo-sparc] 2.6.7 Kernel stability (was: RE: [gentoo-sparc] ALSA support on sparc) by Leif Sawyer
On Fri, Dec 03, 2004 at 10:02:19AM -0900, Leif Sawyer wrote:

> make -j all
>
> uptime: 09:41:41 up 10m, 2 users, load avg: 200.90, 65.79, 23.42
> Dec 3 09:48:05 VM: killing process apache2
>
> OOMkiller 'feels' more aggressive in -rc2, but still doesn't have the
> instability that gds-267-r16 has
Your workload is pathological. Running 100+ simultaneous instances of gcc is sure to exhaust resources on all but the largest boxes. That the silly "oom killer" kills the wrong process, and that its behaviour varies from kernel to kernel, should not be surprising, as it has been discussed to death on lkml. Don't use it. If you need to keep system processes running, disable the oom killer and apply resource limits to users.

I question the use of make -j as a stress test. What is the desired behaviour of running this command, other than not crashing the OS? If you haven't specified any resource allocation policy, the OS's only real obligation is not to crash. So what constitutes a "successful" test run?

Should the spawned compiler instances fail to allocate memory and bomb? Should the OS kill them? Should it kill the process group leader (make) instead? Or your shell? Or should some other process's allocations fail? Should the OS kill some other process? If so, which one?

If it's going to kill something, what signal(s) should it send? If it sends SIGTERM, what should happen to other processes that attempt to sbrk(2) or mmap(2) while it's waiting for the SIGTERM'd process to die? Do you put them to sleep? Do you fail their requests? Do you kill them too (yay, deadlock!)? What if the process needs to allocate memory while dying (more deadlock)? If it sends SIGKILL, what about shm segments it may have allocated; wouldn't leaking those just worsen the problem? Maybe it should just kill all of userland except init and start over. And if it's going to do that, why not just crash?

I don't see that Linux has answered any of these questions, and most of them don't need to be asked. Use resource management. If it doesn't work, fix it. The tests you are running are a waste of time.

--
Keith M Wesolowski
"Site launched. Many things not yet working." --Hector Urtubia

--
gentoo-sparc@g.o mailing list
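The advice to "apply resource limits" rather than rely on the oom killer can be illustrated with a per-process limit. The following is a minimal sketch, assuming a Linux host and Python's standard `resource` and `subprocess` modules; it is not anything posted in the thread. The names `CAP_BYTES`, `run_with_memory_cap` and `greedy`, the 512 MiB cap and the 2 GiB request are all hypothetical values chosen for illustration. Because rlimits are inherited across fork(2) and kept across exec(2), a limit set on a shell or on make would likewise apply to every compiler it spawns.

```python
import resource
import subprocess
import sys

# Hypothetical cap for illustration only: 512 MiB of address space per child.
CAP_BYTES = 512 * 1024 * 1024


def _limit_address_space() -> None:
    # Runs in the child between fork() and exec(). RLIMIT_AS bounds the
    # child's total virtual address space, so an oversized sbrk/mmap request
    # fails with ENOMEM inside that process. Setting the hard limit too means
    # the child cannot raise the cap again.
    resource.setrlimit(resource.RLIMIT_AS, (CAP_BYTES, CAP_BYTES))


def run_with_memory_cap(code: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a Python snippet in a child under the cap."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=_limit_address_space,
        capture_output=True,
        text=True,
    )


# Hypothetical "greedy" child: asks for 2 GiB, well over the cap, and reports
# whether the allocation succeeded or was refused.
greedy = (
    "try:\n"
    "    buf = bytearray(2 * 1024 * 1024 * 1024)\n"
    "    print('allocated')\n"
    "except MemoryError:\n"
    "    print('allocation refused')\n"
)

if __name__ == "__main__":
    result = run_with_memory_cap(greedy)
    print(result.stdout.strip())
```

With the cap in place, the failure is reported to the process that made the oversized request (here as a MemoryError it can handle), which is one concrete answer to "should the spawned compiler instances fail to allocate memory and bomb?". It says nothing about what make or any other process should do next; that remains a policy decision.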