On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
> [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> : $(( ++mj_num_jobs ))
> if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> multijob_finish_one
> fi
> return $?
> }
Minor note: the design of this (fork, then check) means that when a job
finishes, we won't have more work ready to submit. Given a fast job
identification step (the main thread) and slower job execution (what's
backgrounded), we'll never exceed #cores of parallelism, but we won't
sustain that level either: once the limit is hit, the main thread only
starts preparing the next job after a slot has freed up, so some idle
cycles are potentially left on the floor.

Realistically, the main thread (what invokes post_fork) is *likely* to
be doing only minor work, assuming a sane consumer: mostly just poking
about figuring out what the next task/arguments are to submit to the
pool. That isn't likely to be a full core's worth of work; if it is,
the consumer is doing something wrong.
The original form of this was designed around the assumption that the
main thread was light and the backgrounded jobs weren't. It thus did
the equivalent of make -j<cores>+1: #cores background jobs run while
the main thread continues on and gets the next job ready. Once the
next job is ready, the main thread blocks waiting for a slot to open,
then submits the job immediately after the reclaim.
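
To make the ordering concrete, here is a rough sketch of that
prepare, reserve, then fork sequence. It is not the eclass code: the
pool_* names are made up for illustration, it leans on `wait -n`
(bash 4.3 or newer) instead of whatever bookkeeping
multijob_finish_one does, and the limit of 2 is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the pool_* names are invented for illustration and
# are not part of the eclass. Needs bash >= 4.3 for `wait -n`.

pool_max_jobs=2    # assumed limit; real code would derive it from the -j setting
pool_num_jobs=0    # background jobs forked but not yet reaped
pool_peak_jobs=0   # highest value of pool_num_jobs seen, for illustration only

# Call in the parent *before* forking, once the next job is fully prepared.
# Blocks only if every slot is taken, and returns the reaped child's status.
pool_reserve_slot() {
	local ret=0
	if [[ ${pool_num_jobs} -ge ${pool_max_jobs} ]] ; then
		wait -n
		ret=$?
		pool_num_jobs=$(( pool_num_jobs - 1 ))
	fi
	pool_num_jobs=$(( pool_num_jobs + 1 ))
	if [[ ${pool_num_jobs} -gt ${pool_peak_jobs} ]] ; then
		pool_peak_jobs=${pool_num_jobs}
	fi
	return ${ret}
}

# Reap everything still running; returns non-zero if any remaining child failed.
pool_finish() {
	local ret=0
	while [[ ${pool_num_jobs} -gt 0 ]] ; do
		wait -n || ret=1
		pool_num_jobs=$(( pool_num_jobs - 1 ))
	done
	return ${ret}
}

pool_failed=0
for n in 1 2 3 4 5 ; do
	delay="0.${n}"                       # cheap "job identification" step
	pool_reserve_slot || pool_failed=1   # block here, with the job already prepared
	sleep "${delay}" &                   # slot is free: fork immediately
done
pool_finish || pool_failed=1
echo "peak=${pool_peak_jobs} failed=${pool_failed}"
```

The point is only the call order in the loop: the next job's arguments
are worked out while all slots are still busy, so the fork happens as
soon as a child is reaped. The background job count never goes above
pool_max_jobs; the main thread is the "+1".
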
On the surface of it, it's a minor difference, but having the next
job immediately ready to fire makes it easier to saturate cores.
Unfortunately, that also changes your API a bit; your call.
~harring