Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: Mike Frysinger <vapier@g.o>
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Sat, 02 Jun 2012 23:59:45
Message-Id: 20120602235902.GC9296@localhost
In Reply to: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash by Mike Frysinger
On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> 	: $(( ++mj_num_jobs ))
> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> 		multijob_finish_one
> 	fi
> 	return $?
> }

Minor note: the design of this (fork, then check) means that when a job finishes, we won't be ready with more work. Given a fast job identification step (the main thread) and a slower job execution (what's backgrounded), we'll never exceed #cores of parallelism, but we won't reliably reach it either, meaning some idle cycles are potentially left on the floor.

Realistically, the main thread (what invokes post_fork) is *likely*, assuming a sane consumer, to be doing minor work: mostly just poking about figuring out what the next task/arguments are to submit to the pool. That work isn't likely to be a full core's worth; if it is, the consumer is doing it wrong.

The original form of this was designed around the assumption that the main thread was light and the backgrounded jobs weren't. It thus did the equivalent of make -j<cores>+1: #cores background jobs run while the main thread continues on and gets the next job ready. Once the next job is ready, the main thread blocks waiting for a slot to open, then submits the job immediately after the reclaim.

On the surface it's a minor difference, but having the next job immediately ready to fire makes it easier to saturate cores.

Unfortunately, that also changes your API a bit; your call.

~harring
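
For illustration, a minimal sketch of the reclaim-before-fork ordering described above. This is not the eclass code: multijob_pre_fork and run_all are hypothetical names, mj_num_jobs/mj_max_jobs are borrowed from the quoted function, and `wait -n` (bash 4.3 or newer) stands in for whatever job tracking the eclass actually uses.

```bash
#!/bin/bash
# Sketch only; assumes bash >= 4.3 for `wait -n`.
# multijob_pre_fork and run_all are hypothetical, not part of the eclass.

mj_max_jobs=2	# slots for background jobs
mj_num_jobs=0	# background jobs currently outstanding

# Call in the parent *before* forking, once the next job is fully prepared.
# If every slot is taken, block until one child exits and return its status.
multijob_pre_fork() {
	local ret=0
	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
		wait -n || ret=$?
		: $(( --mj_num_jobs ))
	fi
	: $(( ++mj_num_jobs ))
	return ${ret}
}

# Run one `sleep` per argument, keeping up to mj_max_jobs in flight.
run_all() {
	local arg duration failed=0
	for arg in "$@" ; do
		# Light main-thread work: figure out the next job while the
		# already-forked children keep running.
		duration=${arg}

		# Only now block for a slot, then fire immediately.
		multijob_pre_fork || failed=1
		( sleep "${duration}" ) &
	done

	# Reap whatever is still running.
	while [[ ${mj_num_jobs} -gt 0 ]] ; do
		wait -n || failed=1
		: $(( --mj_num_jobs ))
	done
	return ${failed}
}

run_all 0.1 0.2 0.1
```

The only difference from the post_fork form is where the blocking happens: the slot check sits between "next job is ready" and the fork, so a freed slot is refilled without first waiting on the main thread.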