Gentoo Archives: gentoo-dev

From: Mike Frysinger <vapier@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Sun, 03 Jun 2012 05:06:43
In Reply to: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash by Brian Harring
On Saturday 02 June 2012 19:59:02 Brian Harring wrote:
> On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> > # @FUNCTION: multijob_post_fork
> > # @DESCRIPTION:
> > # You must call this in the parent process after forking a child process.
> > # If the parallel limit has been hit, it will wait for one to finish and
> > # return the child's exit status.
> > multijob_post_fork() {
> >     [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> >
> >     : $(( ++mj_num_jobs ))
> >     if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> >         multijob_finish_one
> >     fi
> >     return $?
> > }
>
> Minor note; the design of this (fork then check), means when a job
> finishes, we'll not be ready with more work. This implicitly means
> that given a fast job identification step (main thread), and a slower
> job execution (what's backgrounded), we'll not breach #core of
> parallelism, nor will we achieve that level either (meaning
> potentially some idle cycles left on the floor).
>
> Realistically, the main thread (what invokes post_fork) is *likely*,
> (if the consumer isn't fricking retarded) to be doing minor work-
> mostly just poking about figuring out what the next task/arguments
> are to submit to the pool. That work isn't likely to be a full core
> worth of work, else as I said, the consumer is being a retard.
>
> The original form of this was designed around the assumption that the
> main thread was light, and the backgrounded jobs weren't, thus it
> basically did the equivalent of make -j<cores>+1, allowing #cores
> background jobs running, while allowing the main thread to continue on
> and get the next job ready, once it had that ready, it would block
> waiting for a slot to open, then immediately submit the job once it
> had done a reclaim.
the original code i designed this around had a heavier main thread because it
had a series of parallel sections followed by serial followed by parallel where
the serial regions didn't depend on the parallel finishing right away. that and
doing things post meant it was easier to pass up return values because i didn't
have to save $? anywhere ;).

thinking a bit more, i don't think the two methods are mutually exclusive. it's
easy to have the code support both, but i'm not sure the extended documentation
helps.
-mike
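
To make the two orderings discussed above concrete, here is a minimal sketch of how both could sit side by side. It is not the eclass code: every sketch_* name is hypothetical, the limit of 2 is an arbitrary demo value, and reaping uses bash's `wait -n` builtin (bash 4.3 or newer) as a stand-in for whatever job tracking the eclass actually does. The post-fork variant mirrors the quoted multijob_post_fork; the pre-fork variant follows the "block waiting for a slot, then submit" description.

```bash
#!/bin/bash
# Hypothetical sketch only; the sketch_* names are not part of any eclass.
# Assumes bash >= 4.3 for `wait -n`.

sketch_max_jobs=2   # parallel limit (arbitrary demo value)
sketch_num_jobs=0   # children forked but not yet reaped

# Reap one finished child and return its exit status.
sketch_finish_one() {
    local ret=0
    wait -n || ret=$?
    : $(( --sketch_num_jobs ))
    return ${ret}
}

# Post-fork ordering: count the child that was just forked, then block if
# the limit has been reached. The reaped child's status is returned
# directly, so nothing has to save $? along the way.
sketch_post_fork() {
    : $(( ++sketch_num_jobs ))
    if [[ ${sketch_num_jobs} -ge ${sketch_max_jobs} ]] ; then
        sketch_finish_one
    fi
    return $?
}

# Pre-fork ordering: block for a free slot first, then account for the
# child the caller is about to fork. The reaped status must be saved
# because the counter update would otherwise overwrite $?.
sketch_pre_fork() {
    local ret=0
    if [[ ${sketch_num_jobs} -ge ${sketch_max_jobs} ]] ; then
        sketch_finish_one || ret=$?
    fi
    : $(( ++sketch_num_jobs ))
    return ${ret}
}

# Reap everything still outstanding; returns the last non-zero status seen.
sketch_finish_all() {
    local ret=0
    while [[ ${sketch_num_jobs} -gt 0 ]] ; do
        sketch_finish_one || ret=$?
    done
    return ${ret}
}

# Usage with the pre-fork ordering: the parent works out the next job
# while up to sketch_max_jobs children keep running.
for i in 1 2 3 4 ; do
    sketch_pre_fork
    ( sleep "0.${i}" ) &
done
sketch_finish_all
echo "overall status: $?"
```

With the post-fork ordering the parent is already blocked once sketch_max_jobs children are running, so while it prepares the next job at most sketch_max_jobs - 1 are active. The pre-fork ordering lets all sketch_max_jobs children run during that preparation, at the cost of carrying the reaped status in a local variable.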

