Gentoo Archives: gentoo-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Sun, 03 Jun 2012 06:54:12
In Reply to: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash by Mike Frysinger
On 06/02/2012 10:05 PM, Mike Frysinger wrote:
> On Saturday 02 June 2012 19:59:02 Brian Harring wrote:
>> On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
>>> # @FUNCTION: multijob_post_fork
>>> # @DESCRIPTION:
>>> # You must call this in the parent process after forking a child process.
>>> # If the parallel limit has been hit, it will wait for one to finish and
>>> # return the child's exit status.
>>> multijob_post_fork() {
>>> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>>>
>>> 	: $(( ++mj_num_jobs ))
>>>
>>> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
>>> 		multijob_finish_one
>>> 	fi
>>> 	return $?
>>> }
>>
>> Minor note; the design of this (fork then check), means when a job
>> finishes, we'll not be ready with more work. This implicitly means
>> that given a fast job identification step (main thread), and a slower
>> job execution (what's backgrounded), we'll not breach #core of
>> parallelism, nor will we achieve that level either (meaning
>> potentially some idle cycles left on the floor).
>>
>> Realistically, the main thread (what invokes post_fork) is *likely*,
>> (if the consumer isn't fricking retarded) to be doing minor work-
>> mostly just poking about figuring out what the next task/arguments
>> are to submit to the pool. That work isn't likely to be a full core
>> worth of work, else as I said, the consumer is being a retard.
>>
>> The original form of this was designed around the assumption that the
>> main thread was light, and the backgrounded jobs weren't, thus it
>> basically did the equivalent of make -j<cores>+1, allowing #cores
>> background jobs running, while allowing the main thread to continue on
>> and get the next job ready, once it had that ready, it would block
>> waiting for a slot to open, then immediately submit the job once it
>> had done a reclaim.
>
> the original code i designed this around had a heavier main thread because it
> had series of parallel sections followed by serial followed by parallel where
> the serial regions didn't depend on the parallel finishing right away. that
> and doing things post meant it was easier to pass up return values because i
> didn't have to save $? anywhere ;).
>
> thinking a bit more, i don't think the two methods are mutually exclusive.
> it's easy to have the code support both, but i'm not sure the extended
> documentation helps.

Can't you just add a multijob_pre_fork function and do your waiting in
there instead of in the multijob_post_fork function?
-- 
Thanks,
Zac
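
A minimal sketch of what such a multijob_pre_fork might look like, not the
eclass's actual code. Assumptions: multijob_finish_one and multijob_finish
below are simplified stand-ins built on bash's `wait -n` (bash 4.3 or newer),
mj_max_jobs is hard-coded rather than derived from MAKEOPTS, and the
multijob_pre_fork body itself is hypothetical. The point is only the ordering:
wait for a free slot first, then count the new job, so the parent can prepare
the next job while mj_max_jobs children are still running.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; requires bash >= 4.3 for `wait -n`.

mj_max_jobs=2	# assumed fixed limit, for illustration only
mj_num_jobs=0

# Simplified stand-in: reap one child and return its exit status.
multijob_finish_one() {
	wait -n
	local ret=$?
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Hypothetical: call in the parent *before* forking a child.
# If the parallel limit has been hit, wait for one child to finish and
# return its exit status; otherwise return 0.
multijob_pre_fork() {
	local ret=0
	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
		multijob_finish_one
		ret=$?
	fi
	: $(( ++mj_num_jobs ))
	return ${ret}
}

# Simplified stand-in: reap all remaining children; the return value is
# the status of the last child that failed, or 0 if none did.
multijob_finish() {
	local ret=0
	while [[ ${mj_num_jobs} -gt 0 ]] ; do
		multijob_finish_one || ret=$?
	done
	return ${ret}
}

# Usage: reserve a slot, then fork. The parent only blocks once
# mj_max_jobs children are already running.
for job in a b c d ; do
	multijob_pre_fork
	( sleep 0.1 ) &
done
multijob_finish
```

With this ordering the caller no longer gets the child's status "for free"
after the fork, so it has to check the return value of multijob_pre_fork (or
of multijob_finish at the end) if it cares about failures.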