List Archive: gentoo-dev
Headers:
To: gentoo-dev@g.o
From: Brian Harring <ferringb@...>
Subject: Re: multiprocessing.eclass: doing parallel work in bash
Date: Fri, 1 Jun 2012 21:11:19 -0700
On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> regenerating autotools in packages that have a lot of AC_CONFIG_SUBDIRS is
> really slow due to the serialization of all the dirs (which really isn't
> required).  so i took some code that i merged into portage semi-recently
> (which is based on work by Brian, although i'm not sure he wants to admit it)

I've come up with worse things in the name of speed (see the 
daemonized ebuild processor...) ;)

> and put it into a new multiprocessing.eclass.  this way people can generically
> utilize this in their own eclasses/ebuilds.
> 
> it doesn't currently support nesting.  not sure if i should fix that.
> 
> i'll follow up with an example of parallelizing of eautoreconf.  for
> mail-filter/maildrop on my 4 core system, it cuts the time needed to run from
> ~2.5 min to ~1 min.

My main concern here is cleanup during uncontrolled shutdown; if the 
backgrounded job has hung itself for some reason, the job *will* just 
sit; I'm not aware of any of the PMs doing process tree killing, or 
cgroups containment; in my copious free time I'm planning on adding a 
'cjobs' tool for others, and adding cgroups awareness into pkgcore; 
that said, none of 'em do this *now*, thus my concern.
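
For illustration, one hand-rolled way to get process-tree teardown (hypothetical -- not something the eclass or any PM does today) is bash job control: with set -m each background job becomes its own process-group leader, so a single negative-PID kill reaches the whole tree:

```shell
#!/bin/bash
# Hypothetical teardown sketch: enable job control so each background
# job gets its own process group.
set -m

sleep 100 &              # stand-in for a hung background job
pgid=$!                  # with set -m, the child's pid == its pgid

kill -TERM -- -${pgid}   # signal the entire group, not just the leader
wait ${pgid} 2>/dev/null || :   # reap; status reflects the SIGTERM

kill -0 ${pgid} 2>/dev/null || echo "job tree gone"
```

This only covers direct descendants that stay in the group; a daemonizing child escapes it, which is why cgroups containment is the more robust answer.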



> -mike
> 
> # Copyright 1999-2012 Gentoo Foundation
> # Distributed under the terms of the GNU General Public License v2
> # $Header: $
> 
> # @ECLASS: multiprocessing.eclass
> # @MAINTAINER:
> # base-system@g.o
> # @AUTHORS:
> # Brian Harring <ferringb@g.o>
> # Mike Frysinger <vapier@g.o>
> # @BLURB: parallelization with bash (wtf?)
> # @DESCRIPTION:
> # The multiprocessing eclass contains a suite of functions that allow ebuilds
> # to quickly run things in parallel using shell code.
> 
> if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
> ___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"
> 
> # @FUNCTION: makeopts_jobs
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
> # specified therein.  Useful for running non-make tools in parallel too.
> # i.e. if the user has MAKEOPTS=-j9, this will show "9".
> # We can't return the number as bash normalizes it to [0, 255].  If the flags
> # haven't specified a -j flag, then "1" is shown as that is the default `make`
> # uses.  Since there's no way to represent infinity, we return 999 if the user
> # has -j without a number.
> makeopts_jobs() {
> 	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
> 	# This assumes the first .* will be more greedy than the second .*
> 	# since POSIX doesn't specify a non-greedy match (i.e. ".*?").
> 	local jobs=$(echo " $* " | sed -r -n \
> 		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
> 		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
> 	echo ${jobs:-1}
> }

This function belongs in eutils, or somewhere similar - pretty sure 
we've got variants of this in multiple spots.  I'd prefer a single 
point to change if/when we add a way to pass parallelism down into the 
env via EAPI.
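
For reference, the extraction behaves like so when run standalone (function copied verbatim from the quoted eclass):

```shell
# Copied from the eclass above so this snippet runs on its own.
makeopts_jobs() {
	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
	# Relies on the first .* being greedier than the second, since
	# POSIX has no non-greedy match.
	local jobs=$(echo " $* " | sed -r -n \
		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
	echo ${jobs:-1}
}

makeopts_jobs "-j9"        # -> 9
makeopts_jobs "--jobs=4"   # -> 4
makeopts_jobs "-j"         # -> 999 (stands in for "infinity")
MAKEOPTS="" makeopts_jobs  # -> 1   (make's default)
```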


> # @FUNCTION: multijob_init
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Setup the environment for executing things in parallel.
> # You must call this before any other multijob function.
> multijob_init() {
> 	# Setup a pipe for children to write their pids to when they finish.
> 	mj_control_pipe="${T}/multijob.pipe"
> 	mkfifo "${mj_control_pipe}"
> 	exec {mj_control_fd}<>${mj_control_pipe}
> 	rm -f "${mj_control_pipe}"

Nice; hadn't thought to wipe the pipe on the way out.

> 
> 	# See how many children we can fork based on the user's settings.
> 	mj_max_jobs=$(makeopts_jobs "$@")
> 	mj_num_jobs=0
> }
> 
> # @FUNCTION: multijob_child_init
> # @DESCRIPTION:
> # You must call this first in the forked child process.
> multijob_child_init() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
> 	trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
> 	trap 'exit 1' INT TERM
> }

Kind of dislike this form, since it means consuming code has to be 
aware of, and do, the ( ... ) & trick.

A helper function, something like
multijob_child_job() {
  (
  multijob_child_init
  "$@"
  ) &
  multijob_post_fork || die "game over man, game over"
}

Doing so would convert your eautoreconf from:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
  if [[ -d ${x} ]] ; then
    pushd "${x}" >/dev/null
    (
    multijob_child_init
    AT_NOELIBTOOLIZE="yes" eautoreconf
    ) &
    multijob_post_fork || die
    popd >/dev/null
  fi
done

To:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
  if [[ -d ${x} ]]; then
    pushd "${x}" > /dev/null
    AT_NOELIBTOOLIZE="yes" multijob_child_job eautoreconf
    popd >/dev/null
  fi
done


Note, if we used an eval in multijob_child_job, the pushd/popd could 
be folded in.  Debatable.
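
A sketch of that eval variant (hypothetical; the two eclass functions are stubbed out here so the snippet runs standalone, and the leading dir argument is my invention):

```shell
# Trivial stand-ins so this runs outside the eclass; the real
# multijob_child_init/multijob_post_fork are the ones quoted above.
multijob_child_init() { trap 'exit 1' INT TERM; }
multijob_post_fork() { wait $! ; }

# Hypothetical eval variant: cd-ing inside the subshell replaces the
# caller's pushd/popd pair, and eval accepts compound commands.
multijob_child_job() {
	local dir=$1; shift
	(
	multijob_child_init
	cd "${dir}" || exit 1
	eval "$@"
	) &
	multijob_post_fork
}

mkdir -p "${TMPDIR:-/tmp}/mj-demo"
multijob_child_job "${TMPDIR:-/tmp}/mj-demo" 'echo hi > out.txt'
cat "${TMPDIR:-/tmp}/mj-demo/out.txt"   # -> hi
```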



> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
> 	: $(( ++mj_num_jobs ))
> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> 		multijob_finish_one
> 	fi
> 	return $?
> }
> 
> # @FUNCTION: multijob_finish_one
> # @DESCRIPTION:
> # Wait for a single process to exit and return its exit code.
> multijob_finish_one() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
> 	local pid ret
> 	read -r -u ${mj_control_fd} pid ret

Mildly concerned about the failure case here- specifically if the read 
fails (fd was closed, take your pick).
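
Something along these lines would at least fail loudly instead of returning an unset ${ret} (a sketch; die is stubbed, and the control fd is faked with /dev/null to force the EOF case):

```shell
die() { echo "die: $*" >&2; return 1; }   # stand-in for the ebuild helper

# Hypothetical guard around the control-pipe read.
multijob_finish_one() {
	local pid ret
	if ! read -r -u ${mj_control_fd} pid ret ; then
		die "${FUNCNAME}: read from control pipe failed"
		return 1
	fi
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Simulate the failure: an fd that is already at EOF.
exec {mj_control_fd}</dev/null
mj_num_jobs=1
multijob_finish_one 2>/dev/null || echo "read failure caught"
```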


> 	: $(( --mj_num_jobs ))
> 	return ${ret}
> }
> 
> # @FUNCTION: multijob_finish
> # @DESCRIPTION:
> # Wait for all pending processes to exit and return the bitwise or
> # of all their exit codes.
> multijob_finish() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"

Tend to think this should do cleanup, then die if someone invoked the 
api incorrectly; I'd rather see the children reaped before this blows 
up.
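
i.e. roughly this ordering (sketch only; the eclass state is stubbed so it runs standalone):

```shell
die() { echo "die: $*" >&2; return 1; }   # stand-in for the ebuild helper

# Minimal stubs for the eclass state used below.
mj_num_jobs=0
multijob_finish_one() { : $(( --mj_num_jobs )); return 0; }

# Suggested ordering: reap every child first, *then* blow up on a
# bad invocation, so misuse still leaves no orphans behind.
multijob_finish() {
	local ret=0
	while [[ ${mj_num_jobs} -gt 0 ]] ; do
		multijob_finish_one
		: $(( ret |= $? ))
	done
	wait   # let bash clean up its internal child tracking state
	[[ $# -eq 0 ]] || { die "${FUNCNAME} takes no arguments"; return 1; }
	return ${ret}
}

multijob_finish && echo "clean finish"
multijob_finish bogus 2>/dev/null || echo "reaped, then died"
```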

> 	local ret=0
> 	while [[ ${mj_num_jobs} -gt 0 ]] ; do
> 		multijob_finish_one
> 		: $(( ret |= $? ))
> 	done
> 	# Let bash clean up its internal child tracking state.
> 	wait
> 	return ${ret}
> }
> 
> fi


~harring

