On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> regenerating autotools in packages that have a lot of AC_CONFIG_SUBDIRS is
> really slow due to the serialization of all the dirs (which really isn't
> required). so i took some code that i merged into portage semi-recently
> (which is based on work by Brian, although i'm not sure he wants to admit it)

I've come up with worse things in the name of speed (see the
daemonized ebuild processor...) ;)

> and put it into a new multiprocessing.eclass. this way people can generically
> utilize this in their own eclasses/ebuilds.
>
> it doesn't currently support nesting. not sure if i should fix that.
>
> i'll follow up with an example of parallelizing eautoreconf. for
> mail-filter/maildrop on my 4 core system, it cuts the time needed to run from
> ~2.5 min to ~1 min.

My main concern here is cleanup during uncontrolled shutdown; if the
backgrounded job has hung itself for some reason, the job *will* just
sit. I'm not aware of any of the PMs doing process tree killing or
cgroups containment. In my copious free time I'm planning on adding a
'cjobs' tool for others, and adding cgroups awareness into pkgcore;
that said, none of 'em do this *now*, thus my concern.
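For the record, a minimal sketch of what shell-level process-tree
killing could look like, assuming setsid(1) is available; all the
names here (run_in_group, kill_groups, group_pids) are illustrative
and not part of any PM or this eclass:

```shell
# Illustrative sketch: run each background job in its own process group via
# setsid, so a single EXIT trap can kill the whole subtree even if the job
# itself has wedged.
group_pids=()

run_in_group() {
	setsid "$@" &
	group_pids+=( $! )
}

kill_groups() {
	local p
	for p in "${group_pids[@]}" ; do
		# A negative pid targets the entire process group.
		kill -TERM -- -"${p}" 2>/dev/null
	done
}
trap kill_groups EXIT

run_in_group sleep 0.1
wait
echo "spawned ${#group_pids[@]} group(s)"
```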

> -mike
>
> # Copyright 1999-2012 Gentoo Foundation
> # Distributed under the terms of the GNU General Public License v2
> # $Header: $
>
> # @ECLASS: multiprocessing.eclass
> # @MAINTAINER:
> # base-system@g.o
> # @AUTHORS:
> # Brian Harring <ferringb@g.o>
> # Mike Frysinger <vapier@g.o>
> # @BLURB: parallelization with bash (wtf?)
> # @DESCRIPTION:
> # The multiprocessing eclass contains a suite of functions that allow ebuilds
> # to quickly run things in parallel using shell code.
>
> if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
> ___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"
>
> # @FUNCTION: makeopts_jobs
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
> # specified therein. Useful for running non-make tools in parallel too.
> # i.e. if the user has MAKEOPTS=-j9, this will show "9".
> # We can't return the number as bash normalizes it to [0, 255]. If the flags
> # haven't specified a -j flag, then "1" is shown as that is the default `make`
> # uses. Since there's no way to represent infinity, we return 999 if the user
> # has -j without a number.
> makeopts_jobs() {
> 	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
> 	# This assumes the first .* will be more greedy than the second .*
> 	# since POSIX doesn't specify a non-greedy match (i.e. ".*?").
> 	local jobs=$(echo " $* " | sed -r -n \
> 		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
> 		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
> 	echo ${jobs:-1}
> }

This function belongs in eutils, or somewhere similar; pretty sure
we've got variants of this in multiple spots. I'd prefer a single
point to change if/when we add a way to pass parallelism down into the
env via EAPI.
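To illustrate the extraction behavior (the function body below is
copied verbatim from the quoted eclass; the sample invocations and
their outputs are mine):

```shell
# makeopts_jobs as quoted above, exercised against a few MAKEOPTS samples.
makeopts_jobs() {
	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
	local jobs=$(echo " $* " | sed -r -n \
		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
	echo ${jobs:-1}
}

makeopts_jobs -j9        # → 9
makeopts_jobs --jobs=4   # → 4
makeopts_jobs -j         # → 999 (unbounded -j)
makeopts_jobs            # → 1 (make's default, assuming MAKEOPTS is unset)
```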
72 |
|
73 |
|
74 |
> # @FUNCTION: multijob_init |
75 |
> # @USAGE: [${MAKEOPTS}] |
76 |
> # @DESCRIPTION: |
77 |
> # Setup the environment for executing things in parallel. |
78 |
> # You must call this before any other multijob function. |
79 |
> multijob_init() { |
80 |
> # Setup a pipe for children to write their pids to when they finish. |
81 |
> mj_control_pipe="${T}/multijob.pipe" |
82 |
> mkfifo "${mj_control_pipe}" |
83 |
> exec {mj_control_fd}<>${mj_control_pipe} |
84 |
> rm -f "${mj_control_pipe}" |
85 |
|
86 |
Nice; hadn't thought to wipe the pipe on the way out. |
87 |
|
88 |
> |
89 |
> # See how many children we can fork based on the user's settings. |
90 |
> mj_max_jobs=$(makeopts_jobs "$@") |
91 |
> mj_num_jobs=0 |
92 |
> } |
93 |
> |
94 |
> # @FUNCTION: multijob_child_init |
95 |
> # @DESCRIPTION: |
96 |
> # You must call this first in the forked child process. |
97 |
> multijob_child_init() { |
98 |
> [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments" |
99 |
> |
100 |
> trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT |
101 |
> trap 'exit 1' INT TERM |
102 |
> } |
103 |
|
104 |
Kind of dislike this form since it means consuming code has to be |
105 |
aware of, and do the () & trick. |
106 |
|
107 |
A helper function, something like |
108 |
multijob_child_job() { |
109 |
( |
110 |
multijob_child_init |
111 |
"$@" |
112 |
) & |
113 |
multijob_post_fork || die "game over man, game over" |
114 |
} |
115 |
|
116 |
Doing so, would conver your eautoreconf from: |
117 |
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do |
118 |
if [[ -d ${x} ]] ; then |
119 |
pushd "${x}" >/dev/null |
120 |
( |
121 |
multijob_child_init |
122 |
AT_NOELIBTOOLIZE="yes" eautoreconf |
123 |
) & |
124 |
multijob_post_fork || die |
125 |
popd >/dev/null |
126 |
fi |
127 |
done |
128 |
|
129 |
To: |
130 |
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do |
131 |
if [[ -d ${x} ]]; then |
132 |
pushd "${x}" > /dev/null |
133 |
AT_NOELIBTOOLIZE="yes" multijob_child_job eautoreconf |
134 |
popd |
135 |
fi |
136 |
done |
137 |
|
138 |
|
139 |
Note, if we used an eval in multijob_child_job, the pushd/popd could |
140 |
be folded in. Debatable. |
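A rough sketch of that eval variant with the directory change folded
in; multijob_child_job_in is a hypothetical name, and multijob_post_fork
is stubbed out with a plain wait so the snippet stands alone:

```shell
# Hypothetical folded-in variant: take a directory plus a command line, run
# the command there in a backgrounded subshell. eval lets callers pass the
# command as separate words.
multijob_child_job_in() {
	local dir=$1 ; shift
	(
		cd "${dir}" || exit 1
		eval "$@"
	) &
	wait $!	# stands in for multijob_post_fork in this sketch
}

result=$(multijob_child_job_in /tmp pwd)
```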
141 |
|
142 |
|
143 |
|
144 |
> # @FUNCTION: multijob_post_fork |
145 |
> # @DESCRIPTION: |
146 |
> # You must call this in the parent process after forking a child process. |
147 |
> # If the parallel limit has been hit, it will wait for one to finish and |
148 |
> # return the child's exit status. |
149 |
> multijob_post_fork() { |
150 |
> [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments" |
151 |
> |
152 |
> : $(( ++mj_num_jobs )) |
153 |
> if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then |
154 |
> multijob_finish_one |
155 |
> fi |
156 |
> return $? |
157 |
> } |
158 |
> |
159 |
> # @FUNCTION: multijob_finish_one |
160 |
> # @DESCRIPTION: |
161 |
> # Wait for a single process to exit and return its exit code. |
162 |
> multijob_finish_one() { |
163 |
> [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments" |
164 |
> |
165 |
> local pid ret |
166 |
> read -r -u ${mj_control_fd} pid ret |
167 |
|
168 |
Mildly concerned about the failure case here- specifically if the read |
169 |
fails (fd was closed, take your pick). |
170 |
|
171 |
|
172 |
> : $(( --mj_num_jobs )) |
173 |
> return ${ret} |
174 |
> } |
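A defensive variant addressing that read concern: fail loudly, with a
distinguishable status, when the read itself fails rather than
returning whatever ${ret} happens to hold. The 127 status and the
standalone pipe harness below are illustrative, not eclass behavior:

```shell
# Sketch: treat a failed read (closed fd, EOF, ...) as its own error.
multijob_finish_one() {
	local pid ret
	if ! read -r -u ${mj_control_fd} pid ret ; then
		return 127	# control pipe is gone; don't pretend a child exited
	fi
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Minimal harness mirroring multijob_init / multijob_child_init:
mj_pipe=$(mktemp -u)
mkfifo "${mj_pipe}"
exec {mj_control_fd}<>"${mj_pipe}"
rm -f "${mj_pipe}"
mj_num_jobs=1

( echo "${BASHPID} 0" >&${mj_control_fd} )	# a "child" reporting success
multijob_finish_one
echo "status=$? jobs=${mj_num_jobs}"
```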
>
> # @FUNCTION: multijob_finish
> # @DESCRIPTION:
> # Wait for all pending processes to exit and return the bitwise or
> # of all their exit codes.
> multijob_finish() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"

Tend to think this should do cleanup, then die if someone invoked the
API incorrectly; I'd rather see the children reaped before this blows
up.

> 	local ret=0
> 	while [[ ${mj_num_jobs} -gt 0 ]] ; do
> 		multijob_finish_one
> 		: $(( ret |= $? ))
> 	done
> 	# Let bash clean up its internal child tracking state.
> 	wait
> 	return ${ret}
> }
>
> fi
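Concretely, the reap-before-die ordering I have in mind would look
something like this (die is simulated with an echo plus return so the
snippet runs standalone; the 100 status is arbitrary):

```shell
# Sketch: drain all children first, then complain about API misuse.
multijob_finish() {
	local ret=0
	while [[ ${mj_num_jobs} -gt 0 ]] ; do
		multijob_finish_one
		: $(( ret |= $? ))
	done
	# Let bash clean up its internal child tracking state.
	wait
	# Only now blow up if invoked incorrectly; no orphans left behind.
	if [[ $# -ne 0 ]] ; then
		echo "die: ${FUNCNAME} takes no arguments" >&2
		return 100
	fi
	return ${ret}
}
```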


~harring