Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: Michał Górny <mgorny@g.o>
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Sat, 02 Jun 2012 23:48:05
Message-Id: 20120602234726.GB9296@localhost
In Reply to: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash by Zac Medico
On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
> On 06/02/2012 02:31 PM, Michał Górny wrote:
> > On Sat, 2 Jun 2012 15:54:03 -0400
> > Mike Frysinger <vapier@g.o> wrote:
> >
> >> # @FUNCTION: redirect_alloc_fd
> >> # @USAGE: <var> <file> [redirection]
> >> # @DESCRIPTION:
> >
> > (...and a lot of code)
> >
> > I may be wrong but wouldn't it be simpler to just stick with a named
> > pipe here? Well, at first glance you wouldn't be able to read exactly
> > one result at a time but is it actually useful?
>
> I'm pretty sure that the pipe has to remain constantly open in read mode
> (which can only be done by assigning it a file descriptor). Otherwise,
> there's a race condition that can occur, where a write is lost because
> it's written just before the reader closes the pipe.
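
A minimal sketch of what Zac describes, with illustrative names: bind a
dedicated fd to the fifo with a read-write open, so the read end stays
alive between reclaims and a writer can never race a closing reader.

```shell
#!/usr/bin/env bash
# Sketch: keep a fifo permanently open via a dedicated fd.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"

# Open fd 9 read-write on the fifo. A read-write open never blocks, and
# it keeps the pipe's read end alive even while no reclaim is in
# progress, so writers cannot see a closed pipe.
exec 9<>"${dir}/pipe"

echo "result-1" >&9   # a backgrounded job would report like this
read -r line <&9      # the reclaim loop pulls one result at a time
echo "${line}"        # prints "result-1"

exec 9>&-             # release the fd when done
rm -rf "${dir}"
```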

There isn't a race: on the write side, a writer blocks once it exceeds
the pipe buffer size; on the read side, bash's read builtin explicitly
reads byte by byte to avoid consuming data it doesn't need.
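
The byte-by-byte behavior is what makes a shared fd safe to reclaim
from: each read consumes exactly one line, leaving the rest in the pipe.
A small illustration (names are mine, not the eclass code):

```shell
#!/usr/bin/env bash
# Illustrative: bash's builtin read pulls bytes one at a time from a
# pipe, so two queued results are reclaimed as two distinct reads.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"
exec 8<>"${dir}/pipe"

printf 'a\nb\n' >&8    # two results arrive back-to-back
read -r first <&8      # consumes only up to the first newline
read -r second <&8     # the second result is still in the pipe
echo "${first} ${second}"

exec 8>&-
rm -rf "${dir}"
```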

That said, Mgorny's suggestion ignores that the code already is pointed
at a fifo. Presumably he's suggesting "just open it every time you need
to fuck with it"... which, sure, except that complicates the read side
(having to find a free fd, open it, then close it) or forces abusing
cat or $(<) to pull the results, making the reclaim code handle
multiple results in a single shot.
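
A sketch of that alternative, under my own illustrative names: with no
persistent fd, each reclaim re-opens the fifo via $(<...), which blocks
until the writers close it and can return several results in one shot.

```shell
#!/usr/bin/env bash
# Sketch of the "open it every time" alternative: no persistent fd.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"

( echo job-1; echo job-2 ) > "${dir}/pipe" &   # two jobs report together

results=$(<"${dir}/pipe")   # one shot may collect multiple results
count=0
while read -r _; do          # so the reclaim code must split them up
    count=$(( count + 1 ))
done <<< "${results}"
echo "${count}"              # both results came back in a single read

wait
rm -rf "${dir}"
```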

Frankly, I don't see the point in doing that. The code isn't that
complex, and we *need* the overhead of this to be minimal: the hand
off/reclaim is effectively the bottleneck for scaling.
35
36 If the jobs you've backgrounded are a second a piece, it matters less;
37 if they're quick little bursts of activity, the scaling *will* be
38 limited by how fast we can blast off/reclaim jobs. Keep in mind that
39 the main process has to go find more work to queue up between the
40 reclaims, thus this matters more than you'd think.

Either way, that limit varies depending on the time required for each
job vs. the number of cores; that said, run code like this on a 48-core
machine and you'll see it start becoming an actual bottleneck (which is
why I came up with this hacky bash semaphore).
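
For readers unfamiliar with the trick: a fifo-backed semaphore can be
sketched roughly as below. This is a hypothetical reconstruction under
my own names, not the eclass code: preload N tokens, have each job take
one before starting and return it on exit, capping concurrency at N.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fifo-backed bash semaphore.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/sem"
exec 7<>"${dir}/sem"        # keep the fifo open on a dedicated fd

jobs_max=2
for (( i = 0; i < jobs_max; i++ )); do
    echo token >&7          # one token per allowed concurrent job
done

run_job() {
    read -r _ <&7                   # block until a slot frees up
    ( "$@"; echo token >&7 ) &      # return the token on completion
}

for i in 1 2 3 4; do
    run_job sleep 0.1       # four jobs, at most two running at once
done
wait

read -r t1 <&7              # all tokens are back after the jobs finish
read -r t2 <&7
echo "${t1} ${t2}"

exec 7>&-
rm -rf "${dir}"
```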

~harring
