Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: Michał Górny <mgorny@g.o>
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Sat, 02 Jun 2012 23:48:05
Message-Id: 20120602234726.GB9296@localhost
In Reply to: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash by Zac Medico
On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
> On 06/02/2012 02:31 PM, Michał Górny wrote:
> > On Sat, 2 Jun 2012 15:54:03 -0400
> > Mike Frysinger <vapier@g.o> wrote:
> >
> >> # @FUNCTION: redirect_alloc_fd
> >> # @USAGE: <var> <file> [redirection]
> >> # @DESCRIPTION:
> >
> > (...and a lot of code)
> >
> > I may be wrong but wouldn't it be simpler to just stick with a named
> > pipe here? Well, at first glance you wouldn't be able to read exactly
> > one result at a time but is it actually useful?
>
> I'm pretty sure that the pipe has to remain constantly open in read mode
> (which can only be done by assigning it a file descriptor). Otherwise,
> there's a race condition that can occur, where a write is lost because
> it's written just before the reader closes the pipe.
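
A minimal sketch of what Zac describes, with illustrative names: bind a
dedicated fd to the fifo with a read-write open, so the read end stays
alive between reclaims and a writer can never race a closing reader.

```shell
#!/usr/bin/env bash
# Sketch: keep a fifo permanently open via a dedicated fd.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"

# Open fd 9 read-write on the fifo. A read-write open never blocks, and
# it keeps the pipe's read end alive even while no reclaim is in
# progress, so writers cannot see a closed pipe.
exec 9<>"${dir}/pipe"

echo "result-1" >&9   # a backgrounded job would report like this
read -r line <&9      # the reclaim loop pulls one result at a time
echo "${line}"        # prints "result-1"

exec 9>&-             # release the fd when done
rm -rf "${dir}"
```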

There isn't a race: on the write side, a writer blocks once it exceeds
the pipe buffer size; on the read side, bash's read builtin explicitly
reads byte by byte to avoid consuming data it doesn't need.
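
The byte-by-byte behavior is what makes a shared fd safe to reclaim
from: each read consumes exactly one line, leaving the rest in the pipe.
A small illustration (names are mine, not the eclass code):

```shell
#!/usr/bin/env bash
# Illustrative: bash's builtin read pulls bytes one at a time from a
# pipe, so two queued results are reclaimed as two distinct reads.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"
exec 8<>"${dir}/pipe"

printf 'a\nb\n' >&8    # two results arrive back-to-back
read -r first <&8      # consumes only up to the first newline
read -r second <&8     # the second result is still in the pipe
echo "${first} ${second}"

exec 8>&-
rm -rf "${dir}"
```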

That said, Mgorny's suggestion ignores that the code already is pointed
at a fifo. Presumably he's suggesting "just open it every time you need
to fuck with it"... which, sure, except that complicates the read side
(having to find a free fd, open it, then close it) or forces abusing
cat or $(<) to pull the results, making the reclaim code handle
multiple results in a single shot.
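
A sketch of that alternative, under my own illustrative names: with no
persistent fd, each reclaim re-opens the fifo via $(<...), which blocks
until the writers close it and can return several results in one shot.

```shell
#!/usr/bin/env bash
# Sketch of the "open it every time" alternative: no persistent fd.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/pipe"

( echo job-1; echo job-2 ) > "${dir}/pipe" &   # two jobs report together

results=$(<"${dir}/pipe")   # one shot may collect multiple results
count=0
while read -r _; do          # so the reclaim code must split them up
    count=$(( count + 1 ))
done <<< "${results}"
echo "${count}"              # both results came back in a single read

wait
rm -rf "${dir}"
```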

Frankly, I don't see the point in doing that. The code isn't that
complex, and we *need* the overhead of this to be minimal: the hand
off/reclaim is effectively the bottleneck for scaling.
35
36 If the jobs you've backgrounded are a second a piece, it matters less;
37 if they're quick little bursts of activity, the scaling *will* be
38 limited by how fast we can blast off/reclaim jobs. Keep in mind that
39 the main process has to go find more work to queue up between the
40 reclaims, thus this matters more than you'd think.

Either way, that limit varies depending on the time required for each
job vs. the number of cores; that said, run code like this on a 48-core
machine and you'll see it start becoming an actual bottleneck (which is
why I came up with this hacky bash semaphore).
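
For readers unfamiliar with the trick: a fifo-backed semaphore can be
sketched roughly as below. This is a hypothetical reconstruction under
my own names, not the eclass code: preload N tokens, have each job take
one before starting and return it on exit, capping concurrency at N.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fifo-backed bash semaphore.
set -e
dir=$(mktemp -d)
mkfifo "${dir}/sem"
exec 7<>"${dir}/sem"        # keep the fifo open on a dedicated fd

jobs_max=2
for (( i = 0; i < jobs_max; i++ )); do
    echo token >&7          # one token per allowed concurrent job
done

run_job() {
    read -r _ <&7                   # block until a slot frees up
    ( "$@"; echo token >&7 ) &      # return the token on completion
}

for i in 1 2 3 4; do
    run_job sleep 0.1       # four jobs, at most two running at once
done
wait

read -r t1 <&7              # all tokens are back after the jobs finish
read -r t2 <&7
echo "${t1} ${t2}"

exec 7>&-
rm -rf "${dir}"
```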

~harring
