
From: Greg Turner <gmt@×××××.us>
To: gentoo-alt@l.g.o
Subject: Re: [gentoo-alt] multilib / multijob / parallel build problem
Date: Wed, 09 Oct 2013 19:06:03
Message-Id: CA+VB3NR7fnkR7PZqcmkRqqDXnmfnqdRgpzBNGf5n+UsFyY1d8w@mail.gmail.com
In Reply to: Re: [gentoo-alt] multilib / multijob / parallel build problem by Alan Hourihane
Well, like I said, I'm quite plausibly confused, as it's been a while
since I paid any mind to this.

But I do recall that there was some tricky race that could crop up if
the readers and the writers didn't synchronize "just so". IIRC
zmedico posted a test-case for it to one of these lists, probably -dev
or this one.

I think the issue was something along the lines of: under certain
circumstances, if the writer started stuffing data in before the
reader started listening, the first message(s?) down the pipe would be
lost.

Even more vaguely, I seem to recall that somehow, the act of bash
creating, specifically, a bidirectional pipe prevented the problem --
I think some magic happened with fork() and the cloned bidi pipe
handle, where the pipe was considered to "have a reader" right away
and therefore (if the pipe buffer filled up?) writers would block
until the reader got around to reading... something like that.
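
Just to illustrate what I'm (mis?)remembering -- this is a from-memory
sketch, not the actual eclass code, and it assumes bash >= 4.1 for the
{fd} redirection syntax.  Because the shell opens the FIFO read/write,
it counts as its own reader from the start (on Linux, at least), so an
early write just sits in the pipe buffer instead of going astray, and
writers only block if that buffer ever fills up:

# sketch only: open a FIFO read/write so the shell holds both ends
tmpdir=$(mktemp -d) || exit 1
mkfifo "${tmpdir}/mj.pipe"
exec {mj_control_fd}<>"${tmpdir}/mj.pipe"   # bash >= 4.1 fd allocation
rm -rf "${tmpdir}"                          # the open fd keeps the pipe alive

# a stand-in for the kind of "pid status" message a job would send
echo "1234 0" >&${mj_control_fd}            # write before anyone reads
sleep 1                                     # nobody is listening yet
read -r pid ret <&${mj_control_fd}          # ...and nothing was lost
echo "pid=${pid} ret=${ret}"
exec {mj_control_fd}>&-
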
Anyhow, I could definitely be completely wrong, and even if I'm not,
it's quite possible that by cloning the reader and writer pipe handles
you'll achieve the same result.

A way to test for this would be to create a simple multijob
test-ebuild, where the first job returns a failure code but all the
others (preferably a crapload of super-quick ones, I guess) succeed,
and to modify multiprocessing.eclass by inserting a 30-second sleep in
multijob_finish_one just before the "read" statement. If
multijob_finish in your test ebuild correctly returned a nonzero
result, then you could be sure that the first message was not ignored
despite the delayed read.
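
Concretely, I'm picturing something like the following (going from
memory of the multiprocessing.eclass API, so double-check the
multijob_* usage against the real eclass; the 30-second sleep itself
would be patched into multijob_finish_one, right before its "read",
and everything else here is just placeholder test-ebuild boilerplate):

# throwaway test ebuild, names/values are placeholders
EAPI=5
inherit multiprocessing

DESCRIPTION="throwaway multijob pipe-race test"
SRC_URI=""
LICENSE="public-domain"
SLOT="0"
KEYWORDS="~amd64"

S=${WORKDIR}

src_compile() {
    multijob_init

    # job #1 fails right away, so its status message hits the pipe
    # long before the (artificially delayed) read ever happens
    (
        multijob_child_init
        false
    ) &
    multijob_post_fork

    # plus a pile of quick jobs that succeed
    local i
    for i in {1..50} ; do
        (
            multijob_child_init
            true
        ) &
        multijob_post_fork
    done

    # if the early failure message had been dropped, multijob_finish
    # would wrongly return 0 here
    if multijob_finish ; then
        die "first job's failure was lost in the pipe"
    else
        einfo "good: the early failure survived the delayed read"
    fi
}
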
(This is assuming it's the first write that gets dropped... I could be
wrong about that, i.e., it could be the second, or the first one after
some buffer fills up, or I could just be spewing complete nonsense :) )

On Mon, Oct 7, 2013 at 12:36 AM, Alan Hourihane <alanh@×××××××××××.uk> wrote:
> On 10/07/13 01:11, Greg Turner wrote:
>>
>> Alan wrote:
>>>
>>> - redirect_alloc_fd mj_control_fd "${pipe}"
>>> + redirect_alloc_fd mj_write_fd "${pipe}"
>>> + redirect_alloc_fd mj_read_fd "${pipe}"
>>
>> The one thing I'd be looking to verify is that there's no race where
>> the reader must block before the writer starts producing data.
>>
>> You could test that hypothesis by sleeping for a while right before
>> the reader starts listening.
>>
>> If that were the case -- I kinda suspect it is -- then unfortunately,
>> I think the writer has to spin, waiting for the reader to block... not
>> a pretty situation, so let's hope I'm confused.
>>
>
> I don't understand. There is no race; the above two commands just get two
> different file descriptors for the read end and the write end. The FIFO is
> opened in non-blocking IO mode anyway, as the FIFO is still opened in
> read/write mode to get this behaviour. So the actual read from one fd and
> the write from another fd behave no differently to reading and writing from
> the same fd.
>
> Alan.
>