Gentoo Archives: gentoo-portage-dev

From: Perry Smith <pedzsan@×××××.com>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] EbuildProcess logs poll-error to already removed $T (on AIX)
Date: Tue, 29 Mar 2011 03:08:24
Message-Id: C86891E6-5B7B-4642-B830-4D3ED8DC20F8@gmail.com
In Reply to: Re: [gentoo-portage-dev] EbuildProcess logs poll-error to already removed $T (on AIX) by Zac Medico
On Mar 28, 2011, at 9:49 PM, Zac Medico wrote:

> On 03/28/2011 07:01 PM, Perry Smith wrote:
>> I did not 100% follow this. In particular, I didn't see how we started talking about pty's. But, since you are, I'll wade in.
>>
>> When the master side (the side that a daemon opens like telnetd) closes, the slave side gets the same treatment as if a modem hung up on a real tty. This is a SIGHUP *and* any further writes will return EIO (5) and further reads return 0. (All this is assuming CLOCAL is off.)
>>
>> I would not be surprised if the child process is receiving a SIGHUP if all the process session and controlling tty requirements have been met and the file descriptor is also selectable for POLLHUP and POLLERR. I would peek inside the Python code because perhaps it is testing for POLLERR before it is testing for POLLHUP. Or, perhaps it is not expecting the POLLERR at all (that is the 16384 value)
>
> In our case, a subprocess is connected to the slave end of the pty, and
> portage reads its output from the master end. With Linux (among other
> kernels), after the subprocess closes the slave end, we typically
> receive a POLLHUP event or else EIO from a read call. Apparently, with
> Linux (among other kernels) we never receive a POLLERR event here, but
> with AIX we do.

I want to make sure you typed what you meant to type because it wasn't what I was expecting.

Does the slave close first or the master? Or, I suppose, more importantly, which side (the master or the slave) is getting the POLLERR?
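
One way to answer that question empirically might be a small probe like the one below. It is only a sketch, assuming a POSIX system where Python's `select.poll` and `os.openpty` are available; the helper names `event_names` and `master_events_after_slave_close` are made up for illustration and are not part of portage. It closes the slave end first and reports which poll events the master end then sees.

```python
import os
import select

# Flag names from the select module; numeric values differ per platform.
_FLAGS = ("POLLIN", "POLLPRI", "POLLOUT", "POLLERR", "POLLHUP", "POLLNVAL")


def event_names(mask):
    """Decode a poll event mask into a list of flag names (hypothetical helper)."""
    return [name for name in _FLAGS if mask & getattr(select, name)]


def master_events_after_slave_close(timeout_ms=1000):
    """Open a pty, close the slave end, and report events seen on the master."""
    master, slave = os.openpty()
    try:
        os.close(slave)
        poller = select.poll()
        poller.register(master, select.POLLIN)
        # POLLHUP, POLLERR and POLLNVAL are reported even if not requested.
        return [event_names(mask) for _fd, mask in poller.poll(timeout_ms)]
    finally:
        os.close(master)


if __name__ == "__main__":
    print(master_events_after_slave_close())
```

Running this on both Linux and AIX would show directly whether the master end reports POLLHUP, POLLERR, or both once the slave is gone. The result is platform-dependent, so no expected output is given here.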

>
>> This should *not* be AIX specific but is actually POSIX standard.
>
> When we receive POLLERR, we could try calling waitpid with WNOHANG on
> the subprocess. If the process exits successfully at this point, then
> it's probably safe to handle this case much like a POLLHUP event. For
> this to work reliably, it seems like we will need to retry the waitpid
> call in a loop with some sleep calls, until the subprocess status becomes
> available. If the status doesn't become available after a few seconds,
> then we should probably try to kill the subprocess.
> --
> Thanks,
> Zac
>
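
The waitpid retry described above could be sketched roughly as follows. This is not portage's actual code: the names `wait_or_kill` and `exited_cleanly`, the 3 second timeout, the 0.1 second sleep, and the choice of SIGKILL are all assumptions made for illustration. The pid must belong to a child of the calling process, otherwise `os.waitpid` raises `ChildProcessError`.

```python
import os
import signal
import time


def wait_or_kill(pid, timeout=3.0, interval=0.1):
    """Poll waitpid(WNOHANG) until the child exits or the timeout expires.

    Returns the raw wait status. If the child has not exited by the
    deadline, it is killed with SIGKILL (an assumption; a real caller
    might prefer SIGTERM first) and then reaped.
    """
    deadline = time.monotonic() + timeout
    while True:
        wpid, status = os.waitpid(pid, os.WNOHANG)
        if wpid == pid:
            return status  # child has exited and was reaped
        if time.monotonic() >= deadline:
            break
        time.sleep(interval)
    os.kill(pid, signal.SIGKILL)
    _wpid, status = os.waitpid(pid, 0)  # blocking reap after the kill
    return status


def exited_cleanly(status):
    """True if the child exited normally with exit code 0."""
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

A caller that sees POLLERR could then treat the event like POLLHUP when `exited_cleanly(wait_or_kill(pid))` is true, and as a real error otherwise.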
