Gentoo Archives: gentoo-dev

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: Race condition in Netfilter triggered by glibc 2.9
Date: Thu, 29 Jan 2009 04:05:00
Message-Id: pan.2009.01.29.04.04.31@cox.net
In Reply to: Re: [gentoo-dev] Race condition in Netfilter triggered by glibc 2.9 by Mike Frysinger
Mike Frysinger <vapier@g.o> posted
200901282125.52845.vapier@g.o, excerpted below, on Wed, 28 Jan
2009 21:25:50 -0500:

>> On the wire between the client and the firewall, this happens:
>>
>> a packet 1 is sent
>> b packet 2 is sent
>> c answer 1 is received
>> d answer 2 is received
>>
>> Sometimes d doesn't happen because b is lost in the firewall along the
>> way (where the race condition happens).
>
> does this affect actual userspace behavior ? in other words, does this
> lead to lost lookups and errors from the resolver ?

Some of this is beyond my comprehension level, but I've seen interesting
lookup behavior that is, at minimum, rather nicely coincidental.

Specifically, from my machine (running a local caching bind, with
netfilter on both the machine itself and on my OpenWRT-based router),
host lookups on second-level domains (cox.com in my case) with MX
entries work fine, while lookups on third-level domains unlikely to have
MX entries (www.cox.com) return the A record right away, then time out on
the MX entry. AFAIK this is fairly new behavior, apparently quite
coincident with my installation of glibc-2.9 (_p20081201_r1, currently),
as IIRC the MX lookup formerly returned fine, without waiting for the
timeout.

dig -tMX shows the same behavior, while a simple dig (A record only) does
not.
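
For anyone who wants to time the two query types without host or dig, here
is a minimal Python sketch. It hand-builds a single-question DNS query per
RFC 1035 and measures how long the answer takes over UDP. The resolver
address 127.0.0.1 (a local caching bind, as described above), the 5-second
timeout, and the function names build_query and timed_lookup are all
assumptions made for illustration. It sends one query per socket, so it
does not reproduce the back-to-back packet pattern quoted above; it only
shows whether an A or MX query on its own is answered or times out.

```python
import secrets
import socket
import struct
import time

QTYPE_A = 1    # RFC 1035 type code for A records
QTYPE_MX = 15  # RFC 1035 type code for MX records


def build_query(name, qtype, query_id=None):
    """Build a minimal DNS query packet (one question, recursion desired)."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def timed_lookup(name, qtype, server="127.0.0.1", timeout=5.0):
    """Send one query over UDP; return elapsed seconds, or None on timeout.

    The reply is not parsed or matched against the query ID; this sketch
    only measures whether something comes back, and how quickly.
    """
    packet = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            sock.recvfrom(4096)
        except socket.timeout:
            return None
        return time.monotonic() - start
```

Calling timed_lookup("www.cox.com", QTYPE_A) and then
timed_lookup("www.cox.com", QTYPE_MX) gives a rough side-by-side of the
two cases; a None result for MX only would match the behavior described
above.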

I stumbled across this while investigating after someone (running another
distribution, no local DNS server) on the local Cox Unix newsgroup
complained about the response time to www.cox.com. We traced it to
long resolve times, and while checking them I noticed this issue. I
initially chalked it up to DNS weirdness on their part, and that may
indeed be part or all of it, but reading this, it sure looks
coincidentally similar and the timing seems right, at least here. (I've
no idea what his glibc version is or whether he's running netfilter-based
firewalls on either his machine or his router; I asked, but don't have a
reply yet.)

I have not noticed any particular delays other than with host/dig -tMX
myself, but I suspect that may be because I'm running a local bind and it
mitigates the issue under normal operating conditions.

As I said, it's far enough above my head that I have no real idea whether
this is connected or not, but it sure seems coincidental if not. I'm
posting because it seems it might help answer the "Does this affect
actual userspace behavior?" bit.

I can't help feeling a bit uncomfortable with the discussion here, as it's
too much like a normally discouraged bug discussion on the main dev
list. So if people want to take the discussion to a bug, post the bug
link and I'll be happy to CC myself. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman