Mike Frysinger <vapier@g.o> posted
200901282125.52845.vapier@g.o, excerpted below, on Wed, 28 Jan
2009 21:25:50 -0500:

>> On the wire between the client and the firewall, this happens:
>>
>> a packet 1 is sent
>> b packet 2 is sent
>> c answer 1 is received
>> d answer 2 is received
>>
>> Sometimes d doesn't happen because b is lost in the firewall along the
>> way (where the race condition happens).
>
> does this affect actual userspace behavior ? in other words, does this
> lead to lost lookups and errors from the resolver ?

Some of this is beyond my comprehension level, but I've seen interesting
lookup behavior that is, at the very least, rather nicely coincidental.

Specifically, from my machine (running a local caching bind, with
netfilter on both the machine itself and on my OpenWRT-based router),
doing host lookups on second-level domains (cox.com in my case) with MX
entries works fine, while lookups on third-level domains unlikely to have
MX entries (www.cox.com) return the A record right away, then time out on
the MX lookup. AFAIK this is fairly new behavior, apparently quite
coincident with my installation of glibc-2.9 (_p20081201_r1, currently),
as, IIRC, it formerly returned promptly, without waiting for the timeout.

dig -tMX has the same behavior, while a simple dig (A record only) does
not.

I stumbled across this while investigating after someone (running another
distribution, no local DNS server) on the local Cox Unix newsgroup
complained about the response time to www.cox.com. We traced it to
long resolve times, and while checking those I noticed this issue. I
initially chalked it up to DNS weirdness on their part, and that may
indeed be part or all of it, but reading this, it sure looks
coincidentally similar, and the timing seems right, at least here (I've
no idea what his glibc version is or whether he's running netfilter-based
firewalls on either his machine or his router; I asked, but don't have a
reply yet).

I have not noted any particular delays other than with host/dig -tMX
myself, but I suspect that may be because I'm running a local bind and it
mitigates the issue under normal operating conditions.

As I said, it's enough above my head to have no real idea whether this is
connected or not, but it would be quite a coincidence if not. I'm posting
because it might help answer the "Does this affect actual userspace
behavior?" bit.

I can't help feeling a bit uncomfortable with the discussion here, as it's
too much like a normally discouraged bug discussion on the main dev
list. So if people want to take the discussion to a bug, post the bug
link and I'll be happy to CC myself. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman