Gentoo Archives: gentoo-dev

From: Tobias Klausmann <klausman@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Race condition in Netfilter triggered by glibc 2.9
Date: Thu, 29 Jan 2009 08:47:51
Message-Id: 20090129084748.GB22271@eric.schwarzvogel.de
In Reply to: Re: [gentoo-dev] Race condition in Netfilter triggered by glibc 2.9 by Mike Frysinger
Hi!

On Wed, 28 Jan 2009, Mike Frysinger wrote:
> > On the wire between the client and the firewall, this happens:
> >
> > a packet 1 is sent
> > b packet 2 is sent
> > c answer 1 is received
> > d answer 2 is received
> >
> > Sometimes d doesn't happen because b is lost in the firewall
> > along the way (where the race condition happens).
>
> does this affect actual userspace behavior ? in other words,
> does this lead to lost lookups and errors from the resolver ?

The most visible effect (and the way we first found out about it)
is a 5s hang on ssh connects. The thing is, how long the hang
lasts is program-dependent (whatever timeout the program passes
to select()); a plain blocking recvfrom() simply hangs. I wrote a
simple C program that does what glibc does (simplified for
brevity):

sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
connect(sockfd, tgt->ai_addr, tgt->ai_addrlen);
/* both queries leave from the same socket, i.e. the same source port */
sendto(sockfd, payload1, sizeof(payload1), 0, tgt->ai_addr, tgt->ai_addrlen);
sendto(sockfd, payload2, sizeof(payload2), 0, tgt->ai_addr, tgt->ai_addrlen);
fromlen = sizeof(addr);  /* value-result argument, must be set first */
recvfrom(sockfd, buf, sizeof(buf), 0, (struct sockaddr *)&addr, &fromlen);
fromlen = sizeof(addr);
recvfrom(sockfd, buf, sizeof(buf), 0, (struct sockaddr *)&addr, &fromlen);

payload1 and payload2 are an A and an AAAA request for the same
name, respectively. In the error case, the second recvfrom()
hangs indefinitely. Here's the full program for those interested:

http://eric.schwarzvogel.de/~klausman/dnstest2.c.txt
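The payloads are ordinary DNS query messages. As a rough sketch (not taken from dnstest2.c, whose exact bytes aren't reproduced here), an A or AAAA query for one name could be built like this. build_query() and the QTYPE_* names are hypothetical; the layout follows RFC 1035 (12-byte header, length-prefixed name labels, QTYPE, QCLASS), with QTYPE 1 for A and 28 for AAAA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants: DNS record types A (1) and AAAA (28), class IN. */
enum { QTYPE_A = 1, QTYPE_AAAA = 28, QCLASS_IN = 1 };

/* Hypothetical helper: write a minimal DNS query (header plus one question)
 * for `name` into buf. Returns the message length, or -1 if the name has an
 * empty or over-long label or buf is too small. */
static int build_query(uint8_t *buf, size_t cap, uint16_t id,
                       const char *name, uint16_t qtype)
{
    size_t pos = 12;
    if (cap < 12)
        return -1;
    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);     /* query ID, big-endian */
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;                   /* RD bit: tcpdump shows it as "+" */
    buf[5] = 1;                      /* QDCOUNT = 1 */

    /* Encode "eric.schwarzvogel.de" as 4eric12schwarzvogel2de0. */
    const char *p = name;
    while (*p) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63 || pos + 1 + len > cap)
            return -1;
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, p, len);
        pos += len;
        p += len;
        if (*p == '.')
            p++;
    }
    if (pos + 5 > cap)
        return -1;
    buf[pos++] = 0;                          /* root label ends the name */
    buf[pos++] = (uint8_t)(qtype >> 8);
    buf[pos++] = (uint8_t)(qtype & 0xff);
    buf[pos++] = (uint8_t)(QCLASS_IN >> 8);
    buf[pos++] = (uint8_t)(QCLASS_IN & 0xff);
    return (int)pos;
}
```

Calling it twice with the same name, once with QTYPE_A and once with QTYPE_AAAA, yields the two payloads; they differ only in the ID and QTYPE fields.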

It'd be easy to add a call to select() so the program times out
the way glibc does instead of simply hanging. Note that to run
the test in your own environment, you'll probably have to change
the payloads and line 44 of the program.
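A minimal sketch of that select() variant, assuming the socket is already connected as in the snippet above. recv_with_timeout() and RECV_TIMED_OUT are made-up names, and this is not glibc's actual resolver code; passing 5000 ms would roughly mimic the 5s hang seen with ssh.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

#define RECV_TIMED_OUT (-2)   /* hypothetical return code for "no answer" */

/* Wait at most timeout_ms for a datagram on fd, then read it with recv().
 * Returns the datagram length, -1 on error, or RECV_TIMED_OUT if nothing
 * arrived in time: the lost-answer case, where a bare recvfrom() would
 * block forever. fd must be below FD_SETSIZE. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return RECV_TIMED_OUT;
    return recv(fd, buf, len, 0);
}
```

In the error case the first call returns answer 1 and the second returns RECV_TIMED_OUT after the timeout, instead of hanging.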

Here's the tcpdump of the error case:
09:42:53.614905 IP 192.168.0.2.39355 > 192.168.22.9.53: 64583+[|domain]
09:42:53.614920 IP 192.168.0.2.39355 > 192.168.22.9.53: 61812+[|domain]
09:42:53.615623 IP 192.168.22.9.53 > 192.168.0.2.39355: 64583[|domain]

Or, if you prefer tshark:

0.000000 192.168.0.2 -> 192.168.22.9 DNS Standard query A eric.schwarzvogel.de
0.000015 192.168.0.2 -> 192.168.22.9 DNS Standard query AAAA eric.schwarzvogel.de
0.000667 192.168.22.9 -> 192.168.0.2 DNS Standard query response A 194.97.4.250

As you can see, the two queries go out very close together (15
microseconds apart in this capture). On this machine glibc's gap
is usually in the 20-50 microsecond range; my little program can
get it as low as 5 microseconds. The "right" timing to trigger the
bug of course depends on a myriad of variables, including the load
on the packet filter, the kernel version running there, etc.
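To get a rough user-space number for how close together two back-to-back sends are, one could timestamp them with clock_gettime(CLOCK_MONOTONIC). This is only an approximation of the on-wire gap (the capture timestamps are what actually count), and gap_us() and send_pair_gap_us() are hypothetical helpers. Applied to the tcpdump timestamps .614905 and .614920, gap_us() gives the 15 microseconds seen above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Microseconds from a to b, computed via total nanoseconds so that a
 * borrow across a second boundary is handled correctly. */
static long long gap_us(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
                 + (b->tv_nsec - a->tv_nsec);
    return ns / 1000;
}

/* Send two datagrams back to back on a connected socket and return the
 * time from the start of the first send() to the start of the second,
 * in microseconds, or -1 if either send() fails. */
static long long send_pair_gap_us(int fd, const void *p1, size_t n1,
                                  const void *p2, size_t n2)
{
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (send(fd, p1, n1, 0) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t2);
    if (send(fd, p2, n2, 0) < 0)
        return -1;
    return gap_us(&t1, &t2);
}
```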

A quick fix would indeed be to use two different source ports
for the two queries, but the bug in Netfilter would still be there.
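A sketch of that quick fix, assuming one socket per query: connect() on a fresh UDP socket makes the kernel bind an ephemeral source port, so the A and AAAA queries no longer share one. open_query_socket() and local_port_v4() are hypothetical names, and local_port_v4() only handles IPv4.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket connected to the server; connect() auto-binds an
 * ephemeral source port. Returns the fd, or -1 on error. */
static int open_query_socket(const struct sockaddr *srv, socklen_t srvlen)
{
    int fd = socket(srv->sa_family, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, srv, srvlen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Source port the kernel picked for an IPv4 socket, or -1 on error. */
static int local_port_v4(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```

With this, payload1 goes out on one socket and payload2 on another, and each answer is read from the socket its query was sent on.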

Regards,
Tobias

Replies

Subject Author
Re: [gentoo-dev] Race condition in Netfilter triggered by glibc 2.9 Mike Frysinger <vapier@g.o>