List Archive: gentoo-dev
Headers:
To: gentoo-dev@g.o
From: Tobias Klausmann <klausman@g.o>
Subject: Re: Race condition in Netfilter triggered by glibc 2.9
Date: Thu, 29 Jan 2009 09:47:48 +0100
Hi! 

On Wed, 28 Jan 2009, Mike Frysinger wrote:
> > On the wire between the client and the firewall, this happens:
> >
> > a packet 1 is sent
> > b packet 2 is sent
> > c answer 1 is received
> > d answer 2 is received
> >
> > Sometimes d doesn't happen because b is lost in the firewall
> > along the way (where the race condition happens).
> 
> does this affect actual userspace behavior ?  in other words,
> does this lead to lost lookups and errors from the resolver ?

The most visible effect (and the way we first found out about it)
is a 5s hang on ssh connects. How long that hang lasts depends on
the program, i.e. on whatever timeout it passes to select(). A
bare recvfrom() simply hangs. I wrote a simple C program to do what
glibc does (simplified for brevity):

/* one UDP socket, so both queries leave from the same source port */
sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
connect(sockfd, tgt->ai_addr, tgt->ai_addrlen);
/* send both queries back to back */
sendto(sockfd, payload1, sizeof(payload1), 0, tgt->ai_addr, tgt->ai_addrlen);
sendto(sockfd, payload2, sizeof(payload2), 0, tgt->ai_addr, tgt->ai_addrlen);
/* wait for both answers */
recvfrom(sockfd, buf, sizeof(buf), 0, &addr, &fromlen);
recvfrom(sockfd, buf, sizeof(buf), 0, &addr, &fromlen); /* hangs in the error case */

payload1 and payload2 are an A and an AAAA request for the same
name, respectively. That second recvfrom() hangs indefinitely in
the error case. Here's the full program for those interested:

http://eric.schwarzvogel.de/~klausman/dnstest2.c.txt
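
For readers who want to generate their own payloads instead of editing
the hard-coded ones, here is a minimal sketch of how an A or AAAA query
could be assembled by hand. It is not taken from dnstest2.c; the helper
name build_query and the QTYPE_* constants are made up for illustration.
The wire layout follows RFC 1035, and the RD flag is set to match the
"+" shown after the query IDs in the tcpdump output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record type codes: A from RFC 1035, AAAA from RFC 3596. */
enum { QTYPE_A = 1, QTYPE_AAAA = 28 };

/*
 * Hypothetical helper: write a one-question DNS query for `name` into `buf`.
 * Returns the packet length, or 0 if the name is malformed or buf is too small.
 */
static size_t build_query(uint8_t *buf, size_t buflen, uint16_t id,
                          const char *name, uint16_t qtype)
{
    size_t namelen = strlen(name);
    /* 12-byte header + encoded name (at most namelen + 2) + QTYPE + QCLASS */
    size_t need = 12 + namelen + 2 + 4;
    if (namelen == 0 || namelen > 253 || buflen < need)
        return 0;

    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);    /* query ID, network byte order */
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;                  /* RD flag: recursion desired */
    buf[5] = 1;                     /* QDCOUNT = 1 */

    size_t pos = 12;
    const char *label = name;
    while (*label) {
        /* each label is written as a length byte followed by its characters */
        const char *dot = strchr(label, '.');
        size_t len = dot ? (size_t)(dot - label) : strlen(label);
        if (len == 0 || len > 63)
            return 0;
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, label, len);
        pos += len;
        label += len;
        if (*label == '.')
            label++;
    }
    buf[pos++] = 0;                 /* root label terminates the name */
    buf[pos++] = (uint8_t)(qtype >> 8);
    buf[pos++] = (uint8_t)(qtype & 0xff);
    buf[pos++] = 0;
    buf[pos++] = 1;                 /* QCLASS = IN */
    return pos;
}
```

Calling it twice with the same name, once with QTYPE_A and once with
QTYPE_AAAA and a different ID, would give a payload pair of the kind
described above.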

It'd be easy to put in a call to select() and make the program
time out as glibc does instead of simply hanging. Note that for an
actual test in your environment, you'll probably have to change
the payloads and line 44.

Here's the tcpdump of the error case:
09:42:53.614905 IP 192.168.0.2.39355 > 192.168.22.9.53: 64583+[|domain]
09:42:53.614920 IP 192.168.0.2.39355 > 192.168.22.9.53: 61812+[|domain]
09:42:53.615623 IP 192.168.22.9.53 > 192.168.0.2.39355: 64583[|domain]

Or, if you prefer tshark:

0.000000 192.168.0.2 -> 192.168.22.9  DNS Standard query A eric.schwarzvogel.de
0.000015 192.168.0.2 -> 192.168.22.9  DNS Standard query AAAA eric.schwarzvogel.de
0.000667 192.168.22.9 -> 192.168.0.2  DNS Standard query response A 194.97.4.250

As you can see, the two queries are sent very close together. glibc
is usually in the 20-50 microsecond range on this machine; my
little program can get as low as 5 microseconds. The "correct"
timing of course depends on a myriad of variables, including load
on the packet filter, the kernel version there, etc.

A "quick fix" would indeed be to use two different source ports for
the queries - but the bug in Netfilter would still be there.
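
As a rough sketch of that workaround (an illustration only, not what
glibc does or will do): opening one connected UDP socket per query lets
the kernel pick a separate ephemeral source port for each. The helper
name open_query_sockets is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open one connected UDP socket per query so that
 * each query leaves from its own ephemeral source port.
 * Returns 0 on success with fds[0] and fds[1] filled in, -1 on error.
 */
static int open_query_sockets(const struct sockaddr *server,
                              socklen_t server_len, int fds[2])
{
    fds[0] = fds[1] = -1;
    for (int i = 0; i < 2; i++) {
        fds[i] = socket(server->sa_family, SOCK_DGRAM, 0);
        /* connect() on an unbound UDP socket assigns a local port */
        if (fds[i] < 0 || connect(fds[i], server, server_len) < 0) {
            for (int j = 0; j <= i; j++)
                if (fds[j] >= 0)
                    close(fds[j]);
            fds[0] = fds[1] = -1;
            return -1;
        }
    }
    return 0;
}
```

The A query would then go out on fds[0] and the AAAA query on fds[1], so
the firewall sees two distinct flows. This only sidesteps the race; it
does not fix it.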

Regards,
Tobias



Updated Jun 17, 2009

Summary: Archive of the gentoo-dev mailing list.

Copyright 2001-2013 Gentoo Foundation, Inc.