On Thu, 29 Jan 2009, Mike Frysinger wrote:
> > The most visible effect (and the way we found out about it
> > first) is a 5s hang on ssh connects.
> this is why i turn off dns lookup in all my sshd_config's
> (well, not because of this bug, but because DNS lookup on ssh
> can cause annoying delays). plus, that info is largely
> useless: for the logged attempts from "bad" people, the dns is
> usually screwed up / wrong / unavailable anyways.
It's not on the daemon side but the client side. If you don't
want to remember the IPs of all the hosts you might want to ssh
into (at close to 3000, I don't), the client will have to make
DNS lookups. Those are what delay the connection.
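For anyone who wants to see the client-side delay for themselves,
here is a minimal sketch (not the test program mentioned below)
that times a single getaddrinfo() call. It assumes the ssh client
resolves host names through getaddrinfo(); time_lookup() is a
made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Resolve `host` with getaddrinfo() and store the elapsed wall-clock
 * time in milliseconds in *elapsed_ms. Returns getaddrinfo()'s result
 * (0 on success). time_lookup() is a hypothetical helper, not part of
 * ssh or glibc.
 */
int time_lookup(const char *host, double *elapsed_ms)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct timespec start, end;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* ssh connects over TCP */

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc == 0)
        freeaddrinfo(res);

    *elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1000.0
                + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
    return rc;
}
```

Calling time_lookup() in a loop for a host name of your choice
and printing the result should, on an affected setup, show some
lookups taking roughly five seconds longer than the rest.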
> > Thing is: how long that timeout is is program dependent
> > (whatever they use in select()). A recvfrom() simply hangs. I
> > wrote a simple C program to do what glibc does (simplified
> > for brevity): ...
> so glibc will not trigger hangs, just delays in some cases.
Yup. Still: write a wrapper around ssh that will delay connects
by five seconds in 50% of the cases. Use it for two or more weeks
at work. That's how annoying it really is.
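The "delay, not hang" distinction quoted above comes down to
whether the caller waits in select() with a timeout before
reading. A rough sketch of that pattern follows; it is not the
glibc code, and recv_with_timeout() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Wait at most timeout_ms for a datagram on fd, then read it.
 * Returns 1 and stores the length in *out_len if a datagram arrived,
 * 0 if the timeout expired first, -1 on error.
 * recv_with_timeout() is a hypothetical helper, not glibc code.
 */
int recv_with_timeout(int fd, void *buf, size_t cap, int timeout_ms,
                      ssize_t *out_len)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* A lost reply costs at most timeout_ms here. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;   /* 0: timed out, -1: select() failed */

    /* Data is ready, so this recvfrom() will not block. */
    *out_len = recvfrom(fd, buf, cap, 0, NULL, NULL);
    return *out_len < 0 ? -1 : 1;
}
```

A bare recvfrom() on a blocking socket, without the select()
step, would sit there for as long as the reply stays lost.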
> > A "quickfix" would indeed be using two different ports for the
> > queries - but the bug in Netfilter would still be there.
> sure, the bug still exists in netfilter (kernel). but if we
> can easily mitigate the effects seen by applications using
> glibc's resolver code, that seems sane to me. i haven't perused
> the glibc resolver code in a while ... do you know if it can
> easily be tweaked to use different ports, or would such a
> change be invasive ? if the latter, well i guess we'll have to
> suck it up.
I tried to understand what glibc 2.9 does for DNS lookups, but
since it involves a rather complex (and probably quite clever)
queueing mechanism, I'm not sure I wouldn't break more than I
fix by changing it to use different ports.
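Outside of glibc, the "two different ports" idea can at least be
sketched in a standalone program. The code below is only an
illustration under assumptions: it takes the two queries to be
the A and AAAA lookups, uses fixed query IDs (a real resolver
would randomize them), and all function names are made up. It
says nothing about how glibc's resolver is structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define QTYPE_A    1   /* IPv4 address record */
#define QTYPE_AAAA 28  /* IPv6 address record */

/* Encode a one-question DNS query (class IN, recursion desired).
 * Returns the packet length, or 0 if the name is malformed or buf
 * is too small. */
size_t build_query(uint8_t *buf, size_t cap, uint16_t id,
                   const char *name, uint16_t qtype)
{
    size_t n = 12;                       /* fixed header size */

    if (cap < 12 + strlen(name) + 2 + 4)
        return 0;
    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;                       /* RD: recursion desired */
    buf[5] = 1;                          /* QDCOUNT = 1 */
    while (*name) {
        const char *dot = strchr(name, '.');
        size_t label = dot ? (size_t)(dot - name) : strlen(name);

        if (label == 0 || label > 63)
            return 0;
        buf[n++] = (uint8_t)label;       /* length-prefixed label */
        memcpy(buf + n, name, label);
        n += label;
        name += label;
        if (*name == '.')
            name++;
    }
    buf[n++] = 0;                        /* root label ends the name */
    buf[n++] = (uint8_t)(qtype >> 8);
    buf[n++] = (uint8_t)(qtype & 0xff);
    buf[n++] = 0;
    buf[n++] = 1;                        /* QCLASS = IN */
    return n;
}

/* UDP socket connected to the server. Without an explicit bind(),
 * the kernel assigns each socket its own ephemeral source port. */
int open_query_socket(const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Local (source) port of fd in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Send the A query on one socket and the AAAA query on another.
 * Returns 0 and leaves both sockets in fds[] for the caller to wait
 * on, or -1 on failure with both sockets closed. */
int send_split_queries(const struct sockaddr_in *server,
                       const char *name, int fds[2])
{
    static const uint16_t qtypes[2] = { QTYPE_A, QTYPE_AAAA };
    uint8_t pkt[512];
    int i;

    fds[0] = fds[1] = -1;
    for (i = 0; i < 2; i++) {
        size_t len = build_query(pkt, sizeof(pkt),
                                 (uint16_t)(0x1000 + i), name, qtypes[i]);
        if (len == 0)
            goto fail;
        fds[i] = open_query_socket(server);
        if (fds[i] < 0 || send(fds[i], pkt, len, 0) != (ssize_t)len)
            goto fail;
    }
    return 0;
fail:
    for (i = 0; i < 2; i++)
        if (fds[i] >= 0) {
            close(fds[i]);
            fds[i] = -1;
        }
    return -1;
}
```

This only works around the symptom by giving each query its own
source port; the Netfilter bug itself is untouched.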