Gentoo Archives: gentoo-user

From: Marc Joliet <marcec@×××.de>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Strange network behaviour: NIC goes down, DHCP lease renewal fails (WORKED AROUND)
Date: Fri, 06 Mar 2015 18:45:53
Message-Id: 20150306194535.3727ccb6@marcec.fritz.box
In Reply to: Re: [gentoo-user] Strange network behaviour: NIC goes down, DHCP lease renewal fails by Alan McKinnon
First of all, thanks to everybody who has responded so far.

I wanted to preface my reply to Alan by mentioning that the local sysadmin made
changes to the DHCP server that appear to have worked around whatever the issue
is.

I don't fully understand the error analysis (something to do with the DHCP
client reaching a particular state and sending DHCP packets that something
in between it and the DHCP server doesn't like, which might result in
vendor-dependent behaviour), but what the DHCP server now does is tell the
client to use the broadcast address as the DHCP server address (which is weird,
because the DHCP clients always switch to the broadcast address after a
timeout, but of course I'm no DHCP expert).  The affected PCs have been working
normally all day today.

So the current resolution is "it works", but we still don't understand (or at
least my boss and I don't) what the underlying issue is.  Hence I'm still
curious what people who know these technologies better than I do think.

Also, I suppose it was confusing to say that the switch never saw the packets.
That was determined by post-mortem log inspection; AFAIK we didn't do any live
inspection on the switch.  Based on the workaround, the conclusion we came to
is that the switch must have dropped the packets (for whatever reason) without
logging that it did.

Am Fri, 6 Mar 2015 08:01:44 +0200
schrieb Alan McKinnon <alan.mckinnon@×××××.com>:

[...]
> I've seen similar things many times myself (but never on Intel network
> kit so far)
>
> A lot of reading and Googling usually leads to the solution:
>
> - firmware upgrade for the hardware

OK, I can look into that.

> - use the correct driver (this is often non-obvious)
> - try the in-kernel driver vs any out-of-tree vendor driver

All PCs run with the in-kernel e1000e module.  I think the Fedora systems run
3.18.7, so it's about as current as it can be, too.  Could it really be that the
kernel selects the wrong driver?
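
For what it's worth, one way to double-check which driver is actually bound to
an interface is to follow the "driver" symlink that sysfs exposes for the
underlying device.  The following is only a minimal sketch, assuming a Linux
system with sysfs mounted at /sys; the function name bound_driver and the
default interface name eth0 are purely illustrative.

```python
import os
import sys


def bound_driver(iface, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to `iface`, or None.

    Reads the symlink <sysfs_root>/class/net/<iface>/device/driver, which
    points at the driver directory (e.g. .../bus/pci/drivers/e1000e).
    Interfaces without a backing device (such as lo) yield None.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        # No such interface, or no device/driver link for it.
        return None
    return os.path.basename(target)


if __name__ == "__main__":
    # "eth0" is just a placeholder default; pass the real name as an argument.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    print(iface, "->", bound_driver(iface))
```

On the PCs in question I would expect this to report e1000e; anything else
would suggest that a different module has claimed the NIC.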

> - apply driver parameters designed to work around buggy hardware (this
> often involves much reading)

I will also consider that.  I see that the kernel sources contain
documentation for the e1000e driver that I can look at.

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
