First of all, thanks to everybody who responded so far.

I wanted to preface my reply to Alan by mentioning that the local sysadmin made
changes to the DHCP server that appear to have worked around whatever the issue
is.

I don't fully understand the error analysis (something to do with the DHCP
client reaching a particular state and sending DHCP packets that something
between it and the DHCP server doesn't like, which might result in
vendor-dependent behaviour), but what the DHCP server now does is tell the
client to use the broadcast address as the DHCP server address. This is weird,
because the DHCP clients always switch to the broadcast address after a timeout
anyway, but of course I'm no DHCP expert. The affected PCs have been working
normally all day today.

So the current resolution is "it works", but we still don't understand (or at
least my boss and I don't) what the underlying issue is. Hence I'm still
curious what people who know these technologies better than I do make of it.

Also, I suppose it was confusing to say that the switch never saw the packets.
That was determined by post-mortem log inspection; AFAIK we didn't do any live
inspection on the switch. Based on the workaround, we concluded that the switch
must have dropped the packets (for whatever reason) without logging that it
did.
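One plausible reading of the analysis (my assumption, not something the
sysadmin confirmed) is that the "particular state" is the RENEWING state from
RFC 2131. After T1 (by default half the lease time) the client unicasts its
DHCPREQUEST to the server that granted the lease, and only after T2 (by default
87.5% of the lease time) does it fall back to broadcasting in the REBINDING
state. If something on the path drops the unicast renewals, the client would
only recover at T2, which would match the timeout behaviour mentioned above;
handing out the broadcast address as the server address would turn even the
renewals into broadcasts. The Python sketch below models only that choice of
destination. The names `Lease`, `state_at` and `request_destination` are
hypothetical, and it ignores retransmission timing and server-supplied T1/T2
values (options 58 and 59).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

BROADCAST = "255.255.255.255"


class State(Enum):
    BOUND = "BOUND"
    RENEWING = "RENEWING"
    REBINDING = "REBINDING"
    EXPIRED = "EXPIRED"


@dataclass
class Lease:
    # Hypothetical model of a granted lease: only the fields needed here.
    duration: float  # lease time in seconds (DHCP option 51)
    server_id: str   # server identifier (DHCP option 54)

    @property
    def t1(self) -> float:
        # RFC 2131 default renewal time: 0.5 * lease duration
        return 0.5 * self.duration

    @property
    def t2(self) -> float:
        # RFC 2131 default rebinding time: 0.875 * lease duration
        return 0.875 * self.duration


def state_at(lease: Lease, elapsed: float) -> State:
    """Client state `elapsed` seconds after the lease was granted."""
    if elapsed < lease.t1:
        return State.BOUND
    if elapsed < lease.t2:
        return State.RENEWING
    if elapsed < lease.duration:
        return State.REBINDING
    return State.EXPIRED


def request_destination(lease: Lease, elapsed: float) -> Optional[str]:
    """Destination of a DHCPREQUEST sent at `elapsed`, or None if none is sent."""
    state = state_at(lease, elapsed)
    if state is State.RENEWING:
        # Unicast to the leasing server (assuming the client addresses it via
        # the server identifier). If server_id is the broadcast address, the
        # renewal effectively becomes a broadcast too.
        return lease.server_id
    if state is State.REBINDING:
        # After T2 the client broadcasts to any server that will answer.
        return BROADCAST
    # BOUND: no renewal yet; EXPIRED: the client starts over with DHCPDISCOVER.
    return None


if __name__ == "__main__":
    lease = Lease(duration=3600, server_id="192.0.2.1")
    for minute in (10, 40, 55, 70):
        seconds = minute * 60
        print(minute, state_at(lease, seconds).name,
              request_destination(lease, seconds))
```

With a one-hour lease, for example, requests between minute 30 and minute 52.5
are unicast to the server, so a device that silently drops them would delay
renewal until T2.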

On Fri, 6 Mar 2015 08:01:44 +0200, Alan McKinnon <alan.mckinnon@×××××.com> wrote:

[...]
> I've seen similar things many times myself (but never on Intel network
> kit so far)
>
> A lot of reading and Googling usually leads to the solution:
>
> - firmware upgrade for the hardware

OK, I can look into that.

> - use the correct driver (this is often non-obvious)
> - try the in-kernel driver vs any out-of-tree vendor driver

All PCs use the in-kernel e1000e module. I think the Fedora systems run kernel
3.18.7, so it's about as current as it can be, too. Could it really be that the
kernel selects the wrong driver?
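To rule out the wrong-driver theory, one can check which driver the kernel
actually bound to each interface. On Linux, /sys/class/net/<iface>/device/driver
is a symlink into the driver's sysfs directory, so its last path component is
the driver name (the same name `ethtool -i <iface>` reports as "driver"). Below
is a minimal Python sketch; the function name `bound_driver` and its
`sysfs_root` parameter (there only so it can be tested against a fake tree)
are my own inventions.

```python
import os
from typing import Optional


def bound_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a network interface, or None.

    <sysfs_root>/class/net/<iface>/device/driver is a symlink into
    .../bus/<bus>/drivers/<name>; its last path component is the driver
    name. Virtual interfaces (lo, bridges, ...) have no "device" link,
    so None is returned for them.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def main() -> None:
    net_dir = "/sys/class/net"
    if not os.path.isdir(net_dir):  # not Linux, or sysfs not mounted
        print("no", net_dir)
        return
    for name in sorted(os.listdir(net_dir)):
        print(f"{name}: {bound_driver(name) or '(no device driver)'}")


if __name__ == "__main__":
    main()
```

On the affected PCs this should print e1000e for the onboard NIC if the
in-kernel driver is indeed the one in use.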

> - apply driver parameters designed to work around buggy hardware (this
> often involves much reading)

I will also consider that. I see that the kernel sources contain
documentation for the e1000e driver that I can look at.

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup