On Fri, December 12, 2008 19:09, David Sommerseth wrote:
>
> David Sommerseth wrote:
>> atoth@××××××××××.hu wrote:
>>> PCI-X dual port Broadcom NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>>> adapter is working fine here driven by tg3, 2.6.27-hardened-r1. The
>>> driver doesn't seem to be borked with my card.
>>>
>>> Did you check out the "error" field of ifconfig's output for the
>>> interface of your card?
>>>
>>> Regards,
>>> Dw.
>>
>> Hmmm... No, I have not had that opportunity. The server is located
>> 2000 km away from me, and I usually call a guy (who is not a
>> technician) to go in and press CTRL-ALT-DEL on the keyboard. That is
>> the short-term "fix". But I'm going to have a physical look at the
>> server in a couple of weeks, so if I get positive feedback from others
>> regarding the 2.6.27 kernel as well, I'm willing to try that upgrade.
>>
>> This interface is an on-board interface in an IBM eServer. The first
>> time it happened, the box had run without problems for about 28 days.
>> This time it was 13 days. So I expect it to happen again soon enough.
>>
>> I'll try to hack the shutdown scripts to dump the ifconfig info
>> somewhere, somehow.
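For dumping the counters somewhere before a reboot, a minimal Python sketch is below. It assumes the kernel exposes per-interface counters as files under /sys/class/net/<iface>/statistics/ (2.6 kernels with sysfs do); the function names and the log path are hypothetical, and the script would still have to be hooked into the shutdown sequence or run from cron.

```python
import os
import time
from typing import Dict

# Assumed sysfs layout: /sys/class/net/<iface>/statistics/<counter>
SYSFS_NET = "/sys/class/net"
COUNTERS = ("rx_packets", "tx_packets", "rx_errors", "tx_errors",
            "rx_dropped", "tx_dropped", "collisions", "rx_bytes", "tx_bytes")


def read_counters(iface: str, base: str = SYSFS_NET) -> Dict[str, int]:
    """Read the selected statistics counters for one interface."""
    stats_dir = os.path.join(base, iface, "statistics")
    values = {}
    for name in COUNTERS:
        path = os.path.join(stats_dir, name)
        if os.path.exists(path):  # skip counters this kernel/driver does not expose
            with open(path) as f:
                values[name] = int(f.read().strip())
    return values


def append_snapshot(iface: str, log_path: str, base: str = SYSFS_NET) -> str:
    """Append one timestamped line of counters to log_path and return that line."""
    counters = read_counters(iface, base)
    fields = " ".join(f"{k}={v}" for k, v in sorted(counters.items()))
    line = f"{time.strftime('%Y-%m-%d %H:%M:%S')} {iface} {fields}\n"
    with open(log_path, "a") as log:
        log.write(line)
    return line
```

For example, `append_snapshot("eth0", "/var/log/eth0-counters.log")` appends one line per call; running it every few minutes rather than only at shutdown would also show roughly when the counters jumped.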
>
> Then it happened again... and I have ifconfig stats for the interface:
>
> eth0      Link encap:Ethernet  HWaddr 00:14:5e:5d:3c:d0
>           inet6 addr: fe80::214:5eff:fe5d:3cd0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:10551633 errors:4294967239 dropped:767 overruns:0 frame:170
>           TX packets:9371606 errors:4294967239 dropped:0 overruns:0 carrier:0
>           collisions:4294967239 txqueuelen:1000
>           RX bytes:28237000 (26.9 MiB)  TX bytes:163377979 (155.8 MiB)
>           Interrupt:16
>
> From the kernel log I see this:
>
> Dec 12 12:19:21 fw [74355.059369] tg3: tg3_abort_hw timed out for world, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> Dec 12 12:19:24 fw [74357.842979] tg3: world: No firmware running.
> Dec 12 12:19:41 fw [74374.992867] tg3: world: Link is down.
>
> I'm surprised by the errors and collisions numbers here, as I checked
> them the other day and all of them were 0. I also know that the TX and
> RX byte counts were above 3-4 GB, but I don't remember which was which.
>
> Could this be an overflow bug of some kind?
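On the overflow question: 4294967239 is 2^32 - 57, which is exactly the bit pattern of -57 stored in a 32-bit counter. Three unrelated counters (RX errors, TX errors, collisions) showing this same value looks more like a small negative number or a bogus hardware read (compare MAC_TX_MODE=ffffffff in the log) than like four billion real events; that is an inference, not a confirmed diagnosis. A quick Python check of the arithmetic (`as_signed32` is just a helper for this illustration):

```python
def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


counter = 4294967239  # errors/collisions value from the ifconfig output above
print(counter == 2**32 - 57)   # True
print(hex(counter))            # 0xffffffc7
print(as_signed32(counter))    # -57

# Average RX bytes per packet from the same output; a real Ethernet frame is
# at least 60 bytes, so the RX byte counter is not a consistent running total.
print(round(28237000 / 10551633, 2))  # 2.68
```

The last line hints at the byte counters too: if this is a 32-bit kernel, the counters ifconfig shows are 32 bits wide and wrap at 4 GiB, which would fit totals of 3-4 GB now reading only tens of MiB.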
>
> I have also found out that IBM has released updated firmware for this
> network device, so I'll try to upgrade it during Christmas when I'm close
> to the box again. In the meantime I have a little ping script which
> restarts the network (including reloading the tg3 module) when the
> network dies. This restart gives me minimal downtime.
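The ping script itself isn't shown in the thread. A hypothetical Python sketch of the same idea: ping a host at each interval, and after a few consecutive failures restart networking and reload tg3. The Gentoo-style init script path (/etc/init.d/net.eth0), the target host and the thresholds are assumptions; the check and restart functions are passed in so the logic can be tested without touching the network.

```python
import subprocess
import time
from typing import Callable, Optional, Tuple


def host_reachable(host: str, count: int = 3, wait_s: int = 2) -> bool:
    """True if iputils `ping` gets at least one reply (exit status 0)."""
    result = subprocess.run(["ping", "-c", str(count), "-W", str(wait_s), host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def restart_network(iface: str = "eth0") -> None:
    """Hypothetical restart: stop the interface, reload tg3, start it again."""
    init_script = "/etc/init.d/net." + iface  # assumed Gentoo-style init script
    subprocess.run([init_script, "stop"])
    subprocess.run(["modprobe", "-r", "tg3"])
    subprocess.run(["modprobe", "tg3"])
    subprocess.run([init_script, "start"])


def watchdog_step(failures: int, reachable: bool, max_failures: int) -> Tuple[int, bool]:
    """Update the consecutive-failure count; return (new_count, restart_now)."""
    if reachable:
        return 0, False
    failures += 1
    if failures >= max_failures:
        return 0, True  # reset the count after triggering a restart
    return failures, False


def run_watchdog(host: str,
                 check: Callable[[str], bool] = host_reachable,
                 restart: Callable[[], None] = restart_network,
                 max_failures: int = 3,
                 interval_s: float = 60.0,
                 iterations: Optional[int] = None) -> int:
    """Poll host; return the number of restarts (loops forever if iterations is None)."""
    failures = restarts = done = 0
    while iterations is None or done < iterations:
        failures, restart_now = watchdog_step(failures, check(host), max_failures)
        if restart_now:
            restart()
            restarts += 1
        done += 1
        time.sleep(interval_s)
    return restarts
```

`run_watchdog("192.168.1.1")` would then run until killed (the address is a placeholder for the upstream gateway). Requiring several consecutive failures, rather than restarting on the first lost ping, avoids bouncing the interface because of a single dropped packet.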
>
> But I do not understand why this box was rock solid until I upgraded
> from 2.6.22-hardened-r8 to 2.6.25-hardened-r8. The new kernel driver
> obviously does something the old one didn't. Unfortunately I can't find
> anything in the kernel git logs for the tg3.[ch] files that could
> pinpoint the cause.
>
> Does anyone have any experience with firmware upgrades on these
> cards? The instructions seem pretty straightforward, but if you know
> anything at all about it, I would appreciate hearing it.
>
> Kind regards,
>
> David Sommerseth
>

Rather strange. The collisions and errors counters all show the same value...
It has been a long time since I last saw collisions.

There are several possible explanations for this symptom. It would be
important to know whether the card is connected to a hub or a switch
(switching hub).
1.) There may be a defective device on the subnet, which is either
connected from time to time, or present all the time but not hogging the
line constantly.
2.) The switch/hub may have a problem; try reconnecting the card to
another port.
3.) The network card may have a problem which is software related and
might be solved by a firmware upgrade (unfortunately the card itself
cannot be replaced, since it is an on-board NIC).
4.) It may even be caused by a driver bug, which we know is entirely
possible since the e1000 issue.

I hope the cause turns up soon. I would suspect a hardware issue, but it's
disturbing that these symptoms appeared after a kernel upgrade.

Here's my ifconfig output for reference:
bond0     Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          inet addr:195.111.75.211  Bcast:195.111.75.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9285671 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2100416838 (1.9 GiB)  TX bytes:1298939064 (1.2 GiB)

eth0      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5395008 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681040 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1529378855 (1.4 GiB)  TX bytes:1298937508 (1.2 GiB)
          Interrupt:20

eth1      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3890663 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:571037983 (544.5 MiB)  TX bytes:1556 (1.5 KiB)
          Interrupt:21

lspci:
00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
00:08.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)

Regards,
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057, 06-30-5962-962
Attila Toth MD, Radiologist, +36-20-825-8057, +36-30-5962-962