Gentoo Archives: gentoo-hardened

From: atoth@××××××××××.hu
To: gentoo-hardened@l.g.o
Subject: Re: [gentoo-hardened] tg3 driver - transmit timed out, resetting
Date: Fri, 12 Dec 2008 19:21:09
Message-Id: 417e0284ba12004db13df186d21d2439.squirrel@atoth.sote.hu
In Reply to: Re: [gentoo-hardened] tg3 driver - transmit timed out, resetting by David Sommerseth
On Fri, December 12, 2008 19:09, David Sommerseth wrote:
>
> David Sommerseth wrote:
>> atoth@××××××××××.hu wrote:
>>> A PCI-X dual-port Broadcom NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>>> adapter is working fine here, driven by tg3 on 2.6.27-hardened-r1. The
>>> driver doesn't seem to be borked with my card.
>>>
>>> Did you check the "errors" fields in ifconfig's output for your
>>> card's interface?
>>>
>>> Regards,
>>> Dw.
>>
>> Hmmm ... No, I have not had that opportunity. The server is located
>> 2000 km away from me, and I usually call a guy (who is not a
>> technician) to go in and press CTRL-ALT-DEL on a keyboard. That is the
>> short-term "fix". But I'm going to have a physical look at the server
>> in a couple of weeks, so if I get positive feedback from others as
>> well regarding the 2.6.27 kernel, I'm willing to try that upgrade.
>>
>> This interface is an on-board interface in an IBM eServer. The first
>> time it happened, it had run without problems for about 28 days. This
>> time it was 13 days. So I expect it to happen again soon enough.
>>
>> I'll try to hack the shutdown scripts to dump the ifconfig info
>> somewhere somehow.
>
> Then it happened again ... and I have ifconfig stats for the interface:
>
> eth0      Link encap:Ethernet  HWaddr 00:14:5e:5d:3c:d0
>           inet6 addr: fe80::214:5eff:fe5d:3cd0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:10551633 errors:4294967239 dropped:767 overruns:0 frame:170
>           TX packets:9371606 errors:4294967239 dropped:0 overruns:0 carrier:0
>           collisions:4294967239 txqueuelen:1000
>           RX bytes:28237000 (26.9 MiB)  TX bytes:163377979 (155.8 MiB)
>           Interrupt:16
>
> From the kernel log I see this:
>
> Dec 12 12:19:21 fw [74355.059369] tg3: tg3_abort_hw timed out for world, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> Dec 12 12:19:24 fw [74357.842979] tg3: world: No firmware running.
> Dec 12 12:19:41 fw [74374.992867] tg3: world: Link is down.
>
> I'm surprised by the errors and collisions numbers here, as I checked
> them the other day, and all of them were 0. I also know that the TX and
> RX values were above 3-4 GB, but I don't remember which was which.
>
> Could this be an overflow bug of some kind?
>
> I have also found out that IBM has released updated firmware for this
> network device, so I'll try to upgrade it during Christmas when I'm
> close to the box again. In the meantime I have a little ping script
> which restarts the network (including reloading the tg3 module) when
> the network dies. This restart gives me minimal downtime.
>
> But I do not understand why this box was so rock solid until I upgraded
> from 2.6.22-hardened-r8 to 2.6.25-hardened-r8. The new kernel driver
> obviously does something it didn't do before. Unfortunately I can't
> find anything in the kernel git logs for the tg3.[ch] files which could
> pin-point the cause.
>
> Does anyone have any experience with firmware upgrades on these cards?
> The instructions seem pretty straightforward, but if you know about
> anything, whatever, I would appreciate hearing it.
>
> kind regards,
>
> David Sommerseth
>

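David's ping script is not shown in the thread. Below is a hypothetical Python sketch of that kind of watchdog, which also folds in his earlier idea of saving the interface counters before the restart so the evidence survives. Every specific in it is an assumption, not something from the thread: the target address `192.0.2.1` (a documentation placeholder), the Gentoo-style `/etc/init.d/net.eth0` init script, the iputils `ping` options, reading counters from `/sys/class/net/<iface>/statistics`, and the log path.

```python
# Hypothetical watchdog sketch, NOT the script mentioned in this thread.
# Assumptions: Linux with iputils ping, sysfs counters under
# /sys/class/net/<iface>/statistics, a Gentoo-style /etc/init.d/net.<iface>
# init script, and a TARGET host that should always answer pings.
import os
import subprocess
import time

IFACE = "eth0"
TARGET = "192.0.2.1"  # hypothetical placeholder: use your gateway instead
STATS_DIR = "/sys/class/net/{iface}/statistics"


def read_counters(iface, stats_dir=STATS_DIR):
    """Read every counter file for iface into a dict of name -> int."""
    path = stats_dir.format(iface=iface)
    counters = {}
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name)) as f:
            counters[name] = int(f.read().strip())
    return counters


def host_alive(host, run=subprocess.run):
    """Send one ping with a 2-second timeout; True if the host answered."""
    result = run(["ping", "-c", "1", "-W", "2", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def restart_network(iface, run=subprocess.run):
    """Stop the interface, reload the tg3 module, bring the interface up."""
    init_script = "/etc/init.d/net." + iface  # assumed Gentoo naming
    for cmd in ([init_script, "stop"], ["rmmod", "tg3"],
                ["modprobe", "tg3"], [init_script, "start"]):
        run(cmd)


def check_once(iface, host, log_path, run=subprocess.run, stats_dir=STATS_DIR):
    """If host is unreachable, log the counters, then restart.

    Returns True if a restart was triggered, False otherwise.
    """
    if host_alive(host, run):
        return False
    counters = read_counters(iface, stats_dir)
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write("%s %s %r\n" % (stamp, iface, counters))
    restart_network(iface, run)
    return True


if __name__ == "__main__":
    # Must run as root to reload modules; checks once a minute.
    while True:
        check_once(IFACE, TARGET, "/var/log/nic-watchdog.log")
        time.sleep(60)
```

Passing `run` and `stats_dir` as parameters keeps the logic testable without touching a real NIC; on the actual box only the `__main__` loop would run.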
Rather strange. The collisions and errors counters all show the same
value... It has been a long time since I last saw collisions.

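One way to read that shared value: 4294967239 is exactly 2^32 - 57, which is the bit pattern of -57 in a 32-bit two's-complement field. That looks more like a counter that went below zero than like four billion real errors and collisions, though the thread does not establish the cause. A minimal Python check, assuming the counters are 32 bits wide on this box:

```python
# Sketch: reinterpret a 32-bit unsigned counter as two's-complement signed.
# Assumption: the errors/collisions counters were 32 bits wide on this box.

def as_signed32(value: int) -> int:
    """Return the signed reading of the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


counter = 4294967239          # errors/collisions value reported for eth0
print(hex(counter))           # 0xffffffc7
print(as_signed32(counter))   # -57
print((1 << 32) - counter)    # 57
```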
There are several possible explanations for this symptom. It would be
important to know whether the card is connected to a hub or to a switch
(switching hub).
1.) There can be a defective device on the subnet, which is connected to
it from time to time, or which is present all the time but doesn't hog
the line constantly.
2.) The switch/hub can have a problem; try connecting the card to
another port.
3.) The network card can have a problem, which may be software related and
might be solved by a firmware upgrade (unfortunately the card itself
cannot be replaced, being an on-board NIC).
4.) It can even be caused by a driver bug, which we know is entirely
possible since the e1000 issue.

I hope it'll be sorted out soon. I would suspect a hardware issue, but
it's a disturbing fact that these symptoms appeared after a kernel
upgrade.

Here's my ifconfig output for reference:
bond0     Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          inet addr:195.111.75.211  Bcast:195.111.75.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9285671 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2100416838 (1.9 GiB)  TX bytes:1298939064 (1.2 GiB)

eth0      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5395008 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1681040 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1529378855 (1.4 GiB)  TX bytes:1298937508 (1.2 GiB)
          Interrupt:20

eth1      Link encap:Ethernet  HWaddr 00:10:18:06:ce:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3890663 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:571037983 (544.5 MiB)  TX bytes:1556 (1.5 KiB)
          Interrupt:21

lspci:
00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
00:08.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)

Regards,
Dw.
--
Attila Toth MD, Radiologist, +36-20-825-8057, +36-30-5962-962
