Gentoo Archives: gentoo-user

From: Alexander Puchmayr <alexander.puchmayr@×××××××.at>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Serious stability problems, including freezes
Date: Tue, 02 Jun 2009 20:40:13
Message-Id: 200906022240.08088.alexander.puchmayr@linznet.at
1 Hi there!
2
3 My freshly setup homeserver has serious stability problem, which make the
4 the machine absolutely unuseable as home server.
5
6 The system is an amd64 (5050e@2600MHz, no overclocking) with 4GB Ram and
7 three 1TB SATA disks bundled to a raid5 (kernel md driver).
8 The kernels I've tried are gentoo-2.6.28-r, gentoo-2.6.29-r5 and
9 vanilla-2.6.29.4.
10
11 The problem is, when I copy large amount of data via nfs to the server, then
12 the logs get filled with entries like this:
13
14 --- Start extraction of /var/log/everything/current
15
16 [13010.203643] kswapd0: page allocation failure. order:1, mode:0x4020
17 Jun 2 21:40:43 [kernel] [13010.203653] Pid: 346, comm: kswapd0 Tainted: G
18 W 2.6.29.4 #2
19 Jun 2 21:40:43 [kernel] [13010.203657] Call Trace:
20 Jun 2 21:40:43 [kernel] [13010.203662] <IRQ> [<ffffffff810c887e>]
21 __alloc_pages_internal+0x3ce/0x4f0
22 Jun 2 21:40:43 [kernel] [13010.203690] [<ffffffff810f2d62>] ?
23 unfreeze_slab+0x92/0xf0
24 Jun 2 21:40:43 [kernel] [13010.203697] [<ffffffff810f3bc6>]
25 __slab_alloc+0x226/0x570
26 Jun 2 21:40:43 [kernel] [13010.203706] [<ffffffff814018df>] ?
27 __netdev_alloc_skb+0x1f/0x40
28 Jun 2 21:40:43 [kernel] [13010.203714] [<ffffffff810f4f4f>]
29 __kmalloc_track_caller+0x10f/0x150
30 Jun 2 21:40:43 [kernel] [13010.203720] [<ffffffff814018df>] ?
31 __netdev_alloc_skb+0x1f/0x40
32 Jun 2 21:40:43 [kernel] [13010.203726] [<ffffffff8140158e>]
33 __alloc_skb+0x6e/0x140
34 Jun 2 21:40:43 [kernel] [13010.203732] [<ffffffff814018df>]
35 __netdev_alloc_skb+0x1f/0x40
36 Jun 2 21:40:43 [kernel] [13010.203743] [<ffffffff812f6c32>]
37 rtl8169_rx_fill+0xc2/0x1f0
38 Jun 2 21:40:43 [kernel] [13010.203750] [<ffffffff812f6f80>]
39 rtl8169_rx_interrupt+0x220/0x4e0
40 Jun 2 21:40:43 [kernel] [13010.203757] [<ffffffff812f73c0>]
41 rtl8169_poll+0x40/0x220
42 Jun 2 21:40:43 [kernel] [13010.203765] [<ffffffff81407c99>]
43 net_rx_action+0xa9/0x150
44 Jun 2 21:40:43 [kernel] [13010.203775] [<ffffffff8106d95c>]
45 __do_softirq+0x9c/0x160
46 Jun 2 21:40:43 [kernel] [13010.203783] [<ffffffff8102b87c>]
47 call_softirq+0x1c/0x30
48 Jun 2 21:40:43 [kernel] [13010.203789] [<ffffffff8102cbb5>]
49 do_softirq+0x65/0xb0
50 Jun 2 21:40:43 [kernel] [13010.203796] [<ffffffff8106d595>]
51 irq_exit+0x85/0xb0
52 Jun 2 21:40:43 [kernel] [13010.203801] [<ffffffff8102ce6a>]
53 do_IRQ+0xba/0x1b0
54 Jun 2 21:40:43 [kernel] [13010.203808] [<ffffffff8102b0d3>]
55 ret_from_intr+0x0/0xf
56 Jun 2 21:40:43 [kernel] [13010.203811] <EOI> [<ffffffff815144db>] ?
57 _spin_unlock_irq+0x2b/0x40
58 Jun 2 21:40:43 [kernel] [13010.203825] [<ffffffff815144e0>] ?
59 _spin_unlock_irq+0x30/0x40
60 Jun 2 21:40:43 [kernel] [13010.203834] [<ffffffff810ceba0>] ?
61 __remove_mapping+0xd0/0x100
62 Jun 2 21:40:43 [kernel] [13010.203841] [<ffffffff810cf560>] ?
63 shrink_page_list+0x380/0x7f0
64 Jun 2 21:40:43 [kernel] [13010.203848] [<ffffffff810cfc30>] ?
65 shrink_list+0x260/0x650
66 Jun 2 21:40:43 [kernel] [13010.203856] [<ffffffff810d0293>] ?
67 shrink_zone+0x273/0x380
68 Jun 2 21:40:43 [kernel] [13010.203863] [<ffffffff810d04e7>] ?
69 shrink_slab+0x147/0x180
70 Jun 2 21:40:43 [kernel] [13010.203870] [<ffffffff810d0c71>] ?
71 kswapd+0x751/0x7c0
72 Jun 2 21:40:43 [kernel] [13010.203877] [<ffffffff810ce080>] ?
73 isolate_pages_global+0x0/0x280
74 Jun 2 21:40:43 [kernel] [13010.203885] [<ffffffff8107f970>] ?
75 autoremove_wake_function+0x0/0x40
76 Jun 2 21:40:43 [kernel] [13010.203896] [<ffffffff810919bd>] ?
77 trace_hardirqs_on+0xd/0x10
78 Jun 2 21:40:43 [kernel] [13010.203903] [<ffffffff810d0520>] ?
79 kswapd+0x0/0x7c0
80 Jun 2 21:40:43 [kernel] [13010.203909] [<ffffffff8107f559>] ?
81 kthread+0x49/0x80
82 Jun 2 21:40:43 [kernel] [13010.203915] [<ffffffff8102b77a>] ?
83 child_rip+0xa/0x20
84 Jun 2 21:40:43 [kernel] [13010.203920] [<ffffffff8102b17c>] ?
85 restore_args+0x0/0x30
86 Jun 2 21:40:43 [kernel] [13010.203926] [<ffffffff8107f510>] ?
87 kthread+0x0/0x80
88 Jun 2 21:40:43 [kernel] [13010.203932] [<ffffffff8102b770>] ?
89 child_rip+0x0/0x20
90 Jun 2 21:40:43 [kernel] [13010.203936] Mem-Info:
91 Jun 2 21:40:43 [kernel] [13010.203939] DMA per-cpu:
92 Jun 2 21:40:43 [kernel] [13010.203943] CPU 0: hi: 0, btch: 1 usd:
93 0
94 Jun 2 21:40:43 [kernel] [13010.203947] CPU 1: hi: 0, btch: 1 usd:
95 0
96 Jun 2 21:40:43 [kernel] [13010.203950] DMA32 per-cpu:
97 Jun 2 21:40:43 [kernel] [13010.203954] CPU 0: hi: 186, btch: 31 usd:
98 180
99 Jun 2 21:40:43 [kernel] [13010.203958] CPU 1: hi: 186, btch: 31 usd:
100 182
101 Jun 2 21:40:43 [kernel] [13010.203961] Normal per-cpu:
102 Jun 2 21:40:43 [kernel] [13010.203964] CPU 0: hi: 186, btch: 31 usd:
103 155
104 Jun 2 21:40:43 [kernel] [13010.203968] CPU 1: hi: 186, btch: 31 usd:
105 167
106 Jun 2 21:40:43 [kernel] [13010.203974] Active_anon:270 active_file:11782
107 inactive_anon:2024
108 Jun 2 21:40:43 [kernel] [13010.203976] inactive_file:905625 unevictable:0
109 dirty:28068 writeback:0 unstable:0
110 Jun 2 21:40:43 [kernel] [13010.203978] free:15792 slab:50324 mapped:697
111 pagetables:413 bounce:0
112 Jun 2 21:40:43 [kernel] [13010.203986] DMA free:1960kB min:28kB low:32kB
113 high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB
114 inactive_file:0kB unevictable:0kB present:15152kB pages_scanned:0
115 all_unreclaimable? yes
116 Jun 2 21:40:43 [kernel] [13010.203992] lowmem_reserve[]: 0 3480 3949 3949
117 Jun 2 21:40:43 [kernel] [13010.204004] DMA32 free:60844kB min:7068kB
118 low:8832kB high:10600kB active_anon:580kB inactive_anon:384kB
119 active_file:30744kB inactive_file:3213088kB unevictable:0kB
120 present:3563552kB pages_scanned:32 all_unreclaimable? no
121 Jun 2 21:40:43 [kernel] [13010.204011] lowmem_reserve[]: 0 0 469 469
122 Jun 2 21:40:43 [kernel] [13010.204013] Normal free:364kB min:952kB
123 low:1188kB high:1428kB active_anon:500kB inactive_anon:7712kB
124 active_file:16384kB inactive_file:409412kB unevictable:0kB present:480960kB
125 pages_scanned:0 all_unreclaimable? no
126 Jun 2 21:40:43 [kernel] [13010.204013] lowmem_reserve[]: 0 0 0 0
127 Jun 2 21:40:43 [kernel] [13010.204013] 917432 total pagecache pages
128 Jun 2 21:40:43 [kernel] [13010.204013] 0 pages in swap cache
129 Jun 2 21:40:43 [kernel] [13010.204013] Swap cache stats: add 0, delete 0,
130 find 0/0
131 Jun 2 21:40:43 [kernel] [13010.204013] Free swap = 5879760kB
132 Jun 2 21:40:43 [kernel] [13010.204013] Total swap = 5879760kB
133 Jun 2 21:40:43 [kernel] [13010.204013] 1171440 pages RAM
134 Jun 2 21:40:43 [kernel] [13010.204013] 178617 pages reserved
135 Jun 2 21:40:43 [kernel] [13010.204013] 905214 pages shared
136 Jun 2 21:40:43 [kernel] [13010.204013] 73436 pages non-shared
137
138 --- End extraction of /var/log/everything/current
139
140 Sometimes its kswap0, nfsd, swapper, and a lot of other progs causing it.
141
142 However, the systems runs without any problem until it has enough physical
143 memory. If I copy larger amounts of data (e.g. typical dvd-iso images of
144 ~4GB), then the log gets filled. Today's output was >65MB :-(
145 So far, I think these messages are just annoying but do no further harm.
146
147 The second problem is that the network controler (rtl8169) seems to timeout
148 on TX, which is very bad if it happens during renewing your dhcp-lease :-(
149 the message looks like
150
151 Jun 2 18:25:23 [kernel] [ 1290.701314] ------------[ cut here ]------------
152 Jun 2 18:25:23 [kernel] [ 1290.701321] WARNING: at
153 net/sched/sch_generic.c:226 dev_watchdog+0x245/0x260()
154 Jun 2 18:25:23 [kernel] [ 1290.701326] Hardware name: System Product Name
155 Jun 2 18:25:23 [kernel] [ 1290.701330] NETDEV WATCHDOG: eth0 (r8169):
156 transmit timed out
157 Jun 2 18:25:23 [kernel] [ 1290.701334] Modules linked in: psmouse k8temp
158 pcspkr hwmon
159 Jun 2 18:25:23 [kernel] [ 1290.701348] Pid: 0, comm: swapper Not tainted
160 2.6.29.4 #2
161 Jun 2 18:25:23 [kernel] [ 1290.701352] Call Trace:
162 Jun 2 18:25:23 [kernel] [ 1290.701356] <IRQ> [<ffffffff810676c0>]
163 warn_slowpath+0xd0/0x130
164 Jun 2 18:25:23 [kernel] [ 1290.701375] [<ffffffff8108559b>] ?
165 sched_clock_cpu+0x13b/0x180
166 Jun 2 18:25:23 [kernel] [ 1290.701385] [<ffffffff8126cf19>] ?
167 strlcpy+0x49/0x60
168 Jun 2 18:25:23 [kernel] [ 1290.701391] [<ffffffff8141ca25>]
169 dev_watchdog+0x245/0x260
170 Jun 2 18:25:23 [kernel] [ 1290.701400] [<ffffffff815144db>] ?
171 _spin_unlock_irq+0x2b/0x40
172 Jun 2 18:25:23 [kernel] [ 1290.701410] [<ffffffff81091876>] ?
173 trace_hardirqs_on_caller+0x66/0x1a0
174 Jun 2 18:25:23 [kernel] [ 1290.701415] [<ffffffff8141c7e0>] ?
175 dev_watchdog+0x0/0x260
176 Jun 2 18:25:23 [kernel] [ 1290.701423] [<ffffffff810726c0>]
177 run_timer_softirq+0x170/0x250
178 Jun 2 18:25:23 [kernel] [ 1290.701431] [<ffffffff8106d95c>]
179 __do_softirq+0x9c/0x160
180 Jun 2 18:25:23 [kernel] [ 1290.701439] [<ffffffff8102b87c>]
181 call_softirq+0x1c/0x30
182 Jun 2 18:25:23 [kernel] [ 1290.701445] [<ffffffff8102cbb5>]
183 do_softirq+0x65/0xb0
184 Jun 2 18:25:23 [kernel] [ 1290.701451] [<ffffffff8106d595>]
185 irq_exit+0x85/0xb0
186 Jun 2 18:25:23 [kernel] [ 1290.701457] [<ffffffff8102ce6a>]
187 do_IRQ+0xba/0x1b0
188 Jun 2 18:25:23 [kernel] [ 1290.701463] [<ffffffff8102b0d3>]
189 ret_from_intr+0x0/0xf
190 Jun 2 18:25:23 [kernel] [ 1290.701467] <EOI> [<ffffffff81031acf>] ?
191 default_idle+0x4f/0x60
192 Jun 2 18:25:23 [kernel] [ 1290.701480] [<ffffffff81031acd>] ?
193 default_idle+0x4d/0x60
194 Jun 2 18:25:23 [kernel] [ 1290.701487] [<ffffffff81031b96>] ?
195 c1e_idle+0xa6/0x110
196 Jun 2 18:25:23 [kernel] [ 1290.701495] [<ffffffff81517b01>] ?
197 atomic_notifier_call_chain+0x11/0x20
198 Jun 2 18:25:23 [kernel] [ 1290.701504] [<ffffffff81028f27>] ?
199 cpu_idle+0x67/0xc0
200 Jun 2 18:25:23 [kernel] [ 1290.701513] [<ffffffff8150d068>] ?
201 start_secondary+0x163/0x1bb
202 Jun 2 18:25:23 [kernel] [ 1290.701518] ---[ end trace 301c1d6a9ee969de ]---
203
204
205
206
207 I don't know whether the first issue has anything todo with the second one.
208
209 The worst thing however is an occasional freeze of the whole system, which
210 happens very likely when there is high network load over nfs. If that
211 happens, I have a blank console, no network, no keyboard, nothing. Not even
212 sysrq seems to work, which makes it pretty hard to tell what has happened.
213 After resetting the machine, there is nothing suspicious in the logs (it
214 just ends).
215
216 Does anyone have suggestions?
217
218 Thanks in advance
219 Alex

Replies

Subject Author
Re: [gentoo-user] Serious stability problems, including freezes Florian Philipp <lists@××××××××××××××××××.net>