1 |
Hi there! |
2 |
|
3 |
My freshly setup homeserver has serious stability problem, which make the |
4 |
the machine absolutely unuseable as home server. |
5 |
|
6 |
The system is an amd64 (5050e@2600MHz, no overclocking) with 4GB Ram and |
7 |
three 1TB SATA disks bundled to a raid5 (kernel md driver). |
8 |
The kernels I've tried are gentoo-2.6.28-r, gentoo-2.6.29-r5 and |
9 |
vanilla-2.6.29.4. |
10 |
|
11 |
The problem is, when I copy large amount of data via nfs to the server, then |
12 |
the logs get filled with entries like this: |
13 |
|
14 |
--- Start extraction of /var/log/everything/current |
15 |
|
16 |
[13010.203643] kswapd0: page allocation failure. order:1, mode:0x4020 |
17 |
Jun 2 21:40:43 [kernel] [13010.203653] Pid: 346, comm: kswapd0 Tainted: G |
18 |
W 2.6.29.4 #2 |
19 |
Jun 2 21:40:43 [kernel] [13010.203657] Call Trace: |
20 |
Jun 2 21:40:43 [kernel] [13010.203662] <IRQ> [<ffffffff810c887e>] |
21 |
__alloc_pages_internal+0x3ce/0x4f0 |
22 |
Jun 2 21:40:43 [kernel] [13010.203690] [<ffffffff810f2d62>] ? |
23 |
unfreeze_slab+0x92/0xf0 |
24 |
Jun 2 21:40:43 [kernel] [13010.203697] [<ffffffff810f3bc6>] |
25 |
__slab_alloc+0x226/0x570 |
26 |
Jun 2 21:40:43 [kernel] [13010.203706] [<ffffffff814018df>] ? |
27 |
__netdev_alloc_skb+0x1f/0x40 |
28 |
Jun 2 21:40:43 [kernel] [13010.203714] [<ffffffff810f4f4f>] |
29 |
__kmalloc_track_caller+0x10f/0x150 |
30 |
Jun 2 21:40:43 [kernel] [13010.203720] [<ffffffff814018df>] ? |
31 |
__netdev_alloc_skb+0x1f/0x40 |
32 |
Jun 2 21:40:43 [kernel] [13010.203726] [<ffffffff8140158e>] |
33 |
__alloc_skb+0x6e/0x140 |
34 |
Jun 2 21:40:43 [kernel] [13010.203732] [<ffffffff814018df>] |
35 |
__netdev_alloc_skb+0x1f/0x40 |
36 |
Jun 2 21:40:43 [kernel] [13010.203743] [<ffffffff812f6c32>] |
37 |
rtl8169_rx_fill+0xc2/0x1f0 |
38 |
Jun 2 21:40:43 [kernel] [13010.203750] [<ffffffff812f6f80>] |
39 |
rtl8169_rx_interrupt+0x220/0x4e0 |
40 |
Jun 2 21:40:43 [kernel] [13010.203757] [<ffffffff812f73c0>] |
41 |
rtl8169_poll+0x40/0x220 |
42 |
Jun 2 21:40:43 [kernel] [13010.203765] [<ffffffff81407c99>] |
43 |
net_rx_action+0xa9/0x150 |
44 |
Jun 2 21:40:43 [kernel] [13010.203775] [<ffffffff8106d95c>] |
45 |
__do_softirq+0x9c/0x160 |
46 |
Jun 2 21:40:43 [kernel] [13010.203783] [<ffffffff8102b87c>] |
47 |
call_softirq+0x1c/0x30 |
48 |
Jun 2 21:40:43 [kernel] [13010.203789] [<ffffffff8102cbb5>] |
49 |
do_softirq+0x65/0xb0 |
50 |
Jun 2 21:40:43 [kernel] [13010.203796] [<ffffffff8106d595>] |
51 |
irq_exit+0x85/0xb0 |
52 |
Jun 2 21:40:43 [kernel] [13010.203801] [<ffffffff8102ce6a>] |
53 |
do_IRQ+0xba/0x1b0 |
54 |
Jun 2 21:40:43 [kernel] [13010.203808] [<ffffffff8102b0d3>] |
55 |
ret_from_intr+0x0/0xf |
56 |
Jun 2 21:40:43 [kernel] [13010.203811] <EOI> [<ffffffff815144db>] ? |
57 |
_spin_unlock_irq+0x2b/0x40 |
58 |
Jun 2 21:40:43 [kernel] [13010.203825] [<ffffffff815144e0>] ? |
59 |
_spin_unlock_irq+0x30/0x40 |
60 |
Jun 2 21:40:43 [kernel] [13010.203834] [<ffffffff810ceba0>] ? |
61 |
__remove_mapping+0xd0/0x100 |
62 |
Jun 2 21:40:43 [kernel] [13010.203841] [<ffffffff810cf560>] ? |
63 |
shrink_page_list+0x380/0x7f0 |
64 |
Jun 2 21:40:43 [kernel] [13010.203848] [<ffffffff810cfc30>] ? |
65 |
shrink_list+0x260/0x650 |
66 |
Jun 2 21:40:43 [kernel] [13010.203856] [<ffffffff810d0293>] ? |
67 |
shrink_zone+0x273/0x380 |
68 |
Jun 2 21:40:43 [kernel] [13010.203863] [<ffffffff810d04e7>] ? |
69 |
shrink_slab+0x147/0x180 |
70 |
Jun 2 21:40:43 [kernel] [13010.203870] [<ffffffff810d0c71>] ? |
71 |
kswapd+0x751/0x7c0 |
72 |
Jun 2 21:40:43 [kernel] [13010.203877] [<ffffffff810ce080>] ? |
73 |
isolate_pages_global+0x0/0x280 |
74 |
Jun 2 21:40:43 [kernel] [13010.203885] [<ffffffff8107f970>] ? |
75 |
autoremove_wake_function+0x0/0x40 |
76 |
Jun 2 21:40:43 [kernel] [13010.203896] [<ffffffff810919bd>] ? |
77 |
trace_hardirqs_on+0xd/0x10 |
78 |
Jun 2 21:40:43 [kernel] [13010.203903] [<ffffffff810d0520>] ? |
79 |
kswapd+0x0/0x7c0 |
80 |
Jun 2 21:40:43 [kernel] [13010.203909] [<ffffffff8107f559>] ? |
81 |
kthread+0x49/0x80 |
82 |
Jun 2 21:40:43 [kernel] [13010.203915] [<ffffffff8102b77a>] ? |
83 |
child_rip+0xa/0x20 |
84 |
Jun 2 21:40:43 [kernel] [13010.203920] [<ffffffff8102b17c>] ? |
85 |
restore_args+0x0/0x30 |
86 |
Jun 2 21:40:43 [kernel] [13010.203926] [<ffffffff8107f510>] ? |
87 |
kthread+0x0/0x80 |
88 |
Jun 2 21:40:43 [kernel] [13010.203932] [<ffffffff8102b770>] ? |
89 |
child_rip+0x0/0x20 |
90 |
Jun 2 21:40:43 [kernel] [13010.203936] Mem-Info: |
91 |
Jun 2 21:40:43 [kernel] [13010.203939] DMA per-cpu: |
92 |
Jun 2 21:40:43 [kernel] [13010.203943] CPU 0: hi: 0, btch: 1 usd: |
93 |
0 |
94 |
Jun 2 21:40:43 [kernel] [13010.203947] CPU 1: hi: 0, btch: 1 usd: |
95 |
0 |
96 |
Jun 2 21:40:43 [kernel] [13010.203950] DMA32 per-cpu: |
97 |
Jun 2 21:40:43 [kernel] [13010.203954] CPU 0: hi: 186, btch: 31 usd: |
98 |
180 |
99 |
Jun 2 21:40:43 [kernel] [13010.203958] CPU 1: hi: 186, btch: 31 usd: |
100 |
182 |
101 |
Jun 2 21:40:43 [kernel] [13010.203961] Normal per-cpu: |
102 |
Jun 2 21:40:43 [kernel] [13010.203964] CPU 0: hi: 186, btch: 31 usd: |
103 |
155 |
104 |
Jun 2 21:40:43 [kernel] [13010.203968] CPU 1: hi: 186, btch: 31 usd: |
105 |
167 |
106 |
Jun 2 21:40:43 [kernel] [13010.203974] Active_anon:270 active_file:11782 |
107 |
inactive_anon:2024 |
108 |
Jun 2 21:40:43 [kernel] [13010.203976] inactive_file:905625 unevictable:0 |
109 |
dirty:28068 writeback:0 unstable:0 |
110 |
Jun 2 21:40:43 [kernel] [13010.203978] free:15792 slab:50324 mapped:697 |
111 |
pagetables:413 bounce:0 |
112 |
Jun 2 21:40:43 [kernel] [13010.203986] DMA free:1960kB min:28kB low:32kB |
113 |
high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB |
114 |
inactive_file:0kB unevictable:0kB present:15152kB pages_scanned:0 |
115 |
all_unreclaimable? yes |
116 |
Jun 2 21:40:43 [kernel] [13010.203992] lowmem_reserve[]: 0 3480 3949 3949 |
117 |
Jun 2 21:40:43 [kernel] [13010.204004] DMA32 free:60844kB min:7068kB |
118 |
low:8832kB high:10600kB active_anon:580kB inactive_anon:384kB |
119 |
active_file:30744kB inactive_file:3213088kB unevictable:0kB |
120 |
present:3563552kB pages_scanned:32 all_unreclaimable? no |
121 |
Jun 2 21:40:43 [kernel] [13010.204011] lowmem_reserve[]: 0 0 469 469 |
122 |
Jun 2 21:40:43 [kernel] [13010.204013] Normal free:364kB min:952kB |
123 |
low:1188kB high:1428kB active_anon:500kB inactive_anon:7712kB |
124 |
active_file:16384kB inactive_file:409412kB unevictable:0kB present:480960kB |
125 |
pages_scanned:0 all_unreclaimable? no |
126 |
Jun 2 21:40:43 [kernel] [13010.204013] lowmem_reserve[]: 0 0 0 0 |
127 |
Jun 2 21:40:43 [kernel] [13010.204013] 917432 total pagecache pages |
128 |
Jun 2 21:40:43 [kernel] [13010.204013] 0 pages in swap cache |
129 |
Jun 2 21:40:43 [kernel] [13010.204013] Swap cache stats: add 0, delete 0, |
130 |
find 0/0 |
131 |
Jun 2 21:40:43 [kernel] [13010.204013] Free swap = 5879760kB |
132 |
Jun 2 21:40:43 [kernel] [13010.204013] Total swap = 5879760kB |
133 |
Jun 2 21:40:43 [kernel] [13010.204013] 1171440 pages RAM |
134 |
Jun 2 21:40:43 [kernel] [13010.204013] 178617 pages reserved |
135 |
Jun 2 21:40:43 [kernel] [13010.204013] 905214 pages shared |
136 |
Jun 2 21:40:43 [kernel] [13010.204013] 73436 pages non-shared |
137 |
|
138 |
--- End extraction of /var/log/everything/current |
139 |
|
140 |
Sometimes its kswap0, nfsd, swapper, and a lot of other progs causing it. |
141 |
|
142 |
However, the systems runs without any problem until it has enough physical |
143 |
memory. If I copy larger amounts of data (e.g. typical dvd-iso images of |
144 |
~4GB), then the log gets filled. Today's output was >65MB :-( |
145 |
So far, I think these messages are just annoying but do no further harm. |
146 |
|
147 |
The second problem is that the network controler (rtl8169) seems to timeout |
148 |
on TX, which is very bad if it happens during renewing your dhcp-lease :-( |
149 |
the message looks like |
150 |
|
151 |
Jun 2 18:25:23 [kernel] [ 1290.701314] ------------[ cut here ]------------ |
152 |
Jun 2 18:25:23 [kernel] [ 1290.701321] WARNING: at |
153 |
net/sched/sch_generic.c:226 dev_watchdog+0x245/0x260() |
154 |
Jun 2 18:25:23 [kernel] [ 1290.701326] Hardware name: System Product Name |
155 |
Jun 2 18:25:23 [kernel] [ 1290.701330] NETDEV WATCHDOG: eth0 (r8169): |
156 |
transmit timed out |
157 |
Jun 2 18:25:23 [kernel] [ 1290.701334] Modules linked in: psmouse k8temp |
158 |
pcspkr hwmon |
159 |
Jun 2 18:25:23 [kernel] [ 1290.701348] Pid: 0, comm: swapper Not tainted |
160 |
2.6.29.4 #2 |
161 |
Jun 2 18:25:23 [kernel] [ 1290.701352] Call Trace: |
162 |
Jun 2 18:25:23 [kernel] [ 1290.701356] <IRQ> [<ffffffff810676c0>] |
163 |
warn_slowpath+0xd0/0x130 |
164 |
Jun 2 18:25:23 [kernel] [ 1290.701375] [<ffffffff8108559b>] ? |
165 |
sched_clock_cpu+0x13b/0x180 |
166 |
Jun 2 18:25:23 [kernel] [ 1290.701385] [<ffffffff8126cf19>] ? |
167 |
strlcpy+0x49/0x60 |
168 |
Jun 2 18:25:23 [kernel] [ 1290.701391] [<ffffffff8141ca25>] |
169 |
dev_watchdog+0x245/0x260 |
170 |
Jun 2 18:25:23 [kernel] [ 1290.701400] [<ffffffff815144db>] ? |
171 |
_spin_unlock_irq+0x2b/0x40 |
172 |
Jun 2 18:25:23 [kernel] [ 1290.701410] [<ffffffff81091876>] ? |
173 |
trace_hardirqs_on_caller+0x66/0x1a0 |
174 |
Jun 2 18:25:23 [kernel] [ 1290.701415] [<ffffffff8141c7e0>] ? |
175 |
dev_watchdog+0x0/0x260 |
176 |
Jun 2 18:25:23 [kernel] [ 1290.701423] [<ffffffff810726c0>] |
177 |
run_timer_softirq+0x170/0x250 |
178 |
Jun 2 18:25:23 [kernel] [ 1290.701431] [<ffffffff8106d95c>] |
179 |
__do_softirq+0x9c/0x160 |
180 |
Jun 2 18:25:23 [kernel] [ 1290.701439] [<ffffffff8102b87c>] |
181 |
call_softirq+0x1c/0x30 |
182 |
Jun 2 18:25:23 [kernel] [ 1290.701445] [<ffffffff8102cbb5>] |
183 |
do_softirq+0x65/0xb0 |
184 |
Jun 2 18:25:23 [kernel] [ 1290.701451] [<ffffffff8106d595>] |
185 |
irq_exit+0x85/0xb0 |
186 |
Jun 2 18:25:23 [kernel] [ 1290.701457] [<ffffffff8102ce6a>] |
187 |
do_IRQ+0xba/0x1b0 |
188 |
Jun 2 18:25:23 [kernel] [ 1290.701463] [<ffffffff8102b0d3>] |
189 |
ret_from_intr+0x0/0xf |
190 |
Jun 2 18:25:23 [kernel] [ 1290.701467] <EOI> [<ffffffff81031acf>] ? |
191 |
default_idle+0x4f/0x60 |
192 |
Jun 2 18:25:23 [kernel] [ 1290.701480] [<ffffffff81031acd>] ? |
193 |
default_idle+0x4d/0x60 |
194 |
Jun 2 18:25:23 [kernel] [ 1290.701487] [<ffffffff81031b96>] ? |
195 |
c1e_idle+0xa6/0x110 |
196 |
Jun 2 18:25:23 [kernel] [ 1290.701495] [<ffffffff81517b01>] ? |
197 |
atomic_notifier_call_chain+0x11/0x20 |
198 |
Jun 2 18:25:23 [kernel] [ 1290.701504] [<ffffffff81028f27>] ? |
199 |
cpu_idle+0x67/0xc0 |
200 |
Jun 2 18:25:23 [kernel] [ 1290.701513] [<ffffffff8150d068>] ? |
201 |
start_secondary+0x163/0x1bb |
202 |
Jun 2 18:25:23 [kernel] [ 1290.701518] ---[ end trace 301c1d6a9ee969de ]--- |
203 |
|
204 |
|
205 |
|
206 |
|
207 |
I don't know whether the first issue has anything todo with the second one. |
208 |
|
209 |
The worst thing however is an occasional freeze of the whole system, which |
210 |
happens very likely when there is high network load over nfs. If that |
211 |
happens, I have a blank console, no network, no keyboard, nothing. Not even |
212 |
sysrq seems to work, which makes it pretty hard to tell what has happened. |
213 |
After resetting the machine, there is nothing suspicious in the logs (it |
214 |
just ends). |
215 |
|
216 |
Does anyone have suggestions? |
217 |
|
218 |
Thanks in advance |
219 |
Alex |