Hi all,

It's hard to explain, but recently I have been seeing *lots* of crashes on
one machine that serves about 3-5 VEs via vserver-sources.

The physical server itself is a 10k IBM amd64 host with two dual-core CPUs,
16GB RAM and RAID10 SATA, in case you want to know.

However, this is what I found in the logs:

Sep 11 20:05:11 jupjep ------------[ cut here ]------------
Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:05:11 jupjep invalid opcode: 0000 [1] SMP
Sep 11 20:05:11 jupjep CPU 2
Sep 11 20:05:11 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:05:11 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP: 0018:ffff8101363a1dd8 EFLAGS: 00010246
Sep 11 20:05:11 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
Sep 11 20:05:11 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
Sep 11 20:05:11 jupjep RBP: 0000000000000000 R08: ffff8103ff3bcc58 R09: ffffffff00000000
Sep 11 20:05:11 jupjep R10: 0000000000000000 R11: ffffffff8132a2e2 R12: 0000000000000000
Sep 11 20:05:11 jupjep R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
Sep 11 20:05:11 jupjep FS: 00002b988b369e70(0000) GS:ffff8104118783c0(0000) knlGS:0000000056502b90
Sep 11 20:05:11 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:05:11 jupjep CR2: 00002b988c005f80 CR3: 0000000001001000 CR4: 00000000000006e0
Sep 11 20:05:11 jupjep Process sshd (pid: 26337[#242], threadinfo ffff8101363a0000, task ffff810302a70040)
Sep 11 20:05:11 jupjep Stack: ffff8103ff3bcf40 ffffffff8132c56c 0000000000000000 ffff8103ff3bcf40
Sep 11 20:05:11 jupjep ffff8103ff3bc9c0 ffffffff8104ff06 ffff8103ff3bc9c0 0000000000000000
Sep 11 20:05:11 jupjep ffff8103ca88db80 ffff8103ca88dbd0 ffff8101051924a8 ffff81041183b980
Sep 11 20:05:11 jupjep Call Trace:
Sep 11 20:05:11 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:05:11 jupjep [<ffffffff8104ff06>] unix_release_sock+0x172/0x202
Sep 11 20:05:11 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:05:11 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:05:11 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:05:11 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:05:11 jupjep [<ffffffff810381d6>] put_files_struct+0x66/0xe1
Sep 11 20:05:11 jupjep [<ffffffff81015452>] do_exit+0x264/0x8de
Sep 11 20:05:11 jupjep [<ffffffff81047aec>] cpuset_exit+0x0/0x6b
Sep 11 20:05:11 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:05:11 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP <ffff8101363a1dd8>
Sep 11 20:05:11 jupjep <1>Fixing recursive fault but reboot is needed!

Just a moment later:

Sep 11 20:10:01 jupjep ------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [2] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff81021d825de8 EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041e893290
Sep 11 20:10:01 jupjep RDX: ffff81041d582d48 RSI: 0000000000000000 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff8103b4e9e000 R08: ffff81021d824000 R09: 00000000019f865c
Sep 11 20:10:01 jupjep R10: 0000000000000080 R11: ffff810001760400 R12: ffff81040fc4bac0
Sep 11 20:10:01 jupjep R13: ffff81040fc4bac0 R14: ffff81000175f000 R15: 0000000000000000
Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002ae879267570 CR3: 000000032a8bf000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process run-crons (pid: 9762[#242], threadinfo ffff81021d824000, task ffff810409954100)
Sep 11 20:10:01 jupjep Stack: ffff81041e893290 ffffffff81086c39 0000000000000100 ffff81021d825ed8
Sep 11 20:10:01 jupjep 0000000000000003 ffffffff810626dd 0000000000000000 ffffffff81022504
Sep 11 20:10:01 jupjep ffff8103e4c38978 ffff81040981f880 0000000000000006 ffff810409954100
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff81028613>] do_wait+0x978/0xa78
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep


Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff81021d825de8>
Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [3] SMP
Sep 11 20:10:01 jupjep CPU 2
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103ef0a5a48 EFLAGS: 00210246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041ca4e9c8
Sep 11 20:10:01 jupjep RDX: ffff81041d911838 RSI: 0000000000000000 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff81032a8bf000 R08: ffff8103ef0a4000 R09: ffff81021d825e88
Sep 11 20:10:01 jupjep R10: 0000000000002623 R11: 00000000ffffffff R12: ffff81040981f880
Sep 11 20:10:01 jupjep R13: ffff81040981f880 R14: ffff810001755d00 R15: ffffffff815eaeb0
Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118783c0(0063) knlGS:00000000558f9b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Sep 11 20:10:01 jupjep CR2: 00000000005c5818 CR3: 00000003ef7d9000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process server_linux (pid: 8581[#260], threadinfo ffff8103ef0a4000, task ffff810409171040)
Sep 11 20:10:01 jupjep Stack: ffff81041ca4e9c8 ffffffff81086c39 0000000000000000 ffff8103ef0a5b38
Sep 11 20:10:01 jupjep 0000000000000002 ffffffff810626dd 0000000000000000 0000000000000000
Sep 11 20:10:01 jupjep 0000000000200246 ffff8103f3199280 000000000000000a ffff810409171040
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff8106301e>] schedule_timeout+0x8a/0xad
Sep 11 20:10:01 jupjep [<ffffffff8108d5c6>] process_timeout+0x0/0x5
Sep 11 20:10:01 jupjep [<ffffffff8102ed6f>] do_sys_poll+0x278/0x360
Sep 11 20:10:01 jupjep [<ffffffff8101e5f5>] __pollwait+0x0/0xe2
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81067b68>] __switch_to+0x26e/0x27d
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff81034d06>] find_extend_vma+0x16/0x59
Sep 11 20:10:01 jupjep [<ffffffff810a1354>] get_futex_key+0x47/0x10c
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff810a180c>] futex_wake+0xc6/0xd5
Sep 11 20:10:01 jupjep [<ffffffff8103dd0e>] do_futex+0x268/0xc16
Sep 11 20:10:01 jupjep [<ffffffff8100aec3>] do_page_fault+0x45e/0x7b9
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff810a27ff>] compat_sys_futex+0xfb/0x119
Sep 11 20:10:01 jupjep [<ffffffff8104aa34>] sys_poll+0x54/0x5a
Sep 11 20:10:01 jupjep [<ffffffff81060b44>] cstar_do_call+0x1b/0x65
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103ef0a5a48>


Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [4] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9764:#242, comm: sendmail Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103b35dbe88 EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
Sep 11 20:10:01 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: 0000000000000000 R08: ffffffff815a8718 R09: ffffffff00000000
Sep 11 20:10:01 jupjep R10: 0000000000000296 R11: 0000000000000202 R12: ffff8102e3414080
Sep 11 20:10:01 jupjep R13: ffff8101c52115f8 R14: ffff81041183b980 R15: 0000000000002624
Sep 11 20:10:01 jupjep FS: 00002b8330167ae0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002b832fc91db0 CR3: 000000036083c000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process sendmail (pid: 9764[#242], threadinfo ffff8103b35da000, task ffff810339081790)
Sep 11 20:10:01 jupjep Stack: ffff81040f8af800 ffffffff8132c56c ffff8102e3414080 ffff81040f8af960
Sep 11 20:10:01 jupjep ffff81040f8af800 ffffffff813458ba 0000000000002624 ffffffff81053352
Sep 11 20:10:01 jupjep 0000000000000000 ffff8102e3414080 ffff8102e34140d0 ffffffff810531d5
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:10:01 jupjep [<ffffffff813458ba>] netlink_release+0x255/0x25f
Sep 11 20:10:01 jupjep [<ffffffff81053352>] sock_fasync+0x124/0x133
Sep 11 20:10:01 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:10:01 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:10:01 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:10:01 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:10:01 jupjep [<ffffffff8101da52>] sys_close+0x8c/0xcf
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103b35dbe88>

A little later the server crashed.
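
All four traces quoted above hit the same BUG site, and three of them carry the same `#242` tag. To make that kind of comparison across a longer log, a small tally script might help. The following is only a minimal sketch: the function name `summarize_oopses` and both regular expressions are made up for illustration and modelled on the lines quoted in this mail, and it assumes that the number after `#` in the `Pid:` line is the vserver context ID.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this mail;
# other kernels may format these lines differently.
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
PID_RE = re.compile(r"Pid: (\d+):#(\d+), comm: (\S+)")


def summarize_oopses(lines):
    """Tally BUG sites, context IDs and command names found in syslog lines."""
    bug_sites, contexts, commands = Counter(), Counter(), Counter()
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            bug_sites[f"{bug.group(1)}:{bug.group(2)}"] += 1
            continue
        pid = PID_RE.search(line)
        if pid:
            # Assumption: the number after '#' is the vserver context ID.
            contexts[int(pid.group(2))] += 1
            commands[pid.group(3)] += 1
    return bug_sites, contexts, commands


if __name__ == "__main__":
    sample = [
        "Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!",
        "Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 2.6.20-vs2.3.0.11-gentoo #1",
        "Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!",
        "Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted 2.6.20-vs2.3.0.11-gentoo #1",
    ]
    for counter in summarize_oopses(sample):
        print(dict(counter))
```

Feeding it the lines of a saved syslog file instead of `sample` would show whether one context or one BUG site dominates.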

Now, since I've set sysctl kernel.panic=5, I no longer see any of these logs or crashes, only dozens of reboots:

jupjep log # last | grep -E '(boot|crash)'
reboot   system boot  Tue Sep 18 11:27         (01:24)    2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Mon Sep 17 22:23         (14:29)    2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Mon Sep 17 19:52         (16:59)    2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Sun Sep 16 18:57         (1+17:55)  2.6.22-vs2.3.0.17-gentoo
trapni   pts/0        Sat Sep 15 23:58 - crash (18:58)    $MY_IP
reboot   system boot  Fri Sep 14 22:06         (3+14:46)  2.6.22-vs2.3.0.17-gentoo
trapni   pts/4        Fri Sep 14 13:39 - crash (08:27)    $MY_IP
reboot   system boot  Thu Sep 13 19:34         (4+17:18)  2.6.20-vs2.3.0.11-gentoo
trapni   pts/6        Thu Sep 13 12:38 - crash (06:56)    $MY_IP
trapni   pts/0        Thu Sep 13 08:17 - crash (11:16)    $MY_IP
reboot   system boot  Thu Sep 13 08:02         (5+04:50)  2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Tue Sep 11 19:32         (6+17:19)  2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Mon Sep 10 17:52         (7+18:59)  2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Thu Sep  6 16:50         (11+20:02) 2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Fri Aug  3 08:36         (46+04:16) 2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Mon Jul 30 21:31         (3+11:01)  2.6.20-vs2.2.0-gentoo
trapni   pts/13       Mon Jul 30 12:26 - crash (09:05)    $MY_IP

These reboots were not initiated by me, so they were all crashes.
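
To see how the reboots spread across kernel versions, the `last` output can be grouped by kernel. This is a rough sketch only: the helper name `count_boots_by_kernel` is hypothetical, and it assumes the kernel version is the last column of each `reboot` line, as in the listing above (other `last` variants may put it elsewhere).

```python
from collections import Counter


def count_boots_by_kernel(last_lines):
    """Count 'reboot system boot' records per kernel version string."""
    boots = Counter()
    for line in last_lines:
        fields = line.split()
        # Assumption: the kernel version is the last whitespace-separated field.
        if len(fields) >= 4 and fields[:3] == ["reboot", "system", "boot"]:
            boots[fields[-1]] += 1
    return boots


if __name__ == "__main__":
    sample = [
        "reboot system boot Tue Sep 18 11:27 (01:24) 2.6.22-vs2.3.0.17-gentoo",
        "trapni pts/0 Sat Sep 15 23:58 - crash (18:58) $MY_IP",
        "reboot system boot Thu Sep 13 19:34 (4+17:18) 2.6.20-vs2.3.0.11-gentoo",
    ]
    for kernel, count in count_boots_by_kernel(sample).items():
        print(kernel, count)
```

Applied to the full listing, this would show that the reboots continue on 2.6.22-vs2.3.0.17 as well as on 2.6.20-vs2.3.0.11.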

So, is there a way to trace this bug and/or work around it?
In fact, we have 10+ machines with the very same hardware running the Gentoo
hardened profile and a hardened-sources kernel.
But this one host, running normal Gentoo with vserver-sources, really fails to
be friendly to me.

Can anybody give me a hint regarding the traces I posted above?

Many thanks in advance,
Christian Parpart.

--
gentoo-server@g.o mailing list