Gentoo Archives: gentoo-server

From: Christian Parpart <trapni@g.o>
To: gentoo-server@l.g.o
Subject: [gentoo-server] lots of crashes recently (vserver)
Date: Tue, 18 Sep 2007 12:16:38
Message-Id: 200709181406.45614.trapni@gentoo.org
1 Hi all,
2
3 Well, it's hard to explain, but recently I have been seeing *lots* of crashes on
4 one machine running vserver-sources with about 3-5 VEs.
5
6 The physical server itself is a 10k IBM amd64 host with two dual-core CPUs,
7 16GB RAM and RAID10 SATA, in case you want to know.
8
9 However, this is what I found in the logs:
10
11
12 Sep 11 20:05:11 jupjep ------------[ cut here ]------------
13 Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!
14 Sep 11 20:05:11 jupjep invalid opcode: 0000 [1] SMP
15 Sep 11 20:05:11 jupjep CPU 2
16 Sep 11 20:05:11 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
17 Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 2.6.20-vs2.3.0.11-gentoo #1
18 Sep 11 20:05:11 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
19 Sep 11 20:05:11 jupjep RSP: 0018:ffff8101363a1dd8 EFLAGS: 00010246
20 Sep 11 20:05:11 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
21 Sep 11 20:05:11 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
22 Sep 11 20:05:11 jupjep RBP: 0000000000000000 R08: ffff8103ff3bcc58 R09: ffffffff00000000
23 Sep 11 20:05:11 jupjep R10: 0000000000000000 R11: ffffffff8132a2e2 R12: 0000000000000000
24 Sep 11 20:05:11 jupjep R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
25 Sep 11 20:05:11 jupjep FS: 00002b988b369e70(0000) GS:ffff8104118783c0(0000) knlGS:0000000056502b90
26 Sep 11 20:05:11 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
27 Sep 11 20:05:11 jupjep CR2: 00002b988c005f80 CR3: 0000000001001000 CR4: 00000000000006e0
28 Sep 11 20:05:11 jupjep Process sshd (pid: 26337[#242], threadinfo ffff8101363a0000, task ffff810302a70040)
29 Sep 11 20:05:11 jupjep Stack: ffff8103ff3bcf40 ffffffff8132c56c 0000000000000000 ffff8103ff3bcf40
30 Sep 11 20:05:11 jupjep ffff8103ff3bc9c0 ffffffff8104ff06 ffff8103ff3bc9c0 0000000000000000
31 Sep 11 20:05:11 jupjep ffff8103ca88db80 ffff8103ca88dbd0 ffff8101051924a8 ffff81041183b980
32 Sep 11 20:05:11 jupjep Call Trace:
33 Sep 11 20:05:11 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
34 Sep 11 20:05:11 jupjep [<ffffffff8104ff06>] unix_release_sock+0x172/0x202
35 Sep 11 20:05:11 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
36 Sep 11 20:05:11 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
37 Sep 11 20:05:11 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
38 Sep 11 20:05:11 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
39 Sep 11 20:05:11 jupjep [<ffffffff810381d6>] put_files_struct+0x66/0xe1
40 Sep 11 20:05:11 jupjep [<ffffffff81015452>] do_exit+0x264/0x8de
41 Sep 11 20:05:11 jupjep [<ffffffff81047aec>] cpuset_exit+0x0/0x6b
42 Sep 11 20:05:11 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
43 Sep 11 20:05:11 jupjep
44 Sep 11 20:05:11 jupjep
45 Sep 11 20:05:11 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
46 Sep 11 20:05:11 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
47 Sep 11 20:05:11 jupjep RSP <ffff8101363a1dd8>
48 Sep 11 20:05:11 jupjep <1>Fixing recursive fault but reboot is needed!
49
50 Just a moment later:
51
52 Sep 11 20:10:01 jupjep ------------[ cut here ]------------
53 Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
54 Sep 11 20:10:01 jupjep invalid opcode: 0000 [2] SMP
55 Sep 11 20:10:01 jupjep CPU 3
56 Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
57 Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted 2.6.20-vs2.3.0.11-gentoo #1
58 Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
59 Sep 11 20:10:01 jupjep RSP: 0018:ffff81021d825de8 EFLAGS: 00010246
60 Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041e893290
61 Sep 11 20:10:01 jupjep RDX: ffff81041d582d48 RSI: 0000000000000000 RDI: ffff81040ac6a000
62 Sep 11 20:10:01 jupjep RBP: ffff8103b4e9e000 R08: ffff81021d824000 R09: 00000000019f865c
63 Sep 11 20:10:01 jupjep R10: 0000000000000080 R11: ffff810001760400 R12: ffff81040fc4bac0
64 Sep 11 20:10:01 jupjep R13: ffff81040fc4bac0 R14: ffff81000175f000 R15: 0000000000000000
65 Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
66 Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
67 Sep 11 20:10:01 jupjep CR2: 00002ae879267570 CR3: 000000032a8bf000 CR4: 00000000000006e0
68 Sep 11 20:10:01 jupjep Process run-crons (pid: 9762[#242], threadinfo ffff81021d824000, task ffff810409954100)
69 Sep 11 20:10:01 jupjep Stack: ffff81041e893290 ffffffff81086c39 0000000000000100 ffff81021d825ed8
70 Sep 11 20:10:01 jupjep 0000000000000003 ffffffff810626dd 0000000000000000 ffffffff81022504
71 Sep 11 20:10:01 jupjep ffff8103e4c38978 ffff81040981f880 0000000000000006 ffff810409954100
72 Sep 11 20:10:01 jupjep Call Trace:
73 Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
74 Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
75 Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
76 Sep 11 20:10:01 jupjep [<ffffffff81028613>] do_wait+0x978/0xa78
77 Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
78 Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
79 Sep 11 20:10:01 jupjep
80 Sep 11 20:10:01 jupjep
81
82
83 Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
84 Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
85 Sep 11 20:10:01 jupjep RSP <ffff81021d825de8>
86 Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
87 Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
88 Sep 11 20:10:01 jupjep invalid opcode: 0000 [3] SMP
89 Sep 11 20:10:01 jupjep CPU 2
90 Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
91 Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted 2.6.20-vs2.3.0.11-gentoo #1
92 Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
93 Sep 11 20:10:01 jupjep RSP: 0018:ffff8103ef0a5a48 EFLAGS: 00210246
94 Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041ca4e9c8
95 Sep 11 20:10:01 jupjep RDX: ffff81041d911838 RSI: 0000000000000000 RDI: ffff81040ac6a000
96 Sep 11 20:10:01 jupjep RBP: ffff81032a8bf000 R08: ffff8103ef0a4000 R09: ffff81021d825e88
97 Sep 11 20:10:01 jupjep R10: 0000000000002623 R11: 00000000ffffffff R12: ffff81040981f880
98 Sep 11 20:10:01 jupjep R13: ffff81040981f880 R14: ffff810001755d00 R15: ffffffff815eaeb0
99 Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118783c0(0063) knlGS:00000000558f9b90
100 Sep 11 20:10:01 jupjep CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
101 Sep 11 20:10:01 jupjep CR2: 00000000005c5818 CR3: 00000003ef7d9000 CR4: 00000000000006e0
102 Sep 11 20:10:01 jupjep Process server_linux (pid: 8581[#260], threadinfo ffff8103ef0a4000, task ffff810409171040)
103 Sep 11 20:10:01 jupjep Stack: ffff81041ca4e9c8 ffffffff81086c39 0000000000000000 ffff8103ef0a5b38
104 Sep 11 20:10:01 jupjep 0000000000000002 ffffffff810626dd 0000000000000000 0000000000000000
105 Sep 11 20:10:01 jupjep 0000000000200246 ffff8103f3199280 000000000000000a ffff810409171040
106 Sep 11 20:10:01 jupjep Call Trace:
107 Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
108 Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
109 Sep 11 20:10:01 jupjep [<ffffffff8106301e>] schedule_timeout+0x8a/0xad
110 Sep 11 20:10:01 jupjep [<ffffffff8108d5c6>] process_timeout+0x0/0x5
111 Sep 11 20:10:01 jupjep [<ffffffff8102ed6f>] do_sys_poll+0x278/0x360
112 Sep 11 20:10:01 jupjep [<ffffffff8101e5f5>] __pollwait+0x0/0xe2
113 Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
114 Sep 11 20:10:01 jupjep [<ffffffff81067b68>] __switch_to+0x26e/0x27d
115 Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
116 Sep 11 20:10:01 jupjep [<ffffffff81034d06>] find_extend_vma+0x16/0x59
117 Sep 11 20:10:01 jupjep [<ffffffff810a1354>] get_futex_key+0x47/0x10c
118 Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
119 Sep 11 20:10:01 jupjep [<ffffffff810a180c>] futex_wake+0xc6/0xd5
120 Sep 11 20:10:01 jupjep [<ffffffff8103dd0e>] do_futex+0x268/0xc16
121 Sep 11 20:10:01 jupjep [<ffffffff8100aec3>] do_page_fault+0x45e/0x7b9
122 Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
123 Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
124 Sep 11 20:10:01 jupjep [<ffffffff810a27ff>] compat_sys_futex+0xfb/0x119
125 Sep 11 20:10:01 jupjep [<ffffffff8104aa34>] sys_poll+0x54/0x5a
126 Sep 11 20:10:01 jupjep [<ffffffff81060b44>] cstar_do_call+0x1b/0x65
127 Sep 11 20:10:01 jupjep
128 Sep 11 20:10:01 jupjep
129 Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
130 Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
131 Sep 11 20:10:01 jupjep RSP <ffff8103ef0a5a48>
132
133
134 Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
135 Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
136 Sep 11 20:10:01 jupjep invalid opcode: 0000 [4] SMP
137 Sep 11 20:10:01 jupjep CPU 3
138 Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
139 Sep 11 20:10:01 jupjep Pid: 9764:#242, comm: sendmail Not tainted 2.6.20-vs2.3.0.11-gentoo #1
140 Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
141 Sep 11 20:10:01 jupjep RSP: 0018:ffff8103b35dbe88 EFLAGS: 00010246
142 Sep 11 20:10:01 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
143 Sep 11 20:10:01 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
144 Sep 11 20:10:01 jupjep RBP: 0000000000000000 R08: ffffffff815a8718 R09: ffffffff00000000
145 Sep 11 20:10:01 jupjep R10: 0000000000000296 R11: 0000000000000202 R12: ffff8102e3414080
146 Sep 11 20:10:01 jupjep R13: ffff8101c52115f8 R14: ffff81041183b980 R15: 0000000000002624
147 Sep 11 20:10:01 jupjep FS: 00002b8330167ae0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
148 Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
149 Sep 11 20:10:01 jupjep CR2: 00002b832fc91db0 CR3: 000000036083c000 CR4: 00000000000006e0
150 Sep 11 20:10:01 jupjep Process sendmail (pid: 9764[#242], threadinfo ffff8103b35da000, task ffff810339081790)
151 Sep 11 20:10:01 jupjep Stack: ffff81040f8af800 ffffffff8132c56c ffff8102e3414080 ffff81040f8af960
152 Sep 11 20:10:01 jupjep ffff81040f8af800 ffffffff813458ba 0000000000002624 ffffffff81053352
153 Sep 11 20:10:01 jupjep 0000000000000000 ffff8102e3414080 ffff8102e34140d0 ffffffff810531d5
154 Sep 11 20:10:01 jupjep Call Trace:
155 Sep 11 20:10:01 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
156 Sep 11 20:10:01 jupjep [<ffffffff813458ba>] netlink_release+0x255/0x25f
157 Sep 11 20:10:01 jupjep [<ffffffff81053352>] sock_fasync+0x124/0x133
158 Sep 11 20:10:01 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
159 Sep 11 20:10:01 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
160 Sep 11 20:10:01 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
161 Sep 11 20:10:01 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
162 Sep 11 20:10:01 jupjep [<ffffffff8101da52>] sys_close+0x8c/0xcf
163 Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
164 Sep 11 20:10:01 jupjep
165 Sep 11 20:10:01 jupjep
166 Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
167 Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
168 Sep 11 20:10:01 jupjep RSP <ffff8103b35dbe88>
169
170 A little later, the server crashed.
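
To compare the traces above at a glance, something like the following Python sketch could condense each oops to one line. It is only an illustration: the function name `summarise_oopses` and the regular expressions are assumptions based on the log layout shown here, and the number after `#` is read as the vserver context ID. The sample text is a shortened copy of the first two traces.

```python
import re

# A few lines copied from the first two traces above (syslog prefix kept).
SAMPLE = """\
Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:05:11 jupjep Call Trace:
Sep 11 20:05:11 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:05:11 jupjep [<ffffffff8104ff06>] unix_release_sock+0x172/0x202
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
"""

# "Pid: <pid>:#<context id>, comm: <command>" (layout assumed from the log above)
PID_RE = re.compile(r"Pid: (\d+):#(\d+), comm: (\S+)")
# "[<address>] symbol+0xoff/0xlen" as printed in the call trace
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\w+)\+0x")


def summarise_oopses(text):
    """Return one (comm, pid, context_id, first_caller) tuple per oops."""
    results = []
    current = None      # (comm, pid, context_id) of the oops being read
    want_frame = False  # True once "Call Trace:" has been seen for it
    for line in text.splitlines():
        pid_match = PID_RE.search(line)
        if pid_match:
            current = (pid_match.group(3),
                       int(pid_match.group(1)),
                       int(pid_match.group(2)))
            want_frame = False
            continue
        if "Call Trace:" in line:
            want_frame = current is not None
            continue
        if want_frame:
            frame = FRAME_RE.search(line)
            if frame:
                # First frame after "Call Trace:" is the caller of free_vx_info.
                results.append(current + (frame.group(1),))
                current = None
                want_frame = False
    return results


if __name__ == "__main__":
    for comm, pid, ctx, caller in summarise_oopses(SAMPLE):
        print(f"{comm} (pid {pid}, context #{ctx}) reached free_vx_info via {caller}")
```

Read by hand, the four traces split the same way: sshd and sendmail reach free_vx_info through sk_free, run-crons and server_linux through __mmdrop, and RDI is ffff81040ac6a000 in all four.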
171
172 Now, since I've set sysctl kernel.panic=5, I no longer see any of these logs or the crashes themselves, only dozens of reboots:
173
174 jupjep log # last | grep -E '(boot|crash)'
175 reboot system boot Tue Sep 18 11:27 (01:24) 2.6.22-vs2.3.0.17-gentoo
176 reboot system boot Mon Sep 17 22:23 (14:29) 2.6.22-vs2.3.0.17-gentoo
177 reboot system boot Mon Sep 17 19:52 (16:59) 2.6.22-vs2.3.0.17-gentoo
178 reboot system boot Sun Sep 16 18:57 (1+17:55) 2.6.22-vs2.3.0.17-gentoo
179 trapni pts/0 Sat Sep 15 23:58 - crash (18:58) $MY_IP
180 reboot system boot Fri Sep 14 22:06 (3+14:46) 2.6.22-vs2.3.0.17-gentoo
181 trapni pts/4 Fri Sep 14 13:39 - crash (08:27) $MY_IP
182 reboot system boot Thu Sep 13 19:34 (4+17:18) 2.6.20-vs2.3.0.11-gentoo
183 trapni pts/6 Thu Sep 13 12:38 - crash (06:56) $MY_IP
184 trapni pts/0 Thu Sep 13 08:17 - crash (11:16) $MY_IP
185 reboot system boot Thu Sep 13 08:02 (5+04:50) 2.6.20-vs2.3.0.11-gentoo
186 reboot system boot Tue Sep 11 19:32 (6+17:19) 2.6.20-vs2.3.0.11-gentoo
187 reboot system boot Mon Sep 10 17:52 (7+18:59) 2.6.20-vs2.3.0.11-gentoo
188 reboot system boot Thu Sep 6 16:50 (11+20:02) 2.6.20-vs2.3.0.11-gentoo
189 reboot system boot Fri Aug 3 08:36 (46+04:16) 2.6.20-vs2.3.0.11-gentoo
190 reboot system boot Mon Jul 30 21:31 (3+11:01) 2.6.20-vs2.2.0-gentoo
191 trapni pts/13 Mon Jul 30 12:26 - crash (09:05) $MY_IP
192
193 I did not trigger any of these boots myself, so they were all crashes.
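
For a quick tally of how the boots spread across kernel versions, a small Python sketch along these lines could be used. It is a rough illustration only: the function name `boots_per_kernel` is made up, and it assumes the column layout exactly as pasted above, with the kernel version as the last field of each `reboot` line.

```python
from collections import Counter

# The `last` listing from above, copied as pasted.
LAST_OUTPUT = """\
reboot system boot Tue Sep 18 11:27 (01:24) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Mon Sep 17 22:23 (14:29) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Mon Sep 17 19:52 (16:59) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Sun Sep 16 18:57 (1+17:55) 2.6.22-vs2.3.0.17-gentoo
trapni pts/0 Sat Sep 15 23:58 - crash (18:58) $MY_IP
reboot system boot Fri Sep 14 22:06 (3+14:46) 2.6.22-vs2.3.0.17-gentoo
trapni pts/4 Fri Sep 14 13:39 - crash (08:27) $MY_IP
reboot system boot Thu Sep 13 19:34 (4+17:18) 2.6.20-vs2.3.0.11-gentoo
trapni pts/6 Thu Sep 13 12:38 - crash (06:56) $MY_IP
trapni pts/0 Thu Sep 13 08:17 - crash (11:16) $MY_IP
reboot system boot Thu Sep 13 08:02 (5+04:50) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Tue Sep 11 19:32 (6+17:19) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Mon Sep 10 17:52 (7+18:59) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Thu Sep 6 16:50 (11+20:02) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Fri Aug 3 08:36 (46+04:16) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Mon Jul 30 21:31 (3+11:01) 2.6.20-vs2.2.0-gentoo
trapni pts/13 Mon Jul 30 12:26 - crash (09:05) $MY_IP
"""


def boots_per_kernel(text):
    """Count 'reboot system boot' lines per kernel version (last field)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Only the pseudo-user "reboot" lines mark a boot; login lines are skipped.
        if fields[:3] == ["reboot", "system", "boot"]:
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for kernel, boots in boots_per_kernel(LAST_OUTPUT).items():
        print(f"{kernel}: {boots} boots")
```

On the listing above this gives 5 boots under 2.6.22-vs2.3.0.17-gentoo, 6 under 2.6.20-vs2.3.0.11-gentoo and 1 under 2.6.20-vs2.2.0-gentoo.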
194
195 So, is there a way to trace this bug and/or work around it?
196 In fact, we have 10+ machines with the very same hardware running the Gentoo
197 hardened profile and a hardened-sources kernel.
198 But this one host, running normal Gentoo with vserver-sources, really refuses to
199 be friendly to me.
200
201 Can anybody give me a hint regarding these traces I posted above?
202
203 Many thanks in advance,
204 Christian Parpart.
205
206
207
208
209
210
211
212 --
213 gentoo-server@g.o mailing list

Replies

Subject Author
Re: [gentoo-server] lots of crashes recently (vserver) Benny Pedersen <me@××××.org>
Re: [gentoo-server] lots of crashes recently (vserver) Steve Dommett <steve@×××××.net>
Re: [gentoo-server] lots of crashes recently (vserver) Dan Noe <dpn@×××××××××.net>