Last week, the LMTP daemon on our mail server (HP DL360 G6) crashed.
People noticed that the mail stopped coming in, so I SSHed in to check
on it, and there were some weird traces in the dmesg. While trying to
investigate, I noticed some more badness:

# emerge -1 openntpd
Calculating dependencies... done!

>>> Verifying ebuild manifests
Killed

At that point I'm thinking, "hardware problem, there goes the weekend."
Most of my tools are committing suicide so I surrender and reboot. The
thing comes up fine and has been working ever since.

Today, another machine, one of our web servers (HP DL360 G5?), does the same
thing. The nightly log report was empty, because there's no syslog
daemon running. This morning dmesg shows:

> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>] [<ffffffff810e311e>] 0xffffffff810e311e
> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edf00 RCX: 0000000040276333
> [Fri May 9 11:00:42 2014] RDX: 0000000040276332 RSI: 0000000000000000 RDI: ffff88041d858720
> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010bc0 R09: ffff88042fb10bc0
> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000fec3040 R12: ffff88041f0048a0
> [Fri May 9 11:00:42 2014] R13: ffff88026628ef00 R14: ffff88041d858720 R15: ffff88041a1edf10
> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
> [Fri May 9 11:00:42 2014] Stack:
> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60 ffff8804140ac100 ffff8802cffca570
> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
> [Fri May 9 11:00:42 2014] Call Trace:
> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff
> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>] [<ffffffff810e311e>] 0xffffffff810e311e
> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edc00 RCX: 0000000040c384f8
> [Fri May 9 11:00:42 2014] RDX: 0000000040c384f7 RSI: 0000000000000000 RDI: ffff88041d858720
> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010b60 R09: ffff88042fb10b60
> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000f26a840 R12: ffff88041f0048a0
> [Fri May 9 11:00:42 2014] R13: ffff88026628e000 R14: ffff88041d858720 R15: ffff88041a1edc10
> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
> [Fri May 9 11:00:42 2014] Stack:
> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60 ffff88041a1ed400 ffff8802cffca570
> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
> [Fri May 9 11:00:42 2014] Call Trace:
> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff

And things are segfaulting randomly. These machines have been running
3.11.7-hardened-r1 since 2014-01-03 without issue until now -- all of
our servers have. So the timing seems a little coincidental.

If it's not hardware (two different machines...), does this look like a
kernel bug? Should I upgrade over the weekend and pray?