Maybe a bug somewhere else too. Which kernel/grsec/PaX combination was used?

On 05/09/2014 05:15 PM, Michael Orlitzky wrote:
> Last week, the LMTP daemon on our mail server (HP DL360 G6) crashed.
> People noticed that the mail stopped coming in, so I SSHed in to check
> on it, and there were some weird traces in the dmesg. While trying to
> investigate, I noticed some more badness:
>
> # emerge -1 openntpd
> Calculating dependencies... done!
>
> >>> Verifying ebuild manifests
> Killed
>
> At that point I'm thinking, "hardware problem, there goes the weekend."
> Most of my tools are committing suicide so I surrender and reboot. The
> thing comes up fine and has been working ever since.
>
> Today, another one of our web servers (HP DL360 G5?) does the same
> thing. The nightly log report was empty, because there's no syslog
> daemon running. This morning dmesg shows:
>
>> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
>> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
>> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
>> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>] [<ffffffff810e311e>] 0xffffffff810e311e
>> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
>> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edf00 RCX: 0000000040276333
>> [Fri May 9 11:00:42 2014] RDX: 0000000040276332 RSI: 0000000000000000 RDI: ffff88041d858720
>> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010bc0 R09: ffff88042fb10bc0
>> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000fec3040 R12: ffff88041f0048a0
>> [Fri May 9 11:00:42 2014] R13: ffff88026628ef00 R14: ffff88041d858720 R15: ffff88041a1edf10
>> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
>> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
>> [Fri May 9 11:00:42 2014] Stack:
>> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60 ffff8804140ac100 ffff8802cffca570
>> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
>> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
>> [Fri May 9 11:00:42 2014] Call Trace:
>> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
>> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
>> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
>> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
>> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
>> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
>> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
>> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff
>> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
>> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
>> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
>> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>] [<ffffffff810e311e>] 0xffffffff810e311e
>> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
>> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edc00 RCX: 0000000040c384f8
>> [Fri May 9 11:00:42 2014] RDX: 0000000040c384f7 RSI: 0000000000000000 RDI: ffff88041d858720
>> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010b60 R09: ffff88042fb10b60
>> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000f26a840 R12: ffff88041f0048a0
>> [Fri May 9 11:00:42 2014] R13: ffff88026628e000 R14: ffff88041d858720 R15: ffff88041a1edc10
>> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
>> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
>> [Fri May 9 11:00:42 2014] Stack:
>> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60 ffff88041a1ed400 ffff8802cffca570
>> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
>> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
>> [Fri May 9 11:00:42 2014] Call Trace:
>> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
>> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
>> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
>> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
>> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
>> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
>> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
>> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff
>
>
> And things are segfaulting randomly. These machines have been running
> 3.11.7-hardened-r1 since 2014-01-03 without issue until now -- all of
> our servers have. So the timing seems a little coincidental.
>
> If it's not hardware (two different machines...), does this look like a
> kernel bug? Should I upgrade over the weekend and pray?
>