Gentoo Archives: gentoo-hardened

From: Brian Kroth <bpkroth@××××.edu>
To: gentoo-hardened@l.g.o
Subject: Re: [gentoo-hardened] kernel upgrade problems: bad page state
Date: Thu, 01 Nov 2007 03:48:37
Message-Id: 47294C17.9080901@wisc.edu
In Reply to: Re: [gentoo-hardened] kernel upgrade problems: bad page state by Brian Kroth
Brian Kroth wrote:
> I have no problems with 2.6.20-r10. I ran it for 4 hours last night and
> some weeks before this. 2.6.20-r6 before that, again no problems.
> 2.6.22-r8 and 2.6.23 both die as soon as cactid or nagios start running.
> I really don't think this is bad ram anymore. I'll see if I can get an
> exact test for others to try. Any other kernel debug tweaks I should try?
>
> Thanks for all your help,
> Brian

I haven't found a way of reproducing this on other machines yet because
it takes a lot of time to set up cacti. In playing around with cactid,
though, what I've found is that the error happens /nearly/ every time I
specify something like this:

cactid --verbosity=5 -f 1 -l 100

but never (so far) with this:

cactid --verbosity=5 -f 1 -l 10

With sec monitoring kern.log for "Bad page state in 'cactid'" and
killing cactid when that happens, I've noticed that the last line of
output from cactid is always something like this:

10/31/2007 10:22:32 PM - CACTID: Poller[0] Host[42] DEBUG: The POPEN returned the following File Descriptor 5
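
For anyone without sec installed, the watch-and-kill step described above could be approximated with a short script. This is only a rough sketch, not the sec rule actually in use: the log path /var/log/kern.log, the polling interval, and the helper names (bad_page_process, follow, watch) are assumptions made for illustration, and the pattern is taken from the kernel message as it appears in kern.log.

```python
import re
import subprocess
import time

# Matches the kernel message as it appears in kern.log, e.g.
#   Bad page state in process 'cactid'
BAD_PAGE_RE = re.compile(r"Bad page state in process '([^']+)'")


def bad_page_process(line):
    """Return the process name from a 'Bad page state' line, else None."""
    match = BAD_PAGE_RE.search(line)
    return match.group(1) if match else None


def follow(path, poll_seconds=0.5):
    """Yield lines appended to path, like `tail -f` (no log rotation handling)."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # start at the current end of the file
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def watch(lines, target="cactid"):
    """Return the first line reporting a bad page for target, or None."""
    for line in lines:
        if bad_page_process(line) == target:
            return line
    return None


if __name__ == "__main__":
    # Assumed log location; adjust for the local syslog setup.
    hit = watch(follow("/var/log/kern.log"))
    print("kernel reported:", hit.strip())
    # Kill every process named exactly "cactid" (needs sufficient privileges).
    subprocess.run(["pkill", "-x", "cactid"], check=False)
```

Run as root alongside cactid, it blocks until the kernel message shows up and then kills the poller, so the last lines of cactid output stay close to the point of failure.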

The kern.log shows this:

Oct 31 22:30:09 tux-mc Bad page state in process 'cactid'
Oct 31 22:30:09 tux-mc page:c14070c0 flags:0x40000001 mapping:00000000 mapcount:0 count:0
Oct 31 22:30:09 tux-mc Trying to fix it up, but a reboot is needed
Oct 31 22:30:09 tux-mc Backtrace:
Oct 31 22:30:09 tux-mc [<c044bf67>] bad_page+0x63/0x92
Oct 31 22:30:09 tux-mc [<c044c90c>] free_hot_cold_page+0x7c/0x17f
Oct 31 22:30:09 tux-mc [<c0455c24>] do_wp_page+0x223/0x3ed
Oct 31 22:30:09 tux-mc [<c0456f24>] __handle_mm_fault+0x2ad/0x305
Oct 31 22:30:09 tux-mc [<c0414616>] do_page_fault+0x1da/0x7d5
Oct 31 22:30:09 tux-mc [<c041c2d5>] do_fork+0x15d/0x217
Oct 31 22:30:09 tux-mc [<c041443c>] do_page_fault+0x0/0x7d5
Oct 31 22:30:09 tux-mc [<c06e8db5>] error_code+0x75/0x80
Oct 31 22:30:09 tux-mc [<c06e0000>] svc_defer+0xfa/0x139
Oct 31 22:30:09 tux-mc =======================
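
To compare reports like the one above across several crashes, the page record and the backtrace symbols could be pulled out mechanically. The following is a minimal sketch under some assumptions: the function names parse_page_state and backtrace_symbols are made up here, the page record is expected on a single line (a mail-wrapped record just yields fewer fields), and page, flags and mapping are treated as hexadecimal while mapcount and count are treated as decimal. It makes no attempt to interpret what the flag bits mean.

```python
import re

# key:value pairs from the "page:... flags:... mapping:..." record
FIELD_RE = re.compile(r"\b(page|flags|mapping|mapcount|count):(\S+)")

# backtrace entries such as "[<c044bf67>] bad_page+0x63/0x92"
FRAME_RE = re.compile(r"\[<[0-9a-fA-F]+>\]\s+(\w+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")


def parse_page_state(line):
    """Return the page-state fields found on one log line as integers."""
    fields = {}
    for name, raw in FIELD_RE.findall(line):
        if name in ("mapcount", "count"):
            fields[name] = int(raw)      # assumed decimal counters
        else:
            fields[name] = int(raw, 16)  # assumed hex, with or without 0x
    return fields


def backtrace_symbols(lines):
    """Return the function names from backtrace lines, in log order."""
    symbols = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols
```

Feeding each crash report through backtrace_symbols and comparing the resulting lists would show quickly whether every failure goes through the same do_wp_page path or whether the traces differ between runs.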

The version of cactid in portage is slightly old. After updating from
0.8.6i-r1 to 0.8.6j the problem seems to happen less frequently, but it
still happens. With that in mind, might this actually be a software
problem and not a kernel problem? Shouldn't PaX be preventing userland
software from screwing up the page table?

I can send more kernel output if anyone's interested. Any thoughts on
what else I should be doing to test this?

Thanks,
Brian

Replies

Subject Author
Re: [gentoo-hardened] kernel upgrade problems: bad page state pageexec@××××××××.hu