Gentoo Archives: gentoo-hardened

From: Brian Kroth <bpkroth@××××.edu>
To: gentoo-hardened@l.g.o, pageexec@××××××××.hu
Subject: Re: [gentoo-hardened] kernel upgrade problems: bad page state
Date: Sun, 04 Nov 2007 00:06:30
Message-Id: 472D0BF9.3030508@wisc.edu
In Reply to: Re: [gentoo-hardened] kernel upgrade problems: bad page state by pageexec@freemail.hu
pageexec@××××××××.hu wrote:
> On 31 Oct 2007 at 22:46, Brian Kroth wrote:
>
>> but not ever (yet) with this
>>
>> cactid --verbosity=5 -f 1 -l 10
>
> what does the -l switch do?

The -f and -l switches limit the range of hostids to scan. The first
examples I gave scanned hostids 1-100; the example above scans
hostids 1-10. I was doing this originally to see if I could pinpoint
one particular host check that was causing it, but it seems to have more
to do with large host scans. I think that might be because of the
number of forks and allocations.
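
For reference, the range narrowing described above could be scripted roughly as follows. This is only a sketch: the -f, -l and --verbosity switches are the ones from the command line quoted above, while `triggers` is a hypothetical callback (not part of cactid) that would run one scan and report whether the bad page state appeared. If the problem depends on scan size rather than on one host, neither half reproduces it and the search returns None.

```python
import shlex


def cactid_command(first, last, verbosity=5):
    """Build the argument list for one cactid run over hostids first..last."""
    return ["cactid", f"--verbosity={verbosity}", "-f", str(first), "-l", str(last)]


def split_range(first, last):
    """Split the inclusive range first..last into two halves."""
    mid = (first + last) // 2
    return (first, mid), (mid + 1, last)


def bisect_hosts(first, last, triggers):
    """Narrow first..last down to a single hostid.

    `triggers(first, last)` is a hypothetical callback that runs the scan
    and returns True if the bad page state showed up. Returns None when
    neither half reproduces the problem on its own.
    """
    while first < last:
        low, high = split_range(first, last)
        if triggers(*low):
            first, last = low
        elif triggers(*high):
            first, last = high
        else:
            return None
    return first


if __name__ == "__main__":
    # Print the two commands that cover hostids 1-100 in halves.
    for lo, hi in split_range(1, 100):
        print(shlex.join(cactid_command(lo, hi)))
```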

>
>> The version of cactid in portage is slightly old. After updating from
>> 0.8.6i-r1 to 0.8.6j the problem seems to happen less frequently, but
>> still happens. With that in mind, might this actually be a software
>> problem and not a kernel problem? Shouldn't PAX be preventing userland
>> software from screwing up the page table?
>
> i'm almost sure it's a bug somewhere in vma mirroring as that's the
> only thing i changed in .22 and on and it does play with page locking
> (the bad page state is triggered because a to-be-freed page is still
> locked, that means there's a missing unlock somewhere in the code,
> but i couldn't figure it out from the code yet).

Where's the code for this? I'm no kernel guru by any means, but I'd
still be interested to look at it and learn.

>
>> I can send more kernel output if anyone's interested. Any thoughts on
>> what else I should be doing to test this?
>
> i'll need your mm/memory.o from the failing kernel and if it occurred on
> multiple machines or kernels, indicate which of your reports corresponds
> to which .o (well, i can find it out from the disasm eventually, but it
> helps me if i don't have to ;-). then can you send me a /proc/pid/maps file
> from cactid and nagios (if you use grsec make sure that addresses are not
> hidden and preferably not randomized either)?
>

So far this is only on that single machine, and only for nagios and
cacti. I rebuilt the kernel with the attached config. I've
basically turned on a few more debug settings in the kernel and turned
off the randomization features of PaX (CONFIG_PAX_ASLR) and the "remove
addresses" feature of grsec (CONFIG_GRKERNSEC_PROC_MEMMAP) like you
asked. I also tweaked my sec script to copy the maps files before killing
the offending processes. Everything should be in the tar. Let me know if
you need anything else.
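
The maps copy step amounts to something like the sketch below. It is not the actual sec script; the function name, the destination layout and the `proc_root` parameter are made up for illustration, and it has to run before the process is killed, since /proc/<pid>/maps disappears with the process.

```python
import time
from pathlib import Path


def snapshot_maps(pid, dest_dir, proc_root="/proc"):
    """Copy <proc_root>/<pid>/maps into dest_dir and return the new path."""
    src = Path(proc_root) / str(pid) / "maps"
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp the copy so repeated snapshots of one pid do not collide.
    dest = dest_dir / f"maps.{pid}.{int(time.time())}"
    # Files under /proc report a size of 0, so read the content explicitly.
    dest.write_text(src.read_text())
    return dest
```

Calling `snapshot_maps(pid, "/var/tmp/maps")` for each offending pid before sending the kill signal keeps one file per process; the destination path here is only an example.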

Nov 3 18:27:38 tux-mc IPMI Watchdog: driver initialized
Nov 3 18:32:59 tux-mc Bad page state in process 'nagios'
Nov 3 18:32:59 tux-mc page:c129d620 flags:0x40000001 mapping:00000000 mapcount:0 count:0
Nov 3 18:32:59 tux-mc Trying to fix it up, but a reboot is needed
Nov 3 18:32:59 tux-mc Backtrace:
Nov 3 18:32:59 tux-mc [<c044c150>] bad_page+0x63/0x92
Nov 3 18:32:59 tux-mc [<c044cc18>] free_hot_cold_page+0x7c/0x194
Nov 3 18:32:59 tux-mc [<c0456110>] do_wp_page+0x22e/0x426
Nov 3 18:32:59 tux-mc [<c0457463>] __handle_mm_fault+0x2ad/0x305
Nov 3 18:32:59 tux-mc [<c0414576>] do_page_fault+0x1da/0x7d5
Nov 3 18:32:59 tux-mc [<c041c269>] do_fork+0x15d/0x217
Nov 3 18:32:59 tux-mc [<c041439c>] do_page_fault+0x0/0x7d5
Nov 3 18:32:59 tux-mc [<c06e9525>] error_code+0x75/0x80
Nov 3 18:32:59 tux-mc [<c06e0000>] svc_setup_socket+0x1aa/0x223
Nov 3 18:32:59 tux-mc =======================
Nov 3 18:35:07 tux-mc Bad page state in process 'cactid'
Nov 3 18:35:07 tux-mc page:c12efdc0 flags:0x40000001 mapping:00000000 mapcount:0 count:0
Nov 3 18:35:07 tux-mc Trying to fix it up, but a reboot is needed
Nov 3 18:35:07 tux-mc Backtrace:
Nov 3 18:35:07 tux-mc [<c044c150>] bad_page+0x63/0x92
Nov 3 18:35:07 tux-mc [<c044cc18>] free_hot_cold_page+0x7c/0x194
Nov 3 18:35:07 tux-mc [<c0456110>] do_wp_page+0x22e/0x426
Nov 3 18:35:07 tux-mc [<c0457463>] __handle_mm_fault+0x2ad/0x305
Nov 3 18:35:07 tux-mc [<c0414576>] do_page_fault+0x1da/0x7d5
Nov 3 18:35:07 tux-mc [<c04682d5>] sys_read+0x68/0x6a
Nov 3 18:35:07 tux-mc [<c041439c>] do_page_fault+0x0/0x7d5
Nov 3 18:35:07 tux-mc [<c06e9525>] error_code+0x75/0x80
Nov 3 18:35:07 tux-mc [<c06e0000>] svc_setup_socket+0x1aa/0x223
Nov 3 18:35:07 tux-mc =======================
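
Both reports show flags:0x40000001, which fits the "still locked" explanation quoted earlier if bit 0 is the lock bit. The sketch below pulls the fields out of such a log line and tests that bit. It assumes PG_locked is bit 0, as I understand include/linux/page-flags.h in 2.6.22 to define it; the other set bit (0x40000000) is not decoded here, and the helper names are made up for illustration.

```python
import re

# Assumption: PG_locked is bit 0 in this kernel's include/linux/page-flags.h.
PG_LOCKED_BIT = 0

# Matches the "page:... flags:... mapping:... mapcount:... count:..." fields;
# \s+ also tolerates the line wrap that some mail clients introduce.
PAGE_LINE = re.compile(
    r"page:(?P<page>[0-9a-f]+)\s+flags:0x(?P<flags>[0-9a-f]+)\s+"
    r"mapping:(?P<mapping>[0-9a-f]+)\s+"
    r"mapcount:(?P<mapcount>-?\d+)\s+count:(?P<count>-?\d+)"
)


def parse_bad_page(line):
    """Return the page fields from a 'Bad page state' log line, or None."""
    m = PAGE_LINE.search(line)
    if m is None:
        return None
    return {
        "page": int(m["page"], 16),
        "flags": int(m["flags"], 16),
        "mapping": int(m["mapping"], 16),
        "mapcount": int(m["mapcount"]),
        "count": int(m["count"]),
    }


def is_locked(flags):
    """True if the (assumed) PG_locked bit is set in a page flags word."""
    return bool((flags >> PG_LOCKED_BIT) & 1)


if __name__ == "__main__":
    line = ("Nov 3 18:32:59 tux-mc page:c129d620 flags:0x40000001 "
            "mapping:00000000 mapcount:0 count:0")
    info = parse_bad_page(line)
    print(hex(info["flags"]), "locked" if is_locked(info["flags"]) else "not locked")
```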


PS - would you like me to take this off list?

Thanks again,
Brian

Attachments

File name                               MIME type
2.6.22-hardened-r8_debug-info.tar.bz2   application/x-bzip
smime.p7s                               application/x-pkcs7-signature

Replies

Subject                                                         Author
Re: [gentoo-hardened] kernel upgrade problems: bad page state   pageexec@××××××××.hu