Gentoo Archives: gentoo-amd64

From: Mark Knecht <markknecht@×××××.com>
To: Gentoo AMD64 <gentoo-amd64@l.g.o>
Subject: Re: [gentoo-amd64] Capturing hard hang info?
Date: Sun, 20 Oct 2013 17:29:18
Message-Id: CAK2H+edgMcMRKAsxtmpoN5bxWpV8s444KwDkw-wG-j_dT8OjgQ@mail.gmail.com
In Reply to: Re: [gentoo-amd64] Capturing hard hang info? by Rich Freeman
1 Hi Rich,
2 Some progress, but questions/comments also
3
4 On Sat, Oct 19, 2013 at 6:18 PM, Rich Freeman <rich0@g.o> wrote:
5 > On Sat, Oct 19, 2013 at 8:25 PM, Mark Knecht <markknecht@×××××.com> wrote:
6 >> OK, it's a good idea just to have a Konsole terminal open. That might
7 >> catch something.
8 >
9 > I'm not sure if panics show up in konsole. With a virtual console the
10 > kernel actually outputs the message. Konsole under X11 is entirely
11 > user-mode and I'm not sure that ANY user-mode code can ever run after
12 > a panic.
13 >
14 > I think a virtual console is a better bet.
15 >
16
17 I suspect you're right about Konsole sitting on the KDE desktop. I
18 only meant that sometimes that catches a few messages and was hopeful
19 it might do that here but it's certainly not a real solution.
20
21 That said I'm not clear about the virtual console point. I thought the
22 virtual consoles were Ctrl-Alt-F[1,2,3,..]. When this event occurs my
23 keyboard isn't working so I don't know how to get there. You must mean
24 something else?
25
26
27 >> OK, so I remember years ago debugging something for Ingo Molnar using
28 >> the serial console, but in those days it was a real serial console on
29 >> a real serial port. None of my machines have those ports anymore. There
30 >> must be a more modern version of doing that. I'll go look for info.
31 >> Ethernet? USB? We've recently moved and the only other machine I've
32 >> got here at the apartment is a Gentoo laptop.
33 >
34 > That you'd have to look into. I'm not sure if the kernel can handle a
35 > serial console on a PL2302/etc. It might - it is all kernel-mode I
36 > think. You'd have to attach it to another device running a terminal
37 > emulator, assuming you don't have a vt100/etc lying around.
38 >
39 >> There's a wiki.gentoo.org page here:
40 >>
41 >> http://wiki.gentoo.org/wiki/Kernel_Crash_Dumps
42 >>
43 >> The setup looks reasonably straightforward so I've reconfigured
44 >> 3.10.17 following those instructions.
45 >
46 > Yeah, I forgot - that was started based on my blog entry,
47 > actually. It may very well have been improved on since.
48 >
49
50 To make progress with /etc/local.d/kdump.start it turns out I also
51 needed to enable
52
53 File Systems -> Pseudo File Systems -> /proc/kcore
54
55 The Gentoo wiki only talked about vmcore.
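The options involved can be checked against a kernel .config in one pass. This is a minimal sketch, not the wiki's procedure: the function name check_kdump_config and the default path /usr/src/linux/.config are assumptions, and the symbol list is the usual kexec/kdump set plus CONFIG_PROC_KCORE, which is the symbol behind the /proc/kcore menu entry above.

```shell
#!/bin/sh
# Hypothetical helper: report which kdump-related options are built in.
# $1 = path to a kernel .config (the default below is an assumed location).
check_kdump_config() {
    config="${1:-/usr/src/linux/.config}"
    missing=0
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_PROC_KCORE; do
        if grep -q "^${opt}=y" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            missing=1
        fi
    done
    # Non-zero exit status if any option is missing.
    return "$missing"
}
```

Running check_kdump_config with no argument reads the assumed default path; pass another path to check a different kernel tree.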
56
57 <SNIP>
58 >> When turned on it has options for Panic (Reboot) for both types. Seems
59 >> like I probably want that all turned on?
60 >
61 > You could try setting it to no and see if you actually can capture any
62 > meaningful logs that way - there is a chance you could recover your
63 > system without rebooting. However, a panic would be the only real
64 > sure way to ensure a dump.
65 >
66 > Oh, and don't forget that there is a magic sysrq that triggers a
67 > panic. Only issue with that is that you'll have to hunt around for
68 > whatever caused the actual hangup because it won't be in the panic
69 > backtrace (that will just lead you to the sysrq code).
70 >
71
72 At this point I'm a bit beyond my depth. If the hang created by
73 Virtualbox isn't a panic, but my keyboard is completely locked up,
74 then I don't know how I'm going to issue the magic sysrq to get the
75 dump process going.
76
77 As a test however, with all of this stuff set up, I logged out of KDE,
78 switched to a console, disabled X and tried
79
80 echo c>/proc/sysrq-trigger
81
82 I get an error screen and the system reboots. The first time I did it I
83 had a bunch of disk activity - presumably stuff being copied to either
84 kcore or vmcore - and then much later X & KDE came up running a single
85 processor. This seemed like a positive result.
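Before forcing a test crash like the one above, it may help to confirm that a capture kernel is actually loaded. This is a rough sketch under assumptions: the function name kdump_ready is made up, and the paths are parameters only so the check can be pointed at other files. /sys/kernel/kexec_crash_loaded reads 1 once a crash kernel has been loaded, and /proc/sys/kernel/sysrq holds the mask for keyboard SysRq combinations (writes to /proc/sysrq-trigger are not restricted by it).

```shell
#!/bin/sh
# Hypothetical pre-flight check before forcing a test crash.
# $1 = kexec_crash_loaded file, $2 = sysrq sysctl file (defaults are the real paths).
kdump_ready() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    sysrq_file="${2:-/proc/sys/kernel/sysrq}"

    # Anything other than "1" means no capture kernel is waiting.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "no capture kernel loaded: a panic would not produce a dump"
        return 1
    fi
    echo "capture kernel loaded"
    # Informational only: 0 here means keyboard SysRq combinations are disabled.
    echo "keyboard SysRq mask: $(cat "$sysrq_file" 2>/dev/null)"
    return 0
}
```

As root, something like `kdump_ready && echo c > /proc/sysrq-trigger` would then only trigger the crash when the capture kernel is in place.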
86
87 I then thought maybe I shouldn't start xdm in my init scripts so I
88 disabled it, rebooted the box, logged in as root and tried again. This
89 time after the error screen and apparent reboot I gave the machine 45
90 minutes but never got back to a login screen.
91
92 QUESTION 1: This machine has 24GB DRAM. I've set crashkernel=256M and
93 hoped for the best but don't know if that's a good setting.
94
95 QUESTION 2: Am I correct that the captured dump output is going to be
96 a file that's roughly 24GB? Maybe this takes hours because it's that
97 big and I'm presumably saving it to a RAID6, which is doing a lot of
98 parity calcs, all in single processor mode. Is there a
99 way to estimate how long I'd have to wait to even get to a login
100 prompt?
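A back-of-the-envelope estimate for QUESTION 2 is just dump size divided by write throughput. The sketch below assumes an unfiltered dump about the size of RAM; the function name estimate_dump_seconds and the throughput figures used with it are illustrative assumptions, not measurements from this machine.

```shell
#!/bin/sh
# Hypothetical estimate: seconds needed to write an unfiltered dump.
# $1 = RAM size in GiB, $2 = assumed sustained write rate in MiB/s.
estimate_dump_seconds() {
    ram_gib="$1"
    rate_mib_s="$2"
    # Integer arithmetic: GiB -> MiB, then divide by MiB/s (result is rounded down).
    echo $(( ram_gib * 1024 / rate_mib_s ))
}
```

With an assumed 50 MiB/s, `estimate_dump_seconds 24 50` gives 491 seconds, about 8 minutes. At an assumed 10 MiB/s, `estimate_dump_seconds 24 10` gives 2457 seconds, about 41 minutes. The real RAID6 write rate under a single-processor capture kernel is unknown here, so it would need to be measured.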
101
102 So far I've been unable to save anything from /proc/vmcore.
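For reference, the capture step itself can be as simple as copying /proc/vmcore somewhere persistent once the crash kernel has booted. This is a minimal sketch and not the wiki's kdump.start script: the function name save_vmcore and the destination /var/crash are assumptions. /proc/vmcore only appears when the running kernel was booted as a capture kernel after a crash, so on a normal boot the function does nothing.

```shell
#!/bin/sh
# Hypothetical capture step, e.g. called from a local.d start script.
# $1 = dump source (normally /proc/vmcore), $2 = destination dir (assumed /var/crash).
save_vmcore() {
    src="${1:-/proc/vmcore}"
    dest_dir="${2:-/var/crash}"

    # No vmcore means this is a normal boot, not a post-crash capture boot.
    [ -e "$src" ] || { echo "no $src; normal boot, nothing to save"; return 0; }

    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
    # Copy the dump, flush it to disk, then report where it went.
    cp "$src" "$out" && sync && echo "saved $out"
}
```

A plain copy writes the full dump, roughly the size of RAM. A filtering tool such as makedumpfile can produce a much smaller file, at the cost of one more package to set up.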
103
104 >> As I expected about the logs. If the machine's dead then I don't want
105 >> stuff getting written to disk anyway. kdump sounds like the best
106 >> solution going right now. I'll try and see if I can get it working.
107 >
108 > Yeah - one of these days I'll see if I can get kdump working again.
109 > What it really needs is an initramfs that will automatically capture
110 > the dump and reboot. That's how other distros handle it. The dumps
111 > are pretty big though - the size of your RAM.
112
113 That would be helpful certainly.
114
115 Thanks for all the info and support!
116
117 Cheers,
118 Mark
