Gentoo Archives: gentoo-amd64

From: Rich Freeman <rich0@g.o>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Capturing hard hang info?
Date: Sun, 20 Oct 2013 19:03:14
Message-Id: CAGfcS_mNTmHGxTS8CyNhFZCOuk6Ka9GO894NmbsRYzd=46RQgg@mail.gmail.com
In Reply to: Re: [gentoo-amd64] Capturing hard hang info? by Mark Knecht
On Sun, Oct 20, 2013 at 1:29 PM, Mark Knecht <markknecht@×××××.com> wrote:
> That said I'm not clear about the virtual console point. I thought the
> virtual consoles were Alt-Ctl-F[1,2,3,..] When this event occurs my
> keyboard isn't working so I don't know how to get there. You must mean
> something else?

It will only help if a text console is already on screen when the panic
occurs, since the panic message is printed there. That is most useful
when the panics tend to happen while you're away from the system.

> To make progress with /etc/local.d/kdump.start it turns out I also
> needed to enable
>
> File Systems -> Pseudo File Systems -> /proc/kcore
>
> The Gentoo wiki only talked about vmcore.

Feel free to update it, after you're sure you have everything figured out.

> At this point I'm a bit beyond my depth. If the hang created by
> VirtualBox isn't a panic, but my keyboard is completely locked up,
> then I don't know how I'm going to issue the magic sysrq to get the
> dump process going.

Are you SURE that it is COMPLETELY locked up? As in alt-sysrq-b
doesn't reboot? I've found that this almost always works, even if
sysrq otherwise appears not to work (most of the other options won't
appear to do anything in a panic when the display is in a graphics
mode rather than on a text virtual console).

> I get an error screen and the system reboots. The first time I did it I
> had a bunch of disk activity - presumably stuff being copied to either
> kcore or vmcore - and then much later X & KDE came up running a single
> processor. This seemed like a positive result.

Nothing gets "copied" to kcore/vmcore. The state of the previous
system is already in RAM, or at least it was until KDE launched and
overwrote everything. You need to have it boot into the crash kernel
and a rescue shell for it to be of any use, unless you just want it to
auto-reboot to minimize downtime.

> QUESTION 1: This machine has 24GB DRAM. I've set crashkernel=256M and
> hoped for the best but don't know if that's a good setting.

It is probably fine - it just needs enough space to hold the crash
kernel and its initramfs, plus room for them to run.

40 auto-reboot to minimize downtime.
41
42 > QUESTION 1: This machine has 24GB DRAM. I've set crashkernel=256M and
43 > hoped for the best but don't know if that's a good setting.
44
45 It is probably fine - it just needs enough space to hold the kernel
46 and initramfs.
47
48 >
49 > QUESTION 2: Am I correct that the captured dump output is going to be
50 > a file that's roughly 24GB? Maybe this takes hours or something being
51 > that it's that big and I'm presumably saving it to a RAID6 which is
52 > doing a lot more parity calcs all in single processor mode. Is there a
53 > way to estimate how long I'd have to wait to even get to a login
54 > prompt?
55
56 You won't get a login prompt unless you have some script set to
57 auto-save the dump file. Typically you'll set things up to just
58 launch a root shell. Saving a 24GB file shouldn't take more than a
59 few minutes with nothing else touching the hard drive.
60
61 Per the wiki you should be running:
62 kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll
63 maxcpus=1 reset_devices"
64
65 Note the single option in there - you're not going to get a login
66 prompt. It will just dump you at a shell. If you've booted into that
67 kernel then /proc/vmcore will contain a core file from the paniced
68 kernel.
69
70 If you just rebooted normally then I don't think /proc/vmcore will
71 even exist. That was the problem I was having a while back - kexec
72 wasn't actually having any effect. I have no idea if whatever broke
73 it has been fixed.
74
75 Rich