Hi Rich,
Some progress, but questions/comments also

On Sat, Oct 19, 2013 at 6:18 PM, Rich Freeman <rich0@g.o> wrote:
> On Sat, Oct 19, 2013 at 8:25 PM, Mark Knecht <markknecht@×××××.com> wrote:
>> OK, it's a good idea just to have a Konsole terminal open. That might
>> catch something.
>
> I'm not sure if panics show up in konsole. With a virtual console the
> kernel actually outputs the message. Konsole under X11 is entirely
> user-mode and I'm not sure that ANY user-mode code can ever run after
> a panic.
>
> I think a virtual console is a better bet.
>
|
I suspect you're right about Konsole sitting on the KDE desktop. I
only meant that it sometimes catches a few messages, and I was hopeful
it might do that here, but it's certainly not a real solution.
|
That said, I'm not clear about the virtual console point. I thought the
virtual consoles were Ctrl-Alt-F[1,2,3,...]. When this event occurs my
keyboard isn't working, so I don't know how to get there. You must mean
something else?
|
>> OK, so I remember years ago debugging something for Ingo Molnar using
>> the serial console, but in those days it was a real serial console on
>> a real serial port. None of my machines have those ports anymore. There
>> must be a more modern version of doing that. I'll go look for info.
>> Ethernet? USB? We've recently moved and the only other machine I've
>> got here at the apartment is a Gentoo laptop.
>
> That you'd have to look into. I'm not sure if the kernel can handle a
> serial console on a PL2303/etc. It might - it is all kernel-mode I
> think. You'd have to attach it to another device running a terminal
> emulator, assuming you don't have a vt100/etc lying around.
>
>> There's a wiki.gentoo.org page here:
>>
>> http://wiki.gentoo.org/wiki/Kernel_Crash_Dumps
>>
>> The setup looks reasonably straightforward so I've reconfigured
>> 3.10.17 following those instructions.
>
> Yeah, I forgot - that was actually started based on my blog entry.
> It may very well have been improved on since.
>
|
To make progress with /etc/local.d/kdump.start it turns out I also
needed to enable
|
File Systems -> Pseudo File Systems -> /proc/kcore
|
The Gentoo wiki only talked about vmcore.
|
<SNIP>
>> When turned on it has options for Panic (Reboot) for both types. Seems
>> like I probably want that all turned on?
>
> You could try setting it to no and see if you actually can capture any
> meaningful logs that way - there is a chance you could recover your
> system without rebooting. However, a panic would be the only real
> sure way to ensure a dump.
>
> Oh, and don't forget that there is a magic sysrq that triggers a
> panic. Only issue with that is that you'll have to hunt around for
> whatever caused the actual hangup because it won't be in the panic
> backtrace (that will just lead you to the sysrq code).
>
|
At this point I'm a bit out of my depth. If the hang created by
VirtualBox isn't a panic, but my keyboard is completely locked up,
then I don't know how I'm going to issue the magic sysrq to get the
dump process going.
|
As a test, however, with all of this stuff set up, I logged out of KDE,
switched to a console, disabled X, and tried
|
echo c > /proc/sysrq-trigger
|
I get an error screen and the system reboots. The first time I did it I
had a bunch of disk activity - presumably stuff being copied to either
kcore or vmcore - and then much later X & KDE came up running on a single
processor. This seemed like a positive result.
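For reference, a minimal sketch of a pre-flight check before forcing a test crash like this. It assumes the kernel reports a loaded capture kernel as "1" in /sys/kernel/kexec_crash_loaded; crash_kernel_loaded is just a hypothetical helper name, and the optional argument exists only so the check can be tried against a dummy file.

```shell
# crash_kernel_loaded [FILE]
# Succeeds only if FILE (default: /sys/kernel/kexec_crash_loaded)
# contains "1", i.e. a capture kernel has been loaded.
crash_kernel_loaded() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$loaded_file" 2>/dev/null)" = "1" ]
}

# Possible use, as root on a text console (this crashes the machine):
#   crash_kernel_loaded && echo c > /proc/sysrq-trigger
```

If the check fails, the forced crash would presumably just panic without booting into a kernel that can see /proc/vmcore.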
|
I then thought maybe I shouldn't start xdm in my init scripts so I
disabled it, rebooted the box, logged in as root and tried again. This
time after the error screen and apparent reboot I gave the machine 45
minutes but never got back to a login screen.
|
QUESTION 1: This machine has 24GB DRAM. I've set crashkernel=256M and
hoped for the best but don't know if that's a good setting.
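One thing that can at least be checked is whether the crashkernel= reservation took effect. A minimal sketch, assuming the kernel exposes the reserved size in bytes in /sys/kernel/kexec_crash_size; crash_reserved_mib is a hypothetical helper name, and the optional argument is only there so it can be tried on a dummy file. It doesn't say whether 256M is enough, only whether the memory was actually set aside.

```shell
# crash_reserved_mib [FILE]
# Prints the crash-kernel reservation in MiB, reading a byte count from
# FILE (default: /sys/kernel/kexec_crash_size).
crash_reserved_mib() {
    size_file="${1:-/sys/kernel/kexec_crash_size}"
    bytes=$(cat "$size_file" 2>/dev/null) || return 1
    echo $(( bytes / 1024 / 1024 ))
}

# Possible use after booting with crashkernel=256M:
#   crash_reserved_mib
#   grep -o 'crashkernel=[^ ]*' /proc/cmdline
```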
|
QUESTION 2: Am I correct that the captured dump output is going to be
a file that's roughly 24GB? Maybe writing it takes hours, given that
it's that big and I'm presumably saving it to a RAID6, which is doing
a lot of parity calculations, all in single-processor mode. Is there a
way to estimate how long I'd have to wait to even get to a login
prompt?
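A rough way to bound the wait is dump size divided by sustained write speed. A minimal sketch of that arithmetic follows; the 50 MiB/s figure in the example is purely an assumed placeholder, not a measurement of this RAID6 in single-processor mode, and dump_seconds is a hypothetical helper name.

```shell
# dump_seconds RAM_GIB WRITE_MIB_PER_SEC
# Prints the whole seconds needed to write RAM_GIB GiB at the given rate.
dump_seconds() {
    ram_gib=$1
    rate=$2
    [ "$rate" -gt 0 ] 2>/dev/null || return 1   # reject zero or non-numeric rates
    echo $(( ram_gib * 1024 / rate ))           # GiB -> MiB, then MiB / (MiB/s)
}

# Example: 24 GiB at an assumed 50 MiB/s
#   dump_seconds 24 50    # prints 491, a bit over 8 minutes
```

By the same arithmetic, any sustained rate above roughly 9 MiB/s would finish a 24 GiB copy within 45 minutes, so a longer wait may point at something other than raw copy time.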
|
So far I've been unable to save anything from /proc/vmcore.
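For what it's worth, the save step I'm trying to get working amounts to something like the sketch below. It is not the wiki's script: save_vmcore is a hypothetical helper, /var/crash is an assumed destination, and it relies on /proc/vmcore only existing when running in the capture kernel after a crash.

```shell
# save_vmcore [SRC] [DEST_DIR]
# Copies SRC (default: /proc/vmcore) to a timestamped file in DEST_DIR
# (default: /var/crash, an assumed location) and prints the new path.
save_vmcore() {
    src="${1:-/proc/vmcore}"
    dest_dir="${2:-/var/crash}"
    [ -r "$src" ] || return 1            # no vmcore: not in the capture kernel
    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
    cp "$src" "$out" && echo "$out"
}

# Possible use from a start script, as root:
#   save_vmcore && reboot
```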
|
>> As I expected about the logs. If the machine's dead then I don't want
>> stuff getting written to disk anyway. kdump sounds like the best
>> solution going right now. I'll try and see if I can get it working.
>
> Yeah - one of these days I'll see if I can get kdump working again.
> What it really needs is an initramfs that will automatically capture
> the dump and reboot. That's how other distros handle it. The dumps
> are pretty big though - the size of your RAM.
|
That would be helpful certainly.
|
Thanks for all the info and support!
|
Cheers,
Mark