For those who do not read the sparclinux list: this is very nice, and
many thanks to David Miller for providing it.

Begin forwarded message:

Date: Thu, 29 Jan 2009 15:54:12 -0800 (PST)
From: David Miller <davem@×××××××××.net>
To: sparclinux@×××××××××××.org
Subject: NMI watchdog...

I just wanted to let folks know what I've been working on, sparc-wise.

I have this recurring issue where one of my workstations hangs
completely: no keyboard input, no console messages, nothing.

Since we have pseudo-NMI support in oprofile via performance counters
in the current tree, I worked on rearchitecting this so that a nice NMI
watchdog layer could be added.

It is modelled after the x86 NMI watchdog, with the major difference
being that it is enabled by default. The cost is one interrupt per
second, and the payback is enormous wrt. the ability to debug complete
system hangs.

Basically, how it works is that if we see no timer interrupts processed
for 5 seconds, we print a message, dump registers, and optionally panic
the system.

This will be supported on any system that has profiling counter
overflow interrupt support. That essentially means any cpu from
UltraSPARC-III onward (including Niagara chips).

Another nice side effect of this work is that it gives us some of the
framework necessary for whatever generic performance counter layer
gets merged into the tree in the future (Ingo Molnar's work, perfmon3,
whatever).

I noticed while doing these changes that we need some work in the
handling of OOPSes and other errors. In particular, we need to start
using the existing generic infrastructure the kernel provides, such as
oops_enter(), oops_exit(), bust_spinlocks(), etc. I do intend to work
on this.

I'm currently busy doing testing to make sure that the NMI watchdog
and oprofile work as expected.

I'll post the patches when I check them in. I intend to push this
into the current stable tree because there are entire classes of bugs
people run into which can't be analyzed at all without this kind of
facility.
======================================================
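
As a rough illustration of the check described in the forwarded message (no
timer interrupts seen for 5 seconds means print a message and optionally
panic), here is a minimal userspace sketch in C. It is not taken from David's
patches: the names nmi_watchdog_tick, struct nmi_watchdog, and
WATCHDOG_TIMEOUT_TICKS are made up for illustration, and it assumes the check
runs once per second from the counter-overflow interrupt. A real
implementation would also dump registers rather than just print.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout: a userspace model, not kernel code. */
#define WATCHDOG_TIMEOUT_TICKS 5   /* one pseudo-NMI per second, 5 s threshold */

struct nmi_watchdog {
    unsigned long last_timer_irqs; /* timer interrupt count seen at the last NMI */
    unsigned int  stalled_ticks;   /* consecutive NMIs with no timer progress */
    bool          panic_on_hang;   /* the "optionally panic" part */
};

enum wd_action { WD_OK, WD_REPORT, WD_PANIC };

/* Called once per pseudo-NMI, i.e. roughly once per second. */
enum wd_action nmi_watchdog_tick(struct nmi_watchdog *wd, unsigned long timer_irqs)
{
    assert(wd != NULL);

    if (timer_irqs != wd->last_timer_irqs) {
        /* Timer interrupts are still being processed: reset the stall count. */
        wd->last_timer_irqs = timer_irqs;
        wd->stalled_ticks = 0;
        return WD_OK;
    }

    /* No progress since the last NMI; tolerate it until the threshold. */
    if (++wd->stalled_ticks < WATCHDOG_TIMEOUT_TICKS)
        return WD_OK;

    /* A kernel would dump registers here; this sketch only prints. */
    fprintf(stderr, "watchdog: no timer interrupts for %u seconds\n",
            wd->stalled_ticks);
    return wd->panic_on_hang ? WD_PANIC : WD_REPORT;
}
```

In this sketch the caller decides what to do with WD_REPORT or WD_PANIC, and
the report repeats on every further tick for as long as the timer count stays
frozen.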

Regards,
Ferris
--
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (Sparc, Userrel, Trustees)