This didn't go anywhere the first time. If you end up with two copies,
apologies.
For those who do not read the sparclinux list: this is very nice, and
many thanks to David Miller for providing it.

Begin forwarded message:

Date: Thu, 29 Jan 2009 15:54:12 -0800 (PST)
From: David Miller <davem@×××××××××.net>
To: sparclinux@×××××××××××.org
Subject: NMI watchdog...

I just wanted to let folks know what I've been working on, sparc-wise.

I have this recurring issue where one of my workstations hangs
completely: no keyboard input, no console messages, nothing.

Since we have pseudo-NMI support in oprofile via performance counters
in the current tree, I worked on rearchitecting this so that a nice NMI
watchdog layer could be added.

It is modelled after the x86 NMI watchdog, with the major difference
being that it is enabled by default. The cost is one interrupt per
second, and the payback is enormous wrt. the ability to debug complete
system hangs.

Basically how it works is if we see no timer interrupts processed for
5 seconds we print a message, dump registers, and optionally panic the
system.

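The check described above can be sketched as a small user-space simulation. This is only an illustration of the idea, not the actual sparc64 kernel code: the names `CpuWatchdogState`, `timer_interrupt`, and `nmi_tick` are hypothetical, and it assumes the once-per-second NMI simply compares a timer interrupt counter against the value it saw on the previous tick.

```python
from dataclasses import dataclass

# From the message: no timer interrupts for 5 seconds counts as a hang.
HANG_THRESHOLD_SECONDS = 5


@dataclass
class CpuWatchdogState:
    """Per-CPU bookkeeping (hypothetical names, for illustration only)."""
    timer_irq_count: int = 0   # bumped by the ordinary timer interrupt
    last_seen_count: int = 0   # counter value observed at the previous NMI
    stalled_seconds: int = 0   # consecutive NMI ticks with no timer progress


def timer_interrupt(state: CpuWatchdogState) -> None:
    """Stand-in for the normal timer interrupt handler."""
    state.timer_irq_count += 1


def nmi_tick(state: CpuWatchdogState, panic_on_hang: bool = False) -> str:
    """Stand-in for the once-per-second NMI. Returns 'ok', 'hang', or 'panic'."""
    if state.timer_irq_count != state.last_seen_count:
        # Timer interrupts are still being processed: reset the stall count.
        state.last_seen_count = state.timer_irq_count
        state.stalled_seconds = 0
        return "ok"

    state.stalled_seconds += 1
    if state.stalled_seconds < HANG_THRESHOLD_SECONDS:
        return "ok"

    # A real watchdog would print a message and dump registers here.
    return "panic" if panic_on_hang else "hang"
```

Because the check runs from an interrupt that ordinary interrupt masking does not block, it can still fire when the CPU is stuck with normal interrupts disabled, which is the case the timer interrupt alone cannot report.
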
This will be supported on any system that has profiling counter
overflow interrupt support. That essentially means any cpu from
UltraSPARC-III onward (including Niagara chips).

Another nice side effect of this work is that it gives us some of the
framework necessary for whatever generic performance counter layer
gets merged into the tree in the future (Ingo Molnar's work, perfmon3,
whatever).

I noticed while doing these changes that we need some work in the
handling of OOPSes and other errors. In particular we need to start
using the existing generic infrastructure the kernel provides, such as
oops_enter(), oops_exit(), bust_spinlocks(), etc. I do intend to work
on this.

I'm currently busy doing testing to make sure that the NMI watchdog
and oprofile work as expected.

I'll post the patches when I check them in. I intend to push this
into the current stable tree because there are entire classes of bugs
people run into which can't be analyzed at all without this kind of
facility.
--
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (Sparc, Userrel, Trustees)