Gentoo Archives: gentoo-sparc

From: Ferris McCormick <fmccor@g.o>
To: Gentoo Sparc <gentoo-sparc@l.g.o>
Subject: [gentoo-sparc] Fw: NMI watchdog...
Date: Fri, 30 Jan 2009 03:06:53
Message-Id: 20090130011816.7ba3a15f@anaconda.krait.us
This didn't go anywhere the first time. If you end up with two copies,
apologies.
For those who do not read the sparclinux list: this is very nice, and
many thanks to David Miller for providing it.


Begin forwarded message:

Date: Thu, 29 Jan 2009 15:54:12 -0800 (PST)
From: David Miller <davem@×××××××××.net>
To: sparclinux@×××××××××××.org
Subject: NMI watchdog...


I just wanted to let folks know what I've been working on, sparc wise.

I have this recurring issue where one of my workstations hangs
completely, no keyboard input, no console messages, nothing.

Since we have pseudo-NMI support in oprofile via performance counters
in the current tree I worked on rearchitecting this so that a nice NMI
watchdog layer could be added.

It is modelled after the x86 NMI watchdog, with the major difference
being that it is enabled by default. The cost is one interrupt per
second, and the payback is enormous wrt. the ability to debug complete
system hangs.

Basically how it works is if we see no timer interrupts processed for
5 seconds we print a message, dump registers, and optionally panic the
system.
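
The check described above can be sketched roughly as follows. This is
only a userspace model of the idea, not the actual kernel patch: every
name in it (wd_cpu, wd_timer_interrupt, wd_nmi_tick, wd_panic_on_hang,
STALL_THRESHOLD) is hypothetical, and it assumes the watchdog NMI fires
about once per second, so five ticks without timer progress stand in
for the 5 seconds mentioned.

```c
/* Userspace model of the watchdog check; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS        4
#define STALL_THRESHOLD 5   /* watchdog ticks (assumed ~1 per second) */

struct wd_cpu {
    unsigned long timer_irqs;  /* bumped by the modelled timer interrupt */
    unsigned long last_seen;   /* timer_irqs at the last tick that saw progress */
    unsigned int  stalled;     /* consecutive ticks without timer progress */
};

static struct wd_cpu wd_state[MAX_CPUS];
static bool wd_panic_on_hang;  /* models the "optionally panic" setting */

/* Called from the modelled timer interrupt handler. */
void wd_timer_interrupt(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    wd_state[cpu].timer_irqs++;
}

/* Called once per watchdog NMI. Returns true when a hang is reported. */
bool wd_nmi_tick(int cpu)
{
    struct wd_cpu *w;

    assert(cpu >= 0 && cpu < MAX_CPUS);
    w = &wd_state[cpu];

    if (w->timer_irqs != w->last_seen) {
        /* Timer interrupts are still being processed: reset the count. */
        w->last_seen = w->timer_irqs;
        w->stalled = 0;
        return false;
    }

    if (++w->stalled < STALL_THRESHOLD)
        return false;

    fprintf(stderr, "watchdog: CPU%d: no timer interrupts for %u ticks\n",
            cpu, w->stalled);
    /* A real implementation would dump registers at this point. */
    if (wd_panic_on_hang)
        fprintf(stderr, "watchdog: CPU%d: would panic here\n", cpu);
    return true;
}
```

In this model, calling wd_nmi_tick() repeatedly without an intervening
wd_timer_interrupt() reports a hang on the fifth stalled tick, and any
timer interrupt in between resets the count.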

This will be supported on any system that has profiling counter
overflow interrupt support. That essentially means any cpu from
UltraSPARC-III onward (including Niagara chips).

Another nice side effect of this work is that it gives us some of the
framework necessary for whatever generic performance counter layer
gets merged into the tree in the future (Ingo Molnar's work, perfmon3,
whatever).

I noticed while doing these changes that we need some work in the
handling of OOPSes and other errors. In particular we need to start
using the existing generic infrastructure the kernel provides, such as
oops_enter(), oops_exit(), bust_spinlocks(), etc. I do intend to work
on this.

I'm currently busy doing testing to make sure that the NMI watchdog
and oprofile work as expected.

I'll post the patches when I check them in. I intend to push this
into the current stable tree because there are entire classes of bugs
people run into which can't be analyzed at all without this kind of
facility.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@×××××××××××.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (Sparc, Userrel, Trustees)
