On Monday 28 Apr 2014 13:32:05 I wrote:
> On Thursday 24 Apr 2014 13:57:19 I wrote:
> > So far I've done these things:
> >
> > 1. Wiped the whole system and restored from backup (heavy overkill, but I
> > wanted everything to be in the same, consistent state).
> > 2. Run bad-blocks tests on all partitions (though all but / and /boot are
> > in logical volumes - I don't know to what extent that will have affected
> > the results).
>
> --->8
>
> Looking at bad-blocks again, I see from gkrellm that 'mkfs.ext4 -cc -L Atom
> /dev/vg7/atom' writes the test patterns to both the underlying physical
> disks, but it only reads back from one of them

... so it isn't much use on a virtual disk.

Well, that was a long weekend.

The symptoms grew stranger and stranger, until I eventually discovered a
problem with IRQ 16.

/proc/interrupts includes this line:
 16:          0     302525          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, nouveau

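For anyone wanting to check their own machine, here is a minimal Python sketch that picks that line apart and shows which drivers share the IRQ. It assumes four CPU columns (inferred from the four counters on the line) and the column layout shown here; other kernel versions may print extra columns, so treat the function name and layout as illustrative only.

```python
def parse_interrupt_line(line, n_cpus):
    """Split one /proc/interrupts row into its parts.

    Assumed layout (as in the line quoted in this mail):
    IRQ label, one counter per CPU, controller type, comma-separated handlers.
    """
    label, rest = line.split(":", 1)          # split on the first colon only
    fields = rest.split()
    counts = [int(x) for x in fields[:n_cpus]]
    chip = fields[n_cpus]
    handlers = [h.strip() for h in " ".join(fields[n_cpus + 1:]).split(",")]
    return {"irq": label.strip(), "counts": counts,
            "chip": chip, "handlers": handlers}


# The line quoted from /proc/interrupts; n_cpus=4 is an assumption.
sample = "16: 0 302525 0 0 IO-APIC-fasteoi ehci_hcd:usb1, nouveau"
info = parse_interrupt_line(sample, n_cpus=4)
print(info["irq"], sum(info["counts"]), info["handlers"])
```

More than one entry in the handler list means the IRQ is shared, which is the case here for the USB controller and the graphics driver.
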
The source file /usr/src/linux/kernel/irq/spurious.c says:

/*
 * If 99,900 of the previous 100,000 interrupts have not been handled
 * then assume that the IRQ is stuck in some manner. Drop a diagnostic
 * and try to turn the IRQ off.
 *
 * (The other 100-of-100,000 interrupts may have been a correctly
 * functioning device sharing an IRQ with the failing one)
 */

...and suggests booting with irqpoll.

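To make the rule in that comment concrete, here is a simplified Python model of it. This is only my reading of the comment, not the kernel's code: the names are made up, and whether the real comparison is "at least 99,900" or "more than 99,900" is an assumption.

```python
WINDOW = 100_000     # interrupts per evaluation window (from the comment)
THRESHOLD = 99_900   # unhandled interrupts that mark the line as stuck


def irq_looks_stuck(outcomes, window=WINDOW, threshold=THRESHOLD):
    """Return True if any complete window has >= threshold unhandled interrupts.

    `outcomes` is an iterable of booleans: True if some handler claimed
    the interrupt, False if nobody did. The ">=" is an assumption.
    """
    total = 0
    unhandled = 0
    for handled in outcomes:
        total += 1
        if not handled:
            unhandled += 1
        if total == window:
            if unhandled >= threshold:
                return True
            total = 0        # start a fresh window
            unhandled = 0
    return False


# A healthy device claims 1 interrupt in every 1,000; the rest go unclaimed.
noisy = [i % 1000 == 0 for i in range(WINDOW)]
print(irq_looks_stuck(noisy))
# Every interrupt is claimed by a handler.
healthy = [True] * WINDOW
print(irq_looks_stuck(healthy))
```

The first case models the situation in the comment's last sentence: a working device sharing the line with a failing one still leaves 99,900 of 100,000 interrupts unclaimed.
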
So I added irqpoll to the kernel command line. It seemed to make no difference
at the time, but I haven't had any recurrence in the last two days. I see,
though, that according to gkrellm the core temperatures are 52-56C and the
graphics card shows 59C. That shouldn't be hot enough to start raising
spurious interrupts: the nVidia web site gives around 105C as the
limit. Perhaps I should find a different slot for the Quadro FX580 card, to
separate it from the USB interface.

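One way to confirm that the running kernel really was booted with irqpoll is to look at /proc/cmdline. A small Python sketch, in which the function names and the sample command line are hypothetical:

```python
def has_kernel_option(cmdline, option):
    """Return True if `option` appears as a bare flag or as option=value."""
    for token in cmdline.split():
        if token == option or token.startswith(option + "="):
            return True
    return False


def running_with_irqpoll(path="/proc/cmdline"):
    """Check the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return has_kernel_option(f.read(), "irqpoll")


# Hypothetical command line, for illustration only.
sample = "root=/dev/sda1 ro irqpoll"
print(has_kernel_option(sample, "irqpoll"))
```

Matching whole tokens rather than substrings avoids a false hit on some other option that merely contains the text "irqpoll".
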
So, many hours and much rebuilding later, I've installed a new chroot for the
Atom and it seems to be working as expected. Actually, I reinstalled the
entire system to be safe, including re-creating the physical and logical
volumes on the two SATA disks.

The remaining question is what caused millions of spurious interrupts
over a period of a week or so, and why they then subsided. This is an Asus
P7P55D motherboard (http://www.asus.com/Motherboards/P7P55D/).

--
Regards
Peter