My system has been experiencing random, hard (must physically
reboot) lockups over the last year or so. The lockups are thus far
completely unpredictable, and they always occur when I'm not at my
computer (during the night, at work, etc). When the computer goes
into this hard lockup state, the monitor is blank (but not in power
save mode); the computer will respond to pings, but I cannot ssh
into it.

I just ran 14 hours of memtest86+ and found no errors.

I also checked the logs and found nothing unusual there (I can't
even pinpoint exactly when the lockups occur).

Even worse, my computer may be fine for weeks or even months (i.e.
completely stable), then suddenly start locking up about once a
day.

Does anyone have any idea what the problem may be? For what it's
worth, I have a very high ERR count in /proc/interrupts:

# uptime
 08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20

# cat /proc/interrupts
           CPU0
  0:    5391962          XT-PIC  timer
  1:       3486          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
  8:          2          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:          0          XT-PIC  ohci_hcd
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
 12:     115771          XT-PIC  i8042
 14:        473          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
LOC:    5391944
ERR:      33336
MIS:          0

Note that the machine has only been up for 90 minutes and it has
already logged 33k ERRs (though I don't know exactly what that
means; my other two nforce2 boards have a zero ERR count).
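
To put a number on that, the ERR count can be divided by the uptime.
The following is only a rough sketch in Python: the sample strings are
copied from the output above, the helper names (uptime_minutes,
counter, err_per_minute) are made up for illustration, and the parsing
assumes the "up H:MM" form of uptime and a single-CPU /proc/interrupts
layout.

```python
import re

# Sample values copied from the post; on a live system these would come
# from running `uptime` and reading /proc/interrupts instead.
UPTIME_LINE = "08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20"
INTERRUPTS_TEXT = """\
NMI:          0
LOC:    5391944
ERR:      33336
MIS:          0
"""


def uptime_minutes(line):
    """Return uptime in minutes; only the 'up H:MM,' form is handled."""
    match = re.search(r"up\s+(\d+):(\d+),", line)
    if match is None:
        raise ValueError("unsupported uptime format")
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours * 60 + minutes


def counter(text, name):
    """Return the first integer following 'NAME:' in /proc/interrupts text."""
    for row in text.splitlines():
        fields = row.split()
        if len(fields) >= 2 and fields[0] == name + ":":
            return int(fields[1])
    raise KeyError(name)


def err_per_minute(interrupts_text, uptime_line):
    """Average ERR increments per minute since boot."""
    return counter(interrupts_text, "ERR") / uptime_minutes(uptime_line)


if __name__ == "__main__":
    # 33336 ERRs over 89 minutes
    print(round(err_per_minute(INTERRUPTS_TEXT, UPTIME_LINE), 1))  # 374.6
```

That works out to roughly 375 ERRs per minute, or about 6 per second,
averaged over the time since boot. Sampling the counter twice a few
minutes apart would show whether the rate is steady or bursty.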

For what it's worth, this computer has the following hardware: Asus
A7N8X Deluxe, AMD Athlon XP 2500 (Barton core), 2x512 MB RAM,
GeForce4 ti4200 AGP 8x video card, LSI Logic SCSI controller,
Fujitsu SCSI Drive, Samsung IDE drive.

Another thought: I see the following in my dmesg:

PCI: Using ACPI for IRQ routing
** PCI interrupts are no longer routed automatically. If this
** causes a device to stop working, it is probably because the
** driver failed to call pci_enable_device(). As a temporary
** workaround, the "pci=routeirq" argument restores the old
** behavior. If this argument makes the device work again,
** please email the output of "lspci" to bjorn.helgaas@××.com
** so I can fix the driver.

In my kernel config, I have Processor Type and Features -> Local
APIC support on uniprocessors and IO-APIC support on uniprocessors
both enabled. However, as you can see above, the kernel is still
using XT-PIC. My other two nforce2 boards (with the same kernel
config) use IO-APIC. I'm not sure exactly what all this means, but
it may mean something to somebody. :)
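
For comparing the three boards, a small check along these lines might
help. It is a sketch under stated assumptions: the two menu entries are
assumed to correspond to the config symbols CONFIG_X86_UP_APIC and
CONFIG_X86_UP_IOAPIC, the config excerpt is a hypothetical sample (not
copied from the real .config), and the controller name is assumed to be
the third whitespace-separated field on numbered IRQ rows of a
single-CPU /proc/interrupts.

```python
# Rows copied from the /proc/interrupts output in the post.
INTERRUPTS_SAMPLE = """\
           CPU0
  0:    5391962          XT-PIC  timer
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
NMI:          0
ERR:      33336
"""

# Hypothetical excerpt of a kernel .config with both options enabled.
CONFIG_SAMPLE = """\
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
"""

# Assumed symbol names for the two "on uniprocessors" menu entries.
APIC_OPTIONS = {"CONFIG_X86_UP_APIC", "CONFIG_X86_UP_IOAPIC"}


def controllers(interrupts_text):
    """Return the set of controller names seen on numbered IRQ rows."""
    found = set()
    for row in interrupts_text.splitlines():
        fields = row.split()
        # Numbered rows look like "5: 481356 XT-PIC ..."; the header and
        # the NMI/LOC/ERR/MIS rows are skipped.
        if len(fields) >= 3 and fields[0].rstrip(":").isdigit():
            found.add(fields[2])
    return found


def enabled_options(config_text, names):
    """Return which of the given option names are set to 'y'."""
    enabled = set()
    for row in config_text.splitlines():
        key, sep, value = row.strip().partition("=")
        if sep and key in names and value == "y":
            enabled.add(key)
    return enabled


if __name__ == "__main__":
    print("controllers in use:", sorted(controllers(INTERRUPTS_SAMPLE)))
    print("APIC options set:", sorted(enabled_options(CONFIG_SAMPLE, APIC_OPTIONS)))
```

Running the same two checks on each board would make the mismatch
explicit: options enabled in the config, yet only XT-PIC showing up in
/proc/interrupts on this machine.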

Thanks for any help or suggestions!
Matt

p.s. I'd be happy to post my complete dmesg if anyone would like to
see it. --MG

--
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
--
gentoo-user@g.o mailing list