Check that your CHOST and kernel host type haven't changed recently. I
have had this happen in the past, and the system only crashes when it
reaches some incompatible code, which makes it hard to track down.
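
One rough way to check the CHOST side of this is sketched below in Python. It assumes CHOST is set in a make.conf-style file (on Gentoo of this era, typically /etc/make.conf) and that you compare it by hand against whatever your toolchain reports, for example the output of `gcc -dumpmachine`. The helper names `read_chost` and `chost_matches` and the sample text are made up for illustration.

```python
import re


def read_chost(make_conf_text):
    """Return the last uncommented CHOST value in make.conf-style text, or None."""
    chost = None
    for line in make_conf_text.splitlines():
        # Lines starting with '#' do not match, so commented-out values are ignored.
        match = re.match(r'\s*CHOST\s*=\s*"?([^"\s#]+)"?', line)
        if match:
            chost = match.group(1)
    return chost


def chost_matches(make_conf_text, toolchain_target):
    """Compare the configured CHOST with a target string reported by the toolchain."""
    return read_chost(make_conf_text) == toolchain_target.strip()


# Hypothetical make.conf contents, not taken from the poster's system.
sample = '# CHOST="i386-pc-linux-gnu"\nCHOST="i686-pc-linux-gnu"\nCFLAGS="-O2"\n'
print(read_chost(sample))
print(chost_matches(sample, "i686-pc-linux-gnu\n"))
```

A mismatch here only tells you the setting differs from what the compiler targets now; it does not show when the change happened or which packages were built under the old value.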

BillK

On Thu, 2005-07-07 at 14:44 -0500, Matt Garman wrote:
> My system has been experiencing random, hard (must physically
> reboot) lockups over the last year or so. The lockups are thus far
> completely unpredictable, and they always occur when I'm not at my
> computer (during the night, at work, etc). When the computer goes
> into this hard lockup state, the monitor is blank (but not in power
> save mode); the computer will respond to pings; I cannot ssh into
> the computer.
>
> I just ran 14 hours of memtest86+ and found no errors.
>
> I also checked the logs and found nothing unusual there (I can't
> even pinpoint exactly when the lockups occur).
>
> Even worse, my computer may be fine for weeks or even months (i.e.
> completely stable), then suddenly start locking up about once a
> day.
>
> Does anyone have any idea what the problem may be? For what it's
> worth, I have a very high ERR count in /proc/interrupts:
>
> # uptime
> 08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20
>
> # cat /proc/interrupts
>            CPU0
>   0:    5391962          XT-PIC  timer
>   1:       3486          XT-PIC  i8042
>   2:          0          XT-PIC  cascade
>   5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
>   8:          2          XT-PIC  rtc
>   9:          0          XT-PIC  acpi
>  10:          0          XT-PIC  ohci_hcd
>  11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
>  12:     115771          XT-PIC  i8042
>  14:        473          XT-PIC  ide0
>  15:         11          XT-PIC  ide1
> NMI:          0
> LOC:    5391944
> ERR:      33336
> MIS:          0
>
> Note that the machine has only been up for 90 minutes and it has
> already logged 33k ERRs (though I don't know exactly what that
> means, my other two nforce2 boards have a zero ERR count).
>
> For what it's worth, this computer has the following hardware: Asus
> A7N8X Deluxe, AMD Athlon XP 2500 (Barton core), 2x512 MB RAM,
> GeForce4 ti4200 AGP 8x video card, LSI Logic SCSI controller,
> Fujitsu SCSI Drive, Samsung IDE drive.
>
> Another idea, I see the following in my dmesg:
>
> PCI: Using ACPI for IRQ routing
> ** PCI interrupts are no longer routed automatically. If this
> ** causes a device to stop working, it is probably because the
> ** driver failed to call pci_enable_device(). As a temporary
> ** workaround, the "pci=routeirq" argument restores the old
> ** behavior. If this argument makes the device work again,
> ** please email the output of "lspci" to bjorn.helgaas@××.com
> ** so I can fix the driver.
>
> In my kernel config, I have Processor Type and Features -> Local
> APIC support on uniprocessors and IO-APIC support on uniprocessors
> both enabled. However, as you can see above, the kernel is still
> using XT-PIC. My other two nforce2 boards (with the same kernel
> config) use IO-APIC. I'm not sure exactly what all this means, but
> it may mean something to somebody. :)
>
> Thanks for any help or suggestions!
> Matt
>
> p.s. I'd be happy to post my complete dmesg if anyone would like to
> see it. --MG
>
> --
> Matt Garman
> email at: http://raw-sewage.net/index.php?file=email
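
To compare boards, it may help to pull the numbers out of /proc/interrupts with a small script instead of reading them by eye. The Python sketch below assumes the single-CPU layout shown in the quoted output (one count column, then the controller name, then the devices). The function name `parse_interrupts` is made up for illustration, and the sample is a subset of the quoted listing.

```python
def parse_interrupts(text):
    """Split /proc/interrupts-style text into per-IRQ rows and summary counters."""
    irqs, summary = {}, {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip the CPU header line and blank lines
        label = label.strip()
        count = int(fields[0])  # assumes a single CPU column, as in the post
        if label.isdigit():
            controller = fields[1] if len(fields) > 1 else ""
            irqs[int(label)] = (count, controller, " ".join(fields[2:]))
        else:
            summary[label] = count  # NMI, LOC, ERR, MIS
    return irqs, summary


sample = """\
           CPU0
  0:    5391962          XT-PIC  timer
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
ERR:      33336
MIS:          0
"""

irqs, summary = parse_interrupts(sample)
uptime_minutes = 89  # "up 1:29" in the quoted uptime output
controllers = {row[1] for row in irqs.values()}
print("controllers in use:", controllers)
print("ERR per minute:", summary["ERR"] / uptime_minutes)
```

Running the same script on the other two nforce2 boards would give the controller type and ERR rate side by side, which is the comparison described in the message. It does not explain what is causing the errors.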

--
gentoo-user@g.o mailing list