Gentoo Archives: gentoo-user

From: "W.Kenworthy" <billk@×××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] random, hard lockups
Date: Thu, 07 Jul 2005 23:53:52
Message-Id: 1120780086.31360.58.camel@bunyip
In Reply to: [gentoo-user] random, hard lockups by Matt Garman
1 Check that your CHOST and kernel host type haven't changed recently. I have
2 had this happen in the past, and the system only crashes when it reaches
3 some incompatible code, which makes it hard to track down.
4
5 BillK
6
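A minimal sketch of the CHOST check suggested above, assuming the CHOST assignment lives in a make.conf-style file and that the compiler's target triplet (for example the output of `gcc -dumpmachine`) has already been captured as a string. The helper names `parse_chost` and `chost_mismatch` and the sample values are hypothetical, for illustration only. It does not cover the kernel side of the check.

```python
import re


def parse_chost(make_conf_text):
    """Return the last CHOST value assigned in make.conf-style text, or None."""
    chost = None
    for line in make_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r'^CHOST\s*=\s*"?([^"\s]+)"?$', line)
        if match:
            chost = match.group(1)  # a later assignment overrides an earlier one
    return chost


def chost_mismatch(make_conf_text, compiler_triplet):
    """True if a CHOST is configured and differs from the compiler's triplet."""
    chost = parse_chost(make_conf_text)
    return chost is not None and chost != compiler_triplet.strip()


# Hypothetical make.conf contents; the values are assumptions, not from the post.
sample_conf = '''
# sample make.conf
CFLAGS="-O2 -march=athlon-xp -pipe"
CHOST="i686-pc-linux-gnu"
'''

print(parse_chost(sample_conf))
print(chost_mismatch(sample_conf, "i686-pc-linux-gnu\n"))
print(chost_mismatch(sample_conf, "i386-pc-linux-gnu\n"))
```

A mismatch here would only be a hint that packages built before and after the change may target different hosts; it would not by itself prove that this is the cause of the lockups.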
7 On Thu, 2005-07-07 at 14:44 -0500, Matt Garman wrote:
8 > My system has been experiencing random, hard (must physically
9 > reboot) lockups over the last year or so. The lockups are thus far
10 > completely unpredictable, and they always occur when I'm not at my
11 > computer (during the night, at work, etc). When the computer goes
12 > into this hard lock up state, the monitor is blank (but not in power
13 > save mode); the computer will respond to pings; I cannot ssh into
14 > the computer.
15 >
16 > I just ran 14 hours of memtest86+ and found no errors.
17 >
18 > I also checked the logs---nothing unusual there (I can't even
19 > pinpoint exactly when the lockups occur).
20 >
21 > Even worse, my computer may be fine for weeks or even months (i.e.
22 > completely stable), then suddenly start locking up about once a
23 > day.
24 >
25 > Does anyone have any idea what the problem may be? For what it's
26 > worth, I have a very high ERR count in /proc/interrupts:
27 >
28 > # uptime
29 > 08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20
30 >
31 > # cat /proc/interrupts
32 >            CPU0
33 >   0:    5391962          XT-PIC  timer
34 >   1:       3486          XT-PIC  i8042
35 >   2:          0          XT-PIC  cascade
36 >   5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
37 >   8:          2          XT-PIC  rtc
38 >   9:          0          XT-PIC  acpi
39 >  10:          0          XT-PIC  ohci_hcd
40 >  11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
41 >  12:     115771          XT-PIC  i8042
42 >  14:        473          XT-PIC  ide0
43 >  15:         11          XT-PIC  ide1
44 > NMI:          0
45 > LOC:    5391944
46 > ERR:      33336
47 > MIS:          0
48 >
49 >
50 > Note that the machine has only been up for 90 minutes and it's
51 > already logged 33k ERRs (though I don't exactly know what that
52 > means; my other two nforce2 boards have a zero ERR count).
53 >
54 > For what it's worth, this computer has the following hardware: Asus
55 > A7N8X Deluxe, AMD Athlon XP 2500 (Barton core), 2x512 MB RAM,
56 > GeForce4 ti4200 AGP 8x video card, LSI Logic SCSI controller,
57 > Fujitsu SCSI Drive, Samsung IDE drive.
58 >
59 > Another idea, I see the following in my dmesg:
60 >
61 >
62 > PCI: Using ACPI for IRQ routing
63 > ** PCI interrupts are no longer routed automatically. If this
64 > ** causes a device to stop working, it is probably because the
65 > ** driver failed to call pci_enable_device(). As a temporary
66 > ** workaround, the "pci=routeirq" argument restores the old
67 > ** behavior. If this argument makes the device work again,
68 > ** please email the output of "lspci" to bjorn.helgaas@××.com
69 > ** so I can fix the driver.
70 >
71 > In my kernel config, I have Processor Type and Features -> Local
72 > APIC support on uniprocessors and IO-APIC support on uniprocessors
73 > both enabled. However, as you can see above, the kernel is still
74 > using XT-PIC. My other two nforce2 boards (with the same kernel
75 > config) use IO-APIC. I'm not sure exactly what all this means, but
76 > it may mean something to somebody. :)
77 >
78 > Thanks for any help or suggestions!
79 > Matt
80 >
81 > p.s. I'd be happy to post my complete dmesg if anyone would like to
82 > see it. --MG
83 >
84 > --
85 > Matt Garman
86 > email at: http://raw-sewage.net/index.php?file=email
87
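To put the ERR figure quoted above in perspective, here is a rough sketch that parses the posted /proc/interrupts output, lists which interrupt controller the IRQ lines report, and divides the ERR count by the uptime. The function names are hypothetical, the parser assumes the layout shown in the post, and the uptime parser only handles the "up H:MM" form.

```python
import re

# Data copied from the quoted message.
UPTIME = "08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20"
INTERRUPTS = """
           CPU0
  0:    5391962          XT-PIC  timer
  1:       3486          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
  8:          2          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:          0          XT-PIC  ohci_hcd
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
 12:     115771          XT-PIC  i8042
 14:        473          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
LOC:    5391944
ERR:      33336
MIS:          0
"""


def parse_interrupts(text):
    """Return (controller names seen on IRQ lines, named counters such as ERR)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header line names one column per CPU.
    ncpu = sum(1 for tok in lines[0].split() if tok.startswith("CPU"))
    controllers = set()
    counters = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        fields = rest.split()
        if label.isdigit():
            # Numbered IRQ line: per-CPU counts, then the controller name.
            if len(fields) > ncpu:
                controllers.add(fields[ncpu])
        else:
            # Named counter (NMI, LOC, ERR, MIS): sum its numeric fields.
            counters[label] = sum(int(f) for f in fields if f.isdigit())
    return controllers, counters


def uptime_minutes(uptime_line):
    """Parse the 'up H:MM' form only; other uptime formats are not handled."""
    match = re.search(r"up\s+(\d+):(\d+)", uptime_line)
    if match is None:
        raise ValueError("unsupported uptime format")
    return int(match.group(1)) * 60 + int(match.group(2))


controllers, counters = parse_interrupts(INTERRUPTS)
err_rate = counters["ERR"] / uptime_minutes(UPTIME)

print("controllers:", sorted(controllers))
print("ERR per minute: %.1f" % err_rate)
```

On a live system the same functions could be fed the contents of /proc/interrupts and the output of uptime, taken twice some minutes apart, to see whether ERR is still climbing.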
88 --
89 gentoo-user@g.o mailing list