Gentoo Archives: gentoo-user

From: Joseph <syscon@×××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: [SOLVED - CONCLUSION] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Date: Sun, 24 Jul 2005 19:33:19
Message-Id: 1122233262.17006.64.camel@sysconcept.ca
In Reply to: Re: [gentoo-user] Re: [New Development] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler by Richard Fish
Summary:
For those who didn't follow the thread, I was investigating this error
message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."
When it hits, the computer freezes completely; the only thing that still
works is the power switch.

The error appears only under heavy load, such as compiling. This is a
new box: Asus A8V, AMD64, 1GB of RAM (PC3200 DDR400 Kingston) and a
200GB SATA drive.
I found out that "Aiee" indicates a hardware error; Intel has a nice
article about it:
http://resource.intel.com/telecom/support/tnotes/tnbyos/2000/tn062.htm

Following this lead, I tried to pinpoint the hardware error.
It took me one week of trying different solutions:

1.) I ran memtest86. The first run reported some errors, so I ran the
same test on the individual sticks (I have 2 x 512MB); each stick on
its own passed without errors.
I then swapped the sticks between the two slots and ran memtest86 again
overnight. The test completed 17 passes without any error.
So I excluded memory as the culprit.

2.) I disabled the network controller on the motherboard and installed
another one on the PCI bus. This eliminated a possible IRQ conflict:
the SATA drive on channel 0 was sharing an IRQ with the network
controller.
But it didn't help.
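
Whether two devices share an interrupt line can be read from
/proc/interrupts. A minimal sketch, assuming the usual layout in which
devices on the same line are separated by commas; shared_irqs is a
made-up helper name, not an existing tool:

```shell
# List interrupt lines that have more than one device attached.
# shared_irqs is a hypothetical helper; it reads /proc/interrupts
# unless another file is given as the first argument.
shared_irqs() {
    # Skip the CPU header line; a comma in the device list means sharing.
    awk 'NR > 1 && /,/' "${1:-/proc/interrupts}"
}
```

Running shared_irqs with no argument prints only the shared lines; no
output means no IRQ is shared.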

3.) I removed the heatsink, cleaned it with 99% isopropyl alcohol and
applied a thin layer of new heatsink grease.
It did not help. I still wanted to try Robert C.'s suggestion:
"some arctic silver compound instead. It's good for a 3-5C. drop from
the regular stuff."
However, just opening the case cover dropped the CPU temperature from
about 40C to about 35-36C, so I decided to follow some other leads
first.

4.) I removed the SATA drive and tried to install Gentoo on a standard
IDE drive; this would rule out a SCSI problem and/or a buggy driver.
It did not help. I hadn't even finished a complete base installation
when I got the same error message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

Then I got a lead from Francesco T.:
"Sometimes memtest doesn't stress enough the hardware, see:
http://people.redhat.com/dledford/memtest.html
..."

That made me think about the memory again. I swapped the two sticks
for the two sticks from one of my backup servers (PC2100, 2 x 512MB).

To run the Red Hat test I downloaded a Linux kernel source tarball, but
it needs to be modified, because the Red Hat memtest.sh looks for a
top-level directory named "linux", not "linux-2.6.-something".
Instead of modifying the script, it is easier to repack the kernel
source (thanks to Richard F. for the help):
tar -xzvf linux.tar.gz
mv linux-* linux
tar -czvf linux.tar.gz linux
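
A quick way to confirm that the repacked tarball has the layout the
script expects is to list its top-level entries. A small sketch;
tarball_top_dirs is a hypothetical helper name:

```shell
# Print the distinct top-level entries of a gzip-compressed tarball.
# tarball_top_dirs is a hypothetical helper name.
tarball_top_dirs() {
    # List member names, keep the first path component, drop duplicates.
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}
```

For a tarball repacked as above, tarball_top_dirs linux.tar.gz should
print just "linux".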

One more thing: change the first line of the script from
#!/bin/bash2
to:
#!/bin/bash
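
The same edit can be made without opening an editor. A small sketch,
assuming GNU sed (for the -i option); fix_shebang is a hypothetical
helper name:

```shell
# Replace a "#!/bin/bash2" first line with "#!/bin/bash", in place.
# Assumes GNU sed; fix_shebang is a hypothetical helper name.
fix_shebang() {
    # Touch line 1 only, and only if it is exactly the bash2 shebang.
    sed -i '1s|^#!/bin/bash2$|#!/bin/bash|' "$1"
}
```

Usage would be fix_shebang memtest.sh; a script whose first line is
anything else is left unchanged.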

I ran the Red Hat memory test on my main server (a different box; 20
passes, standard script setup) and it went just fine. It finished with
an empty line, meaning "no error", as the web page explains:
---quote----
How do you know if your memory passed?

Very simple. If you run that script from the command line on your
computer and it completes without ever spewing a single message onto
your screen, then you passed. If you get messages from diff about
differences between files or any other anomolies such as that, then you
failed.
---end quote-----
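
The pass/fail rule quoted above follows from how this kind of test
works: the same tarball is unpacked more than once and the copies are
compared, so corrupted data shows up as diff output. The sketch below
illustrates only that idea for a single pass. It is not the Red Hat
memtest.sh, and stress_pass and its arguments are hypothetical names:

```shell
# Illustration only; NOT the Red Hat memtest.sh.
# Assumes the tarball unpacks into a top-level "linux" directory.
# stress_pass is a hypothetical helper name.
stress_pass() {
    tarball=$1    # e.g. linux.tar.gz
    workdir=$2    # scratch directory with enough free space
    mkdir -p "$workdir/ref" "$workdir/copy" || return 1
    # Unpack the same data twice into separate trees.
    tar -xzf "$tarball" -C "$workdir/ref"  || return 1
    tar -xzf "$tarball" -C "$workdir/copy" || return 1
    # Silence means the trees are identical; any output is a failure.
    diff -r "$workdir/ref/linux" "$workdir/copy/linux"
}
```

A real stress test would repeat this many times and unpack enough
copies to fill most of the RAM; a single pass like this one only shows
the comparison step.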

With the backup server's sticks installed, I ran some compiles and did
not get any errors or kernel panics. I also ran the Red Hat memory test
on those sticks, and it finished without printing a single error
message.

So at this point I knew the problem was the original memory.
I put back the original memory sticks and the SATA drive, and went back
to the onboard network controller.
I tried to run the Red Hat memtest.sh and it froze with the same kernel
panic:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

It appears the test only made it into the fourth round before it froze.
It did not print any message to the screen; it just froze with the
kernel panic, as always. So I wasn't 100% sure that this would qualify
as a failed memory test:
"...If you get messages from diff about differences between files or any
other anomolies such as that, then you failed."
But I suppose it would qualify; you be the judge.

Anyhow, I replaced the pair of sticks with two new ones and ran
memtest.sh for 30 passes. It passed without printing a single error,
a clean finish.
I was then able to emerge "kde-meta" and it finished without a single
hiccup.

Thank you ALL for your suggestions and help; it appears another mystery
has been solved.
So my conclusion: Do not rely on memtest86.

--
#Joseph

On Sat, 2005-07-23 at 20:23 +0200, Richard Fish wrote:
> Joseph wrote:
>
> >>[...]
> >>
> >>
> >>>-bash: ./memtest.sh: /bin/bash2: bad interpreter: No such file or
> >>>directory
> >>>
> >>>On both boxes I have bash-3.0, so what is it looking for?
> >>>
> >>>
> >>Correct the first line of the script from "#!/bin/bash2" to
> >>"#!/bin/bash" and everything will be fine.
> >>
> >>Ciao
> >> Francesco
> >>
> >>
> >
> >Thank you, yes, that is what I did as soon as I posted the message.
> >Though it puzzles me why it runs on my main server and not on the new
> >box?
> >
> >P.S. My main server passed memtest.sh without any errors. I'm only
> >curious about the result for the bad RAM sticks that passed memtest86
> >on the new box. I will re-run both tests and post the results.
> >
> >
>
> My guess is still that if you relax the memory timings in the BIOS, the
> "bad" RAM will start to work fine. Of course, *I* would still return it
> and get RAM that actually performs to the specs on the box, but that's
> just me! :->
>
> -Richard
>
>

--
gentoo-user@g.o mailing list