Summary:
For those who didn't follow the thread, I was investigating this error
message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."
where the computer freezes completely and the only thing that still
works is the power switch.

The error appears only under heavy load, such as compiling. This is a
new box: Asus A8V, AMD64, 1 GB of RAM (PC3200 DDR400 Kingston) and a
200 GB SATA drive.
I was able to find out that "Aiee" is a hardware error; Intel has a nice
article about it:
http://resource.intel.com/telecom/support/tnotes/tnbyos/2000/tn062.htm

So, following this lead, I tried to pinpoint the hardware error.
It took me one week of trying different things:

1.) I ran memtest86 and got some errors the first time, so I ran the
same test on the individual sticks (I have 2 x 512 MB); each stick on
its own passed without errors.
I swapped the sticks between the two slots and ran memtest86 again
overnight. The test completed 17 passes without any error.
So I excluded memory as the culprit.

2.) I disabled the network controller on the motherboard and installed
another one on the PCI bus. This eliminated a possible IRQ conflict,
since the SATA drive on channel 0 was sharing an IRQ with the network
controller.
But it didn't help.

3.) I removed the heatsink, cleaned it with 99% isopropyl alcohol and
applied a thin layer of new heatsink grease.
It did not help. I still wanted to try Robert C.'s suggestion:
"some arctic silver compound instead. It's good for a 3-5C. drop from
the regular stuff."
Anyhow, when I opened the box cover the CPU temperature dropped from
about 40C to about 35-36C, so I decided to follow some other leads
first.

4.) I removed the SATA drive and tried to install Gentoo on a standard
IDE drive; this would rule out a SCSI problem and/or a buggy driver.
It did not help. I hadn't even finished a complete base installation
when I got the same error message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

I got a lead from Francesco T.:
"Sometimes memtest doesn't stress enough the hardware, see:
http://people.redhat.com/dledford/memtest.html
..."

So it made me think about the memory again. I swapped the two sticks
for the two sticks (PC2100, 2 x 512 MB) from one of my backup servers.

Then I downloaded a Linux kernel source tarball, but it needs to be
modified, because the Red Hat memtest.sh looks for a top-level
directory called "linux", not "linux-2.6.-something".
Instead of modifying the script, it is easier to just repack the kernel
source (as per Richard F.'s help):
tar -xzvf linux.tar.gz
mv linux-* linux
tar -czvf linux.tar.gz linux

One more thing: change the first line of the script from
#!/bin/bash2
to:
#!/bin/bash
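
The two fixes above can also be scripted. This is only a minimal
sketch, assuming GNU tar and GNU sed; the function names
repack_as_linux and fix_shebang are made up for illustration and are
not part of the Red Hat script.

```shell
# Sketch only: helper names are hypothetical, GNU tar/sed assumed.

# repack_as_linux IN.tar.gz OUT.tar.gz
# Unpacks IN, renames its top-level linux-* directory to "linux",
# and writes the result to OUT.
repack_as_linux() {
    local src=$1 out=$2 work
    work=$(mktemp -d) || return 1
    tar -xzf "$src" -C "$work" || return 1
    # Expects exactly one top-level directory named linux-<something>.
    mv "$work"/linux-* "$work/linux" || return 1
    tar -czf "$out" -C "$work" linux || return 1
    rm -rf "$work"
}

# fix_shebang SCRIPT
# Rewrites line 1 only, and only if it is exactly "#!/bin/bash2".
fix_shebang() {
    sed -i '1s|^#!/bin/bash2$|#!/bin/bash|' "$1"
}
```

For example: repack_as_linux linux-2.6.x.tar.gz linux.tar.gz followed
by fix_shebang memtest.sh (the tarball name here is a placeholder).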

I ran the Red Hat memory test on my main server (a different box;
20 passes, the standard script setup) and it went just fine. It
finished with an empty line, meaning "no error", as the web page
explains:
---quote----
How do you know if your memory passed?

Very simple. If you run that script from the command line on your
computer and it completes without ever spewing a single message onto
your screen, then you passed. If you get messages from diff about
differences between files or any other anomalies such as that, then you
failed.
---end quote-----
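
Since "passed" means "printed nothing", the check can be automated
instead of watching the screen. A minimal sketch: run_quiet_test is a
made-up helper name, and treating a non-zero exit status as a failure
as well is my own assumption, not something the web page states.

```shell
# Sketch only: run_quiet_test is a hypothetical helper.
# run_quiet_test CMD [ARGS...]
# Runs the command, saves everything it prints (stdout and stderr),
# and returns 0 only if it printed nothing and exited with status 0.
run_quiet_test() {
    local log status
    log=$(mktemp) || return 2
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -eq 0 ] && [ ! -s "$log" ]; then
        rm -f "$log"
        echo "PASS: no output"
        return 0
    fi
    # Keep the log so the diff messages can be read afterwards.
    echo "FAIL: exit status $status, output saved in $log"
    return 1
}
```

It would be called as: run_quiet_test ./memtest.sh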

With the backup server's sticks installed in the new box, I did some
compiling and did not get any errors or a kernel panic. I also ran the
Red Hat memory test on those sticks and it finished without printing a
single error message.

So at this point I knew the problem was the memory.
I put back the original memory sticks and the SATA drive, and used the
onboard network controller.
I tried to run the Red Hat memtest.sh and it froze with the same kernel
panic:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

It appears the test only made it into the fourth round when it froze.
It did not print any message to the screen; it just froze with the
kernel panic as always. So I wasn't 100% sure that this would qualify
as a failed memory test:
"...If you get messages from diff about differences between files or any
other anomalies such as that, then you failed."
But I suppose it would qualify; you be the judge.
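
Because a panic freezes the box before anything can be read off the
screen, it can help to write the current round to disk before each
round starts. A rough sketch only: run_with_progress is a made-up
name, and it assumes the command given to it performs a single round.
How to set memtest.sh to one round is not covered in this thread.

```shell
# Sketch only: run_with_progress is a hypothetical helper.
# run_with_progress PASSES PROGRESS_FILE CMD [ARGS...]
# Runs CMD once per pass and records each pass in PROGRESS_FILE,
# calling sync so the last line survives a hard freeze.
run_with_progress() {
    local passes=$1 progress=$2 i
    shift 2
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i started $(date)" >>"$progress"
        sync
        if ! "$@"; then
            echo "pass $i FAILED" >>"$progress"
            sync
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $passes passes completed" >>"$progress"
    sync
}
```

After a freeze and reboot, the last line of the progress file shows
which round was running when the machine died.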

Anyhow, I replaced the pair of sticks with two new ones and ran
memtest.sh for 30 passes. It passed without printing a single error: a
clean finish.
I was then able to emerge "kde-meta" and it finished without a single
hiccup.

Thank you ALL for your suggestions and help; it appears another mystery
has been solved.
So my conclusion: do not rely on memtest86.

--
#Joseph

On Sat, 2005-07-23 at 20:23 +0200, Richard Fish wrote:
> Joseph wrote:
>
> >>[...]
> >>
> >>>-bash: ./memtest.sh: /bin/bash2: bad interpreter: No such file or
> >>>directory
> >>>
> >>>On both boxes I have bash-3.0, so what is it looking for?
> >>>
> >>Correct the first line of the script from "#!/bin/bash2" to
> >>"#!/bin/bash" and everything will be fine.
> >>
> >>Ciao
> >>  Francesco
> >>
> >
> >Thank you, yes, that is what I did as soon as I posted the message.
> >Though it puzzles me why it runs on my main server and not on the new
> >box?
> >
> >PS. My main server passes memtest.sh without any errors; I'm only
> >curious about the result for the bad RAM sticks that passed memtest86
> >on the new box. I will re-run both tests and post the results.
> >
>
> My guess is still that if you relax the memory timings in the BIOS, the
> "bad" RAM will start to work fine.  Of course, *I* would still return it
> and get RAM that actually performs to the specs on the box, but that's
> just me! :->
>
> -Richard
>

--
gentoo-user@g.o mailing list