Hi All-

A final note to this thread.

Since replacing my stepping level 7 Xeon with a stepping level 9 Xeon
(so that I now have two identical CPUs, even in stepping level, which
was not true before), I have been unable to reproduce my MCE 0004
error, despite many hours of high-intensity CPU activity (like emerging
many packages, which is what used to cause the MCE). I even tried this
with the kernel compiled with -march=pentium4 CFLAGS (which caused an
MCE after 5 or 10 minutes of emerging mysql when the stepping level 7
and stepping level 9 CPUs were both installed).

Naturally, I'm delighted by this. However, the whole experience has been
somewhat confusing (although enlightening in many respects).

Memtest86 still behaves exactly as it did before the hardware replacement.
Any thoughts on why it behaves this way with this hardware (unable to set
address range limits, unable to force ECC testing on, program locks up
after 2 minutes of operation)? I suppose that's a question for another
thread in another forum.

I seem to have suffered no hardware failures on the M/B, the CPUs
(one of the old CPUs is still present; I replaced the other), or the RAM
(although I suppose the stepping level 7 Xeon might have had some
incredibly subtle flaw that only showed up with another CPU present).

The replacement hardware seems to suffer no problems at all, in spite of
what Memtest86 does (fails at 1023.8MB 30 or 40 times and then freezes).

I really appreciate all of the suggestions here. You guys convinced me
that it was hardware, which is why I replaced everything, and that
ultimately solved the problem, although it's not clear that there was
really a hardware problem. The lesson I've learned (though I'm not sure
this is really the root issue) is that when doing multi-processor
computing, you should make sure that both processors are identical in
every way. Any thoughts on the accuracy of this rule?

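One way to sanity-check the "identical in every way" rule on a Linux box
is to compare what each CPU reports in /proc/cpuinfo. Here is a minimal
Python sketch, assuming the usual x86 field names (vendor_id, cpu family,
model, model name, stepping). The function names and the list of fields
are chosen for illustration only, and matching output would not prove the
CPUs are truly identical, only that they report the same identifiers.

```python
from collections import defaultdict

# Fields compared across CPUs. This list is an assumption for illustration,
# based on the usual x86 /proc/cpuinfo layout; it is not exhaustive.
FIELDS = ("vendor_id", "cpu family", "model", "model name", "stepping")


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines carry no "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # a "processor" line starts a new CPU entry
        if cpus:
            cpus[-1][key] = value
    return cpus


def mismatched_fields(cpus, fields=FIELDS):
    """Return {field: sorted distinct values} for fields that differ."""
    seen = defaultdict(set)
    for cpu in cpus:
        for field in fields:
            seen[field].add(cpu.get(field, "<missing>"))
    return {f: sorted(v) for f, v in seen.items() if len(v) > 1}


def check_file(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and report any differing fields."""
    with open(path) as fh:
        return mismatched_fields(parse_cpuinfo(fh.read()))
```

Calling check_file() on the machine in question should return an empty
dict when every CPU reports the same values for the checked fields, and
something like a "stepping" entry listing both values when they differ.
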
But the bizarre thing is that I couldn't reproduce this MCE at all using
another distribution on the same (pre-replacement) hardware. Does Gentoo
push the hardware much harder than other distros? Perhaps because I'm
compiling the code for my particular hardware, versus running code that
was built to run on many different sets of hardware (less aggressive
CFLAGS, etc.)? I'm at a loss to explain this.

Again, many thanks for all the help here.

-Kevin

--
gentoo-dev@g.o mailing list