Thanks again for the replies, folks.

Well, I've now replaced the system motherboard, the CPU (first I tried
removing one CPU, and memtest86 behaved exactly the same way; then I
replaced the CPU with the new one), and the RAM. Result: memtest86 and
friends all behave exactly the same way. Could this still be a hardware
problem? I'm hard-pressed to believe that I have two different
motherboards that just happen to suffer from the same flaw (they are not
even the same version: one is version B2 and the other is version C4).
The only things common to the system now and the system before are:
(1) the SCSI controller card (RAID card) (another SCSI controller was
replaced along with the m/b), (2) two SCSI hard drives connected to the
RAID card, (3) a PCI modem with its own hardware controller, and (4) the
SCSI hot-plug backplane. Could one of these be causing the problem? I
haven't tried reproducing my MCE 0004 error again, but memtest86 shows
no difference. Can anyone buy into the notion now that memtest86 is
doing something it shouldn't be doing when testing this system? Again,
the Dell utilities all come up clean. I've set the memtest86
configuration to limit the tested address range to addresses below
1022MB of RAM (this is what the Dell utilities test with 1024MB of RAM
installed), but it ignores those limits and tests up to 1024MB anyway,
and that's where it's still finding its errors (1023.8MB). I've also
configured memtest86 to turn on ECC testing, and it refuses to do so
(when I press (8) to restart the tests, the setting reverts to off).
What's going on here?

Any thoughts are most welcome. I'll be trying to reproduce my MCE error
with this new hardware, and I'll post results when I have them.

Thanks again for all the replies.

On Tuesday 18 May 2004 08:02, Josh Glover wrote:
> Quoth Kevin (Tue 2004-05-18 04:29:58AM -0400):
[...]
> > True. Although it is locking up after only 1-2 minutes of operation.
> > What conclusion should I draw from that?
>
> Bad system board. :(

I just replaced it. It still does the same thing.

>
> > Although I'm sure there are others here with more experience
> > troubleshooting such problems, I'm thinking that the above is enough
> > to base a pretty sound conclusion upon, and the conclusion I would
> > draw is that hardware and memory are not the cause of these MCE
> > problems.
>
> Wrong. memtest86 giving you errors almost always indicates a hardware
> problem. You have changed the memory, but what remained consistent? The
> memory bus! Try a new system board.

The new system board includes a new memory bus. I still get the same
results.

>
> > I also tried something else that had an enormous positive effect on
> > the situation---I changed -march=pentium4 to -march=pentium3 in my
> > CFLAGS
>
> All you have done is turn off SSE2 instructions and possibly a few
> others that the P4s have and the P3s do not. If something is wrong with
> your system board or CPU, less stress on the CPU is likely not to show
> problems as often.

That's a good point. I'll try reproducing the MCE now with the new
hardware.

>
> You have bad hardware, Kevin. Try the compile test with one CPU at a
> time (i.e. take one out), and if that is not illuminating, replace the
> system board.

Thanks again, gents!

-Kevin

--
gentoo-dev@g.o mailing list