Deedra Waters posted <Pine.LNX.4.64.0512062202210.6176@monster>, excerpted
below, on Tue, 06 Dec 2005 22:04:50 -0600:

> Is there a way to test that fact? I've tried to work with lm_sensors,
> but the readings for that are way way off. So, considering lm_sensors
> is useless, is there another way to tell if overheating is the problem?
>
> The case itself has a lot of fans, but it's also got 5 hard drives in it.

Don't know about ASUS, but Tyan has lm_sensors config files for many of
their boards on their site.
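
For context on why the config file matters: lm_sensors ultimately reads the raw values the kernel driver exposes, which on current Linux kernels appear as /sys/class/hwmon/hwmon*/temp*_input in millidegrees Celsius. The board-specific config then corrects those raw numbers (sensors.conf uses "compute" lines for this), and without the right corrections the readings can look way off. The Python sketch below only illustrates that idea; the function names and the scale/offset values are hypothetical placeholders, not taken from any Tyan or ASUS config.

```python
import glob
import os


def read_raw_temps(root="/sys/class/hwmon"):
    """Return {"hwmon0/temp1": degrees_c, ...} from raw sysfs readings.

    The kernel reports temp*_input in millidegrees Celsius. No
    board-specific correction is applied here.
    """
    readings = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor missing or unreadable; skip it
        chip = os.path.basename(os.path.dirname(path))
        sensor = os.path.basename(path)[: -len("_input")]
        readings[f"{chip}/{sensor}"] = millideg / 1000.0
    return readings


def apply_compute(raw_c, scale=1.0, offset=0.0):
    """Linear correction in the spirit of a sensors.conf "compute" line.

    scale and offset are hypothetical; real values depend on how the
    sensor is wired on a particular board.
    """
    return raw_c * scale + offset


if __name__ == "__main__":
    for name, raw in read_raw_temps().items():
        print(f"{name}: raw {raw:.1f} C, corrected {apply_compute(raw):.1f} C")
```

If the raw values already look sane while "sensors" output does not, the config (or lack of one) is the likely culprit rather than the hardware.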

I had similar problems here, though not as severe (main memory only, no L2
cache errors). For quite some time they drove me nearly up a wall, so I can
definitely identify with your situation!

In my case, it turned out to be overrated generic memory. After a BIOS
update added memory timing control, I limited my so-called PC3200 memory
to PC3000 (downclocking from 200/400 MHz to 183/366 MHz), and now I get to
actually enjoy that fabled Linux stability, with the only reboots being
the ones I do on purpose! =8^)
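
The arithmetic behind those module ratings, assuming the usual 64-bit (8-byte) DDR module doing two transfers per clock: PC3200 is 200 MHz x 2 x 8 bytes = 3200 MB/s peak, while 183 MHz works out to roughly 2930 MB/s, which is sold under the rounded PC3000 label. The small Python sketch below just does that multiplication; the function names are illustrative only.

```python
def ddr_peak_mb_s(clock_mhz, bus_bytes=8):
    """Peak DDR bandwidth in MB/s: clock x 2 transfers/clock x bus width."""
    return clock_mhz * 2 * bus_bytes


def percent_slower(old_mhz, new_mhz):
    """How much a downclock costs, as a percentage of the original clock."""
    return 100.0 * (old_mhz - new_mhz) / old_mhz


for mhz in (200, 183):
    # 183 MHz gives 2928 MB/s, marketed as PC3000 (DDR366)
    print(f"{mhz} MHz (DDR{mhz * 2}): ~{ddr_peak_mb_s(mhz)} MB/s peak")
print(f"downclock: {percent_slower(200, 183):.1f}% slower")
```

In other words, the stability came at a cost of under 9% of peak memory bandwidth.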

One thing you can try is to turn OFF ECC. It's somewhat counter-intuitive,
but it definitely helped here until the BIOS got timing-limit functionality
(and it didn't seem to cause any compile problems or the like, either, a
good thing on Gentoo). The best I can figure, the additional ECC data put a
higher strain on already touchy timings, so turning it OFF increased
stability while not noticeably increasing undetected errors.

In any case, try declocking a bit. I only declocked memory, but if it's
really L2 cache issues for you, you'll likely have to declock the CPUs as
well. If it's overheating or general timing touchiness, that should
definitely improve stability.

It could also be slightly low voltage. Again, a properly configured
lm_sensors setup would be a /great/ help here, but if it isn't
available... Turning the clock down should help there as well, but
turning the voltage up at the regular clock rate, provided your cooling is
fine, may also help. If it's the cooling and NOT the voltage, that will
make things WORSE, thereby giving you a way to tell the difference,
PROVIDED you want to risk 0v3r(10(kin9 methods even if not actually
overclocking, of course. =8^)
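
The voltage-versus-cooling test above boils down to a small decision table. Here is a hypothetical Python encoding of it (the enum and function names are made up for illustration), assuming you change only the voltage, keep the regular clock, and run the same stress load before and after:

```python
from enum import Enum


class Effect(Enum):
    BETTER = "better"
    WORSE = "worse"
    NO_CHANGE = "no change"


def likely_cause(overvolt_at_stock_clock):
    """Interpret the result of raising voltage at the regular clock rate.

    Downclocking helps with both heat and slightly low voltage, so this
    step is what tells the two apart.
    """
    if overvolt_at_stock_clock is Effect.BETTER:
        return "slightly low voltage"
    if overvolt_at_stock_clock is Effect.WORSE:
        return "insufficient cooling"
    return "inconclusive: neither voltage nor cooling is clearly implicated"


print(likely_cause(Effect.WORSE))
```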

That leads to another possible solution, one some will certainly consider
more sane than resorting to upping voltages. Particularly with that many
drives and the number of fans you indicate you may have, plus everything
else in a normal computer, it could be that your power supply isn't quite
large enough to handle the demand. This may well be a rehash for you, but
just in case... many power supplies are hopelessly overrated,
particularly if you don't see any UL or CE (or other appropriate national
testing organization) certifications on them. If they are certified, you
can safely assume the rated overall output, but individual voltage rails
may still be inadequate for your needs, certainly so when running five
drives and possibly a fully loaded RAM rack, plus multiple fans. Put it
this way: if you are using the power supply that came with the case, and
you bought a low-end case, it's a fair bet that the rating on the power
supply isn't worth the sticker it's printed on!
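
Checking the per-voltage figures is simple arithmetic: add up the current each component draws on each rail and compare it with the amps printed on the supply's label for that rail, leaving some headroom. A minimal Python sketch follows; the function name is hypothetical, and the rail ratings and loads in the usage part are made-up placeholders, so substitute the numbers from your own PSU label and component datasheets.

```python
def rail_headroom(capacity_amps, loads_amps, margin=0.2):
    """Return {rail: (total_load, usable_capacity, ok)} for each rail.

    capacity_amps: rated amps per rail, e.g. {"+12V": 18.0}
    loads_amps:    list of per-component dicts of amps drawn per rail
    margin:        fraction of each rail's rating held in reserve
    """
    report = {}
    for rail, rated in capacity_amps.items():
        total = sum(load.get(rail, 0.0) for load in loads_amps)
        usable = rated * (1.0 - margin)
        report[rail] = (total, usable, total <= usable)
    return report


# Hypothetical figures for illustration only, not real specs.
capacity = {"+12V": 18.0, "+5V": 30.0, "+3.3V": 28.0}
drive = {"+12V": 2.0, "+5V": 0.8}  # placeholder per-drive draw
everything_else = {"+12V": 8.0, "+3.3V": 6.0}  # placeholder CPU/board/fans
loads = [drive] * 5 + [everything_else]

for rail, (total, usable, ok) in rail_headroom(capacity, loads).items():
    status = "OK" if ok else "OVER"
    print(f"{rail}: {total:.1f} A of {usable:.1f} A usable -> {status}")
```

With these placeholder numbers the +12V rail comes out over budget even though the other rails have plenty to spare, which is exactly the "overall rating fine, particular voltage inadequate" situation described above.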

Personally, I tend to prefer a pretty good power margin, enough that the
supply is not even close to stressing, and I've never had the trouble with
power supplies I've seen others have. If I'm spending an extra $50 to $100
to have the peace of mind that stable power means, so be it! It's well
worth it to me, considering the alternative of potential headaches and
even damage to a system worth, conservatively, a couple grand.

(I just need to learn to spend more on memory, as obviously, the generic
stuff I was buying didn't cut it. <g> Lesson learned!)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@g.o mailing list