SMP system hard-halts, weird crashes - revisited... tests...
Date: Tue, 31 Dec 2002 02:27:13
Hello, folks,
For those of you who remember my threads from before regarding X halts,
SMP kernel halts, etc, etc... I think I was able to track down the issue,
with Riyad's kind guidance :)
It turned out to be an issue with my power supply, I believe.
I did, however, run a lot of tests, about which some of you might be
interested to know. First, I thought it could be my IDE drives or the IDE
controller. So, I put a good Seagate SCSI drive in my system, did a fresh
install of Gentoo, and tried several versions of the kernels -
2.4.19-gentoo-r7, 2.4.19-gentoo-r10, 2.4.20-vanilla, and
2.5.52-development. The crash was reproducible with every kernel, and the
development kernel was no more unstable than the others. After
reproducing the crashes without any devices hooked up to IDE or Floppy
controllers, it was clear that it wasn't a controller or drive
problem. I also tested with each kernel whether it was a hyperthreading
issue or not, and it wasn't. This may seem like obvious junk, but it does
take quite a bit of time to test all of these configs out. By the end of
the week (yes, a week), it was clear that it was either the power supply or
the board itself.
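For anyone planning a similar week, here is a minimal sketch of how the test matrix could be written down as a checklist. It only enumerates the kernel and hyperthreading combinations mentioned above; the function name `test_matrix` is made up for illustration, and nothing here boots or tests anything by itself.

```python
from itertools import product

# Kernel versions named in this post.
KERNELS = [
    "2.4.19-gentoo-r7",
    "2.4.19-gentoo-r10",
    "2.4.20-vanilla",
    "2.5.52-development",
]

# Hyperthreading enabled / disabled.
HYPERTHREADING = [True, False]


def test_matrix(kernels=KERNELS, ht_options=HYPERTHREADING):
    """Return every (kernel, hyperthreading) combination to try."""
    return list(product(kernels, ht_options))


if __name__ == "__main__":
    # Print a simple checklist to tick off by hand after each run.
    for kernel, ht in test_matrix():
        state = "on" if ht else "off"
        print(f"[ ] kernel={kernel:<20} hyperthreading={state}")
```

Four kernels times two hyperthreading settings is already eight full runs, which is a good part of why this took a week.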
There was an idea from others that my nVidia GeForce3 card could be
conflicting with the Tyan board, or drawing too much power or something,
but I proved that untrue by reproducing all the crashes in the text-only
mode.
I use a Tyan Thunder i860 S2603 dual board with 2.2-gig Intel Xeons, and
this board has done very well in all the reviews I've seen.
Actually, Tyan makes excellent boards... Period. The power supply was the
easier and less expensive (and more probable) thing to try :) I had a
460-W Zippy Emacs (Taiwanese) supply in my box, which came from the guys
who sold me this machine. I decided to go for the best this time and
purchase Antec's True550 EPS12V power supply for dual Xeon boards. My
Tyan board requires a 24-pin main Molex power connector and an auxiliary
8-pin Molex power connector. An EPS supply accommodates the number of
pins, but as I found out when I hooked it up, the pins are all in the
wrong places!! So I plugged it in, turned it on... Silence. A few minutes
of Googling yielded a big DUH as to why I hadn't done the search before.
The S2603 is a non-EPS board, which means it needs an EPS-to-nonEPS
converter, which is sold by Enhance Electronics (out of California).
Surely I could make one myself (I had schematics in
front of me), but most people who would have the tools (electrical
engineering department) were gone for Christmas! I didn't have any
Molex connectors or crimpers on hand, but Enhance Electronics gave a
nice schematic of the adapter, if anyone is curious. So I had to buy this
converter and now things look (and sound) pretty sweet. The system is up
and running. The problem seems to be gone, unless it was something else
(i.e. the board).
One more note for some of you who have run into such mysterious crashes
before. There's not a great deal of material about this on the net.
Apparently, these are mostly caused by low voltages or noise on the +5VSB
line from the power supply to the motherboard, which is a "standby"
voltage line. This certainly explains why my system wouldn't awaken from
sleeping in Windoze. :) This also explains why the system would halt
after performing a string of strenuous operations. I wonder why it halted
in the middle of strenuous operations, if it's really a VSB problem.
Maybe the power supply wasn't too good and the voltages would droop at
high load on other channels as well. I didn't check it with a meter.
No time.
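For anyone who does have a meter and the time, here is a minimal sketch of the check I skipped: compare measured rail voltages against their allowed tolerance. The tolerances below (+/-5% on the positive rails and +5VSB, +/-10% on -12V) are the usual ATX figures as I understand them, so treat them as an assumption and verify against the spec for your supply. The function name `out_of_spec` and the sample readings are made up for illustration; they are not measurements from my machine.

```python
# Nominal voltage and allowed fractional tolerance per rail.
# ASSUMPTION: typical ATX tolerances; check your supply's spec sheet.
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def out_of_spec(readings, rails=RAILS):
    """Return {rail: (measured, low, high)} for readings outside tolerance."""
    bad = {}
    for name, measured in readings.items():
        nominal, tol = rails[name]
        margin = abs(nominal) * tol
        low, high = nominal - margin, nominal + margin
        if not (low <= measured <= high):
            bad[name] = (measured, low, high)
    return bad


if __name__ == "__main__":
    # HYPOTHETICAL meter readings taken under load, for illustration only.
    sample = {"+5V": 5.02, "+12V": 12.10, "-12V": -11.5, "+5VSB": 4.60}
    for rail, (measured, low, high) in out_of_spec(sample).items():
        print(f"{rail}: {measured:.2f} V is outside {low:.2f}..{high:.2f} V")
```

Readings should be taken both at idle and under heavy load, since a supply that droops only under load would look fine at idle.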
It's never good to go cheap on power (or RAM, or anything for that
matter). If one really doesn't want to replace a power supply, they can
put a capacitor between Common and +5VSB lines, which stabilizes the board
by eliminating such voltage droops and noise on the VSB line. The
capacitor I've seen used on the web was an electrolytic capacitor rated
6.3V and marked 1000 micro-farad (although I wouldn't bank on the units;
it was hard to tell from the photo what Farad units those were, but the
numbers were pretty clear :)). The power cable extension with such a
capacitor built-in is
sold by JDResearch. They only sell this for the
usual 20-pin power connectors, not the 24-pin ones. I ordered it just to
see for myself exactly which capacitor is on it ;) The principle is the
same for 24-pin lines, and one could trivially make this if they had a
soldering iron on hand and the right capacitor.
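To get a rough feel for what such a capacitor can and can't do, here is a back-of-the-envelope sketch using t = C * dV / I, i.e. how long an ideal capacitor can feed a constant current before the voltage sags by a given amount. The 0.25 V droop and 0.5 A standby load are numbers I picked purely for illustration, and real capacitors and real loads are messier than this.

```python
def hold_up_time(capacitance_f, allowed_droop_v, load_current_a):
    """Seconds an ideal capacitor can supply a constant current
    before its voltage sags by allowed_droop_v (t = C * dV / I)."""
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    return capacitance_f * allowed_droop_v / load_current_a


if __name__ == "__main__":
    # 1000 microfarad capacitor, as read from the photo.
    # ASSUMED: 0.25 V allowed droop and 0.5 A standby load.
    t = hold_up_time(1000e-6, 0.25, 0.5)
    print(f"hold-up time: {t * 1e6:.0f} microseconds")
```

With those assumed numbers the capacitor covers a dip of about half a millisecond, so it can smooth out short glitches and noise on the VSB line but won't rescue a supply that sags for any real length of time.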
So, here are the pearls... Enjoy! :) This was a result of lots of
searches. So - if you get mysterious crashes and system halts that point
to other things than I/O devices, replace the power supply or try putting
a capacitor between +5VSB and Common to stabilize the board.
Gentoo rocks. :) Jeez, the installs are soo damn fast now that I have
half-the-clue as to what I'm doing in Gentoo :)
A brief note on the 2.5 dev kernel. It's real cool!! It compiles in a
flash, it loads in a flash, and I haven't run into any instabilities with
it yet!! It's absolutely blazing compared to 2.4.20 vanilla (or
2.4.19-gentoo, sorry :)) The only thing is, the nVidia kernel modules
don't compile with this kernel. The modules are now called *.ko rather
than *.o :)
Alright, this is all for now. Sorry for making this so long, but there's
so much to share. All you gurus have probably been through most of this
already, so forgive me for insulting your intelligence with this, but it's
pretty exciting stuff to a novice like me!
All the best to everyone for the Holiday Season!
Denis
P.S. Riyad - Many thanks!! You rock!!
