Gentoo Archives: gentoo-user

From: Grant <emailgrant@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] machine check exception errors
Date: Wed, 22 Sep 2010 01:25:07
Message-Id: AANLkTinnoW3JJhA+FeysR-p6uytcx-TMj=P3uT5-9kzy@mail.gmail.com
In Reply to: Re: [gentoo-user] machine check exception errors by Mick
>> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >>>> hosted server.  Running mcelog I get:
>> >>>> ...
>> >
>> > They offered to take my machine down and do a memory test which they
>> > said would take a number of hours.  Is a memory test likely to help?
>> > Did you suggest reseating or replacing RAM modules as opposed to a
>> > memory test because it will result in less downtime?
>>
>> I suspect that your hosting provider are offering you this memory test
>> because they don't want to go swapping out memory modules willy-nilly.
>>
>> How do they know that the problem is really memory, and not your operating
>> system? If they take all this RAM out and put new RAM in, what do they do
>> with the old RAM? They don't know if it's good or bad, so are they
>> expected to just slap it in a server belonging to another customer, and
>> stitch him up?
>>
>> A memory test is likely to identify bad RAM, if it is bad, so you should
>> proceed with this. This is likely the best route to solving the problem.
>>
>> I think that ideally, for you, they would move the system image onto a
>> different known-good server with the same configuration. Then you cannot
>> complain if the same problems start occurring again. If the problem is
>> genuinely hardware then they won't. And the hosting provider is free to
>> run diagnostics on your old machine.
>>
>> But realistically, the memory test is likely to show up a bad RAM module,
>> you'll get it replaced and be up and running within a few hours. Why would
>> you refuse? If your system needed a guaranteed uptime you'd perhaps have
>> to pay for a higher level of service than the fees you're paying at
>> present.
>
> I run memory tests overnight.  If a module is seriously borked then it will
> fail earlier.  Reseating/replacing takes a few minutes, instead of hours.
>
> If they have spare machines (for dev't or testing) they can fit the memory
> module(s) there and test them exhaustively, before they put the good ones back
> into a customer's machine.

Thanks Mick and Stroller. I'll see if they'll go for this.

- Grant
