Gentoo Archives: gentoo-user

From: Mick <michaelkintzios@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] machine check exception errors
Date: Tue, 21 Sep 2010 21:33:15
Message-Id: 201009212233.05120.michaelkintzios@gmail.com
In Reply to: Re: [gentoo-user] machine check exception errors by Stroller
On Tuesday 21 September 2010 20:15:05 Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>
> I think that ideally, for you, they would move the system image onto a
> different known-good server with the same configuration. Then you cannot
> complain if the same problems start occurring again. If the problem is
> genuinely hardware then they won't. And the hosting provider is free to
> run diagnostics on your old machine.
>
> But realistically, the memory test is likely to show up a bad RAM module,
> you'll get it replaced and be up and running within a few hours. Why would
> you refuse? If your system needed a guaranteed uptime you'd perhaps have
> to pay for a higher level of service than the fees you're paying at
> present.

I run memory tests overnight. If a module is seriously borked then it will
fail early in the test. Reseating or replacing modules takes a few minutes,
instead of hours.

If they have spare machines (for development or testing) they can fit the
memory module(s) there and test them exhaustively before putting the good
ones back into a customer's machine.
--
Regards,
Mick