Gentoo Archives: gentoo-user

From: Mick <michaelkintzios@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] machine check exception errors
Date: Wed, 22 Sep 2010 09:20:27
Message-Id: 201009221019.56794.michaelkintzios@gmail.com
In Reply to: Re: [gentoo-user] machine check exception errors by Grant
On Wednesday 22 September 2010 02:24:39 Grant wrote:
> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >> >>>> hosted server. Running mcelog I get:
> >> >>>> ...
> >> >
> >> > They offered to take my machine down and do a memory test which they
> >> > said would take a number of hours. Is a memory test likely to help?
> >> > Did you suggest reseating or replacing RAM modules as opposed to a
> >> > memory test because it will result in less downtime?
> >>
> >> I suspect that your hosting provider are offering you this memory test
> >> because they don't want to go swapping out memory modules willy-nilly.
> >>
> >> How do they know that the problem is really memory, and not your
> >> operating system? If they take all this RAM out and put new RAM in,
> >> what do they do with the old RAM? They don't know if it's good or bad,
> >> so are they expected to just slap it in a server belonging to another
> >> customer, and stitch him up?
> >>
> >> A memory test is likely to identify bad RAM, if it is bad, so you should
> >> proceed with this. This is likely the best route to solving the problem.
> >>
> >> I think that ideally, for you, they would move the system image onto a
> >> different known-good server with the same configuration. Then you cannot
> >> complain if the same problems start occurring again. If the problem is
> >> genuinely hardware then they won't. And the hosting provider is free to
> >> run diagnostics on your old machine.
> >>
> >> But realistically, the memory test is likely to show up a bad RAM
> >> module, you'll get it replaced and be up and running within a few
> >> hours. Why would you refuse? If your system needed a guaranteed uptime
> >> you'd perhaps have to pay for a higher level of service than the fees
> >> you're paying at present.
> >
> > I run memory tests overnight. If a module is seriously borked then it
> > will fail earlier. Reseating/replacing takes a few minutes, instead of
> > hours.
> >
> > If they have spare machines (for dev't or testing) they can fit the
> > memory module(s) there and test them exhaustively, before they put the
> > good ones back into a customer's machine.
>
> Thanks Mick and Stroller. I'll see if they'll go for this.

You're welcome. Bear in mind, though, that a lot of hosters are just glorified
resellers with an account in a bigger data centre. In many cases they do not
even have physical access to the machines. Only the data centre techies do,
and they may be less willing to oblige and break procedure or routine just
because one end user out of hundreds or thousands complained about some memory
errors.

YMMV
--
Regards,
Mick

Attachments

File name MIME type
signature.asc application/pgp-signature

Replies

Subject Author
Re: [gentoo-user] machine check exception errors Grant <emailgrant@×××××.com>