Gentoo Archives: gentoo-user

From: Grant <emailgrant@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] machine check exception errors
Date: Wed, 22 Sep 2010 16:42:53
Message-Id: AANLkTimDt3CghCWcgQwDXhoNRY6SMh3iR55saAk45Ogf@mail.gmail.com
In Reply to: Re: [gentoo-user] machine check exception errors by Mick
>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server.  Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours.  Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight.  If a module is seriously borked then it
>> > will fail earlier.  Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller.  I'll see if they'll go for this.
>
> You're welcome.  Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre.  In many cases they do not
> even have physical access to the machines.  Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.

Thanks Mick. My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM. There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable. I handed over my root password as I
was requested to do, and then started to get paranoid. I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant
