Gentoo Archives: gentoo-user

From: Volker Armin Hemmann <volkerarmin@××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] machine check exception errors
Date: Sun, 26 Sep 2010 00:09:57
Message-Id: 201009260120.03590.volkerarmin@googlemail.com
In Reply to: Re: [gentoo-user] machine check exception errors by Stroller
On Tuesday 21 September 2010, Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>

Sure?
This is ECC RAM. Does memtest report ECC-corrected errors? I don't think so.
The MCE errors say:
we detected an error. The error was corrected. Applications will not see the error.
Everything marches on.

The RAM is borked and must be replaced.
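
If you want to watch the corrected-error count yourself instead of waiting on a memtest, something like the following might help. It is only a rough sketch: it assumes the kernel's EDAC driver for your memory controller is loaded and exposes its counters under /sys/devices/system/edac/mc, which may not be the case on every box. The function name edac_error_counts is made up for this example.

```python
from pathlib import Path


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs.

    Assumes an EDAC driver is loaded; returns an empty dict otherwise.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        # No EDAC directory: driver not loaded, or a different sysfs layout.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in edac_error_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing while the uncorrected count stays at zero would fit what the MCE messages describe: the ECC is hiding the fault from applications, but the module is still throwing errors.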