On Tuesday 21 September 2010, Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>

Sure?
This is ECC RAM - does memtest report ECC-corrected errors? I don't think so.
The MCE errors say:
We detected an error. The error was corrected. Applications will not see the error.
Everything marches on.

The RAM is borked and must be replaced.
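
If the kernel's EDAC driver happens to be loaded for this chipset (an assumption, nothing in the thread confirms it), the corrected-error counters can be read straight from sysfs without taking the machine down. A minimal sketch in Python; the function name `read_edac_counts` is made up for illustration, and the path is the usual EDAC location:

```python
import glob
import os

# Usual EDAC sysfs location; only present if an EDAC driver is loaded.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int|None, "ue": int|None}} per mc* directory."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        # ce_count = corrected errors, ue_count = uncorrected errors
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                with open(os.path.join(mc_dir, name)) as fh:
                    entry[key] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found (driver not loaded?)")
    for mc, c in counts.items():
        print(f"{mc}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected count that keeps climbing between runs would back up what the MCE log already says.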