On Tuesday 21 September 2010 20:15:05 Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>
> I think that ideally, for you, they would move the system image onto a
> different known-good server with the same configuration. Then you cannot
> complain if the same problems start occurring again. If the problem is
> genuinely hardware then they won't. And the hosting provider is free to
> run diagnostics on your old machine.
>
> But realistically, the memory test is likely to show up a bad RAM module,
> you'll get it replaced and be up and running within a few hours. Why would
> you refuse? If your system needed a guaranteed uptime you'd perhaps have
> to pay for a higher level of service than the fees you're paying at
> present.

I run memory tests overnight. If a module is seriously borked then it will
fail earlier. Reseating/replacing takes a few minutes, instead of hours.

If they have spare machines (for development or testing) they can fit the
memory module(s) there and test them exhaustively, before they put the good
ones back into a customer's machine.
--
Regards,
Mick