>> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >>>> hosted server. Running mcelog I get:
>> >>>> ...
>> >
>> > They offered to take my machine down and do a memory test which they
>> > said would take a number of hours. Is a memory test likely to help?
>> > Did you suggest reseating or replacing RAM modules as opposed to a
>> > memory test because it will result in less downtime?
>>
>> I suspect that your hosting provider are offering you this memory test
>> because they don't want to go swapping out memory modules willy-nilly.
>>
>> How do they know that the problem is really memory, and not your operating
>> system? If they take all this RAM out and put new RAM in, what do they do
>> with the old RAM? They don't know if it's good or bad, so are they
>> expected to just slap it in a server belonging to another customer, and
>> stitch him up?
>>
>> A memory test is likely to identify bad RAM, if it is bad, so you should
>> proceed with this. This is likely the best route to solving the problem.
>>
>> I think that ideally, for you, they would move the system image onto a
>> different known-good server with the same configuration. Then you cannot
>> complain if the same problems start occurring again. If the problem is
>> genuinely hardware then they won't. And the hosting provider is free to
>> run diagnostics on your old machine.
>>
>> But realistically, the memory test is likely to show up a bad RAM module,
>> you'll get it replaced and be up and running within a few hours. Why would
>> you refuse? If your system needed a guaranteed uptime you'd perhaps have
>> to pay for a higher level of service than the fees you're paying at
>> present.
>
> I run memory tests overnight. If a module is seriously borked then it will
> fail earlier. Reseating/replacing takes a few minutes, instead of hours.
>
> If they have spare machines (for dev't or testing) they can fit the memory
> module(s) there and test them exhaustively, before they put the good ones back
> into a customer's machine.

Thanks Mick and Stroller. I'll see if they'll go for this.

- Grant