>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server. Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours. Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight. If a module is seriously borked then it
>> > will fail earlier. Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller. I'll see if they'll go for this.
>
> You're welcome. Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre. In many cases they do not
> even have physical access to the machines. Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.
|
Thanks Mick. My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM. There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable. I handed over my root password as I
was requested to do, and then started to get paranoid. I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant