> On 2021-09-24, at 05:58, Philip Webb <purslow@××××××××.net> wrote:
>
> While I was asleep yesterday, my machine reported on all 3 Konsoles :
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
>
> -- end of report --
>
> I don't remember seeing this before : how concerned should I be ?
|
From the mcelog manpage:
|
Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected
errors cause machine check exceptions which may kill processes or panic the machine. A small
number of corrected errors is usually not a cause for worry, but a large number can indicate
future failure.
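
To see which of those two cases a given report falls into, the status word after "Bank 4:" can be picked apart by hand. Below is a rough Python sketch, not a replacement for mcelog: it assumes the usual x86 machine-check status layout (VAL in bit 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, MCA error code in bits 15:0) and does not attempt any model-specific decoding of what bank 4 means on this CPU. The function names are made up for illustration.

```python
from datetime import datetime, timezone

# Architectural flag bits of an x86 machine-check status register.
# Assumption: the common MCi_STATUS layout; model-specific fields are ignored.
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC value in the report is valid
    58: "ADDRV",  # the ADDR value in the report is valid
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and the low 16-bit MCA error code of a status word."""
    decoded = {name: bool((status >> bit) & 1) for bit, name in STATUS_FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF
    return decoded


def mce_time_utc(seconds: int) -> datetime:
    """Convert the TIME field (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the quoted report.
    for name, value in decode_mci_status(0x9D0B4C16001D011B).items():
        print(f"{name:15} {value}")
    print("TIME           ", mce_time_utc(1632440315).isoformat())
```

For the value in your report this gives VAL, EN, MISCV and ADDRV set, with OVER, UC and PCC clear. In other words: a valid error that the hardware corrected, with no earlier error overwritten, which is the "usually not a cause for worry" case above unless it starts recurring. The TIME field works out to 2021-09-23 23:38:35 UTC.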
|
When an uncorrected machine check error happens that the kernel cannot recover from then it
will usually panic the system. In this case when there was a warm reset after the panic
mcelog should pick up the machine check errors after reboot. This is not possible after a
cold reset.
|
If you are overclocking, try disabling it. |