Gentoo Archives: gentoo-user

From: Sebas Pedersen <sebasped@×××××××××××.org>
To: michaelkintzios@×××××.com
Cc: gentoo-user@l.g.o
Subject: Re: [gentoo-user] MCE error
Date: Sun, 29 Mar 2015 15:52:44
Message-Id: b0468b3248cbe6aae5be96a93c96e1a3@openmailbox.org
In Reply to: Re: [gentoo-user] MCE error by Mick
On 29-03-2015 12:45 PM, Mick wrote:
> On Sunday 29 Mar 2015 16:42:10 Sebas Pedersen wrote:
>> On 28-03-2015 08:50 PM, Mick wrote:
>> > On Saturday 28 Mar 2015 22:48:48 Sebas Pedersen wrote:
>> >> On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:
>> >> > On 28.03.2015 at 23:00, Sebas Pedersen wrote:
>> >> >> On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
>> >> >>> On 28.03.2015 at 14:58, Sebas Pedersen wrote:
>> >> >>>> Hi guys,
>> >> >>>>
>> >> >>>> For the past few days I have been experiencing an MCE error.
>> >> >>>> Sometimes when I turn on the computer, at some point while booting
>> >> >>>> the kernel (after the grub menu) it just freezes and prints this:
>> >> >>>>
>> >> >>>> CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
>> >> >>>> TSC f5acc9180
>> >> >>>> PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0
>> >> >>>>
>> >> >>>> The number for TSC may vary, but the b200000000070f0f is always the
>> >> >>>> same (at least for now). The error message suggests parsing the
>> >> >>>> above error with mcelog. I did that and the result was:
>> >> >>>>
>> >> >>>> Hardware event. This is not a software error.
>> >> >>>> CPU 0 4 northbridge TSC f5acc9180
>> >> >>>> TIME 1427486735 Fri Mar 27 17:05:35 2015
>> >> >>>>
>> >> >>>> Northbridge Watchdog error
>> >> >>>>
>> >> >>>> bit57 = processor context corrupt
>> >> >>>> bit61 = error uncorrected
>> >> >>>>
>> >> >>>> bus error 'generic participation, request timed out
>> >> >>>>
>> >> >>>> generic error mem transaction
>> >> >>>> generic access, level generic'
>> >> >>>>
>> >> >>>> STATUS b200000000070f0f MCGSTATUS 4
>> >> >>>> CPUID Vendor AMD Family 15 Model 44
>> >> >>>> SOCKET 0 APIC 0 microcode 0
>> >> >>>>
>> >> >>>> The error suggests it's a hardware problem. I replaced the RAM with
>> >> >>>> no luck. The same error keeps happening.
>> >> >>>>
>> >> >>>> Any suggestions for identifying the problem or how to proceed?
>> >> >>>>
>> >> >>>> Many thanks in advance!
>> >> >>>>
>> >> >>>> Sebas
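
[Editor's sketch, not part of the original message.] The mcelog output quoted above can be cross-checked by hand, since the flag bits at the top of a machine-check STATUS word sit at fixed positions. The snippet below is a minimal illustration only: the function name `decode_status` and the returned dictionary layout are made up for this example, and reading bits 19:16 as an "extended error code" is an assumption based on the AMD K8 northbridge layout, which may not hold for other CPU families.

```python
# Flag bits in the upper part of a machine-check STATUS register.
STATUS_FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> dict:
    """Split a 64-bit STATUS value into set flag bits and error-code fields."""
    set_flags = [bit for bit in sorted(STATUS_FLAGS, reverse=True)
                 if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits: MCA error code
    return {
        "flags": set_flags,
        "error_code": code,
        # Bus errors have the bit pattern 0000 1PPT RRRR IILL.
        "is_bus_error": (code >> 11) == 1,
        "timeout": bool((code >> 8) & 1),  # the T bit of a bus error code
        # Assumption: K8-style extended error code in bits 19:16.
        "extended_code": (status >> 16) & 0xF,
    }


if __name__ == "__main__":
    result = decode_status(0xB200000000070F0F)
    for bit in result["flags"]:
        print(f"bit{bit} = {STATUS_FLAGS[bit]}")
    print(f"error code = {result['error_code']:#06x}")
```

For the value in this thread, bits 63, 61, 60 and 57 are set, which agrees with the "error uncorrected" and "processor context corrupt" lines mcelog printed, and the low 16 bits (0x0f0f) have the bus-error pattern with the timeout bit set.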
>> >> >>>
>> >> >>> bios update/microcode update. A google search suggests that you have
>> >> >>> run into an erratum.
>> >> >>
>> >> >> Oh OK, thank you. I must have missed that in the search. So you are
>> >> >> saying that the error comes from a bios erratum (I don't know what
>> >> >> microcode is), and the fix is to update the bios?
>> >> >
>> >> > no, possibly from a CPU erratum, and a bios update might bring in the
>> >> > microcode update that works around it.
>> >>
>> >> I see, thanks for clarifying that. So it looks like there are not many
>> >> options: either try to update the bios and/or replace the CPU.
>> >>
>> >> I really appreciate your replies and time.
>> >>
>> >> Thanks!
>> >> Sebas
>> >
>> > There's 'CONFIG_MICROCODE=y' and friends in the kernel which, along with
>> > sys-apps/microcode-ctl, will load whatever is the latest Intel/AMD CPU
>> > code (firmware) to patch any bugs with instructions that the CPU
>> > manufacturers have discovered.
>>
>> That's nice. I'm going to compile the kernel and see what happens.
>>
>> Many thanks!
>
> Don't forget to enable the relevant module for your type of CPU.

You're right. Thanks for the reminder!

Best Regards,
Sebas
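
[Editor's sketch, not part of the original message.] Before and after rebuilding the kernel, it can help to confirm which microcode options are actually enabled and what revision the CPU reports. The helpers below are a rough illustration: the function names are invented for this example, and the option names `CONFIG_MICROCODE_INTEL` and `CONFIG_MICROCODE_AMD` are assumed to be the vendor-specific "friends" of `CONFIG_MICROCODE` in kernels of that period (other kernel versions may name or organise them differently).

```python
import re

# Assumption: these are the relevant option names for kernels of that era.
MICROCODE_OPTIONS = (
    "CONFIG_MICROCODE",
    "CONFIG_MICROCODE_INTEL",
    "CONFIG_MICROCODE_AMD",
)


def microcode_options(config_text: str) -> dict:
    """Map each microcode option to 'y', 'm', or None if it is not enabled."""
    found = dict.fromkeys(MICROCODE_OPTIONS)
    for line in config_text.splitlines():
        match = re.match(r"(CONFIG_\w+)=([ym])$", line.strip())
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    return {
        line.split(":", 1)[1].strip()
        for line in cpuinfo_text.splitlines()
        if line.startswith("microcode") and ":" in line
    }
```

On a typical Gentoo install one would likely feed these the contents of `/usr/src/linux/.config` and `/proc/cpuinfo`; both paths are assumptions about the local setup. An option that comes back as `None` for the installed CPU vendor would be the "relevant module" still left to enable.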