Gentoo Archives: gentoo-user

From: boy@×××××××××.xyz
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] [OT] Memory error messages in dmesg
Date: Mon, 30 Nov 2015 08:23:34
Message-Id: 565C0789.8080405@vanduuren.xyz
In Reply to: [gentoo-user] [OT] Memory error messages in dmesg by Dan Johansson
On 11/28/2015 11:01 AM, Dan Johansson wrote:
> I have started noticing the following messages in the dmesg output (and
> in the log-files) on my Gentoo rig:
>
> [46545.779803] [Hardware Error]: Corrected error, no action required.
> [46545.779984] [Hardware Error]: CPU:3 (15:2:0)
> MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc2540f000040136
> [46545.780434] [Hardware Error]: MC2 Error Address: 0x00000002cc215138
> [46545.780605] [Hardware Error]: MC2 Error: Fill ECC error on data fills.
> [46545.783764] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
> [46545.784088] mce: [Hardware Error]: Machine check events logged

Are you using ECC memory? I saw the same errors right after I finished
building a machine that turned out to have some faulty ECC DIMMs
installed.

> I have been running memtest for some time (~100h) and have not gotten
> any error message - so I am suspecting that this is a CPU problem. Am I
> correct?

In my case memtest didn't find any errors after a night of running
either, but once I booted Gentoo the errors occurred more frequently
the longer the machine had been running or the more packages I had
compiled.
I think the version of memtest I was running didn't account for ECC
corrections, so every test passed as far as memtest was concerned, even
though the memory had to correct errors to make sure everything was
read/written properly.
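
If you want to check whether the corrected errors really pile up over
time, you could count them from saved dmesg output. Below is a minimal
Python sketch, assuming the log lines look like the ones you quoted and
that the kernel prints the "CPU:... MCn_STATUS" part on a single line
(the wrap in your mail looks like it came from the mail client). The
function name summarize_mce is made up for this example.

```python
import re
from collections import Counter

# Matches the per-event line from the quoted log, e.g.
# "[46545.779984] [Hardware Error]: CPU:3 (15:2:0) MC2_STATUS[...]: 0x..."
# Assumption: timestamp, CPU number and bank number all sit on one line.
EVENT_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[Hardware Error\]: CPU:(\d+) .*?MC(\d+)_STATUS"
)


def summarize_mce(dmesg_text):
    """Count machine check events per (cpu, bank) and collect timestamps.

    Returns a tuple (Counter keyed by (cpu, bank), list of uptime seconds).
    """
    per_source = Counter()
    timestamps = []
    for line in dmesg_text.splitlines():
        match = EVENT_LINE.search(line)
        if match:
            timestamps.append(float(match.group(1)))
            per_source[(int(match.group(2)), int(match.group(3)))] += 1
    return per_source, timestamps
```

Feeding it the contents of something like "dmesg > dmesg.txt" gives one
counter entry per CPU/bank pair, and the timestamps show whether the
events bunch up during long compiles.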

> If it was just these error-messages I would not be that worried, but I
> have started to get a lot of "hangers" on this rig when compiling larger
> packages. Could there be a relation to the error-messages?

What I'd try to do is find the DIMM that's causing these errors and see
how your machine runs without it installed. I used EDAC [0] and
edac-utils [1] to find my faulty DIMMs.
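
To give an idea of what EDAC exposes: once an EDAC driver for your
memory controller is loaded, the kernel keeps corrected-error counters
in sysfs. Here is a rough Python sketch that reads them, assuming the
csrow layout described in [0] (mcX/csrowY/ce_count under
/sys/devices/system/edac/mc). The exact layout depends on the kernel
version and driver, and read_ce_counts is just a name I picked for the
example.

```python
from pathlib import Path

# Assumed location of the EDAC memory controller tree, per the kernel's
# EDAC documentation; it only exists when an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root=EDAC_ROOT):
    """Return {"mcX/csrowY": corrected error count} for every csrow found."""
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        ce_file = csrow / "ce_count"
        if ce_file.is_file():
            key = f"{csrow.parent.name}/{csrow.name}"
            counts[key] = int(ce_file.read_text().strip())
    return counts
```

A csrow whose count keeps climbing while the others stay at zero points
at the DIMM(s) behind that row; which physical slot that is depends on
the board, so pulling the suspect DIMM is still the real test.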

- Boy

[0] https://www.kernel.org/doc/Documentation/edac.txt
[1] https://packages.gentoo.org/package/sys-apps/edac-utils
