Gentoo Archives: gentoo-dev

From: Kevin <gentoo-dev@××××××.biz>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels
Date: Tue, 11 May 2004 21:31:34
Message-Id: 200405111731.32365.gentoo-dev@gnosys.biz
In Reply to: Re: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels by Chris Gianelloni
On Tuesday 11 May 2004 16:54, Chris Gianelloni wrote:
> On Tue, 2004-05-11 at 15:38, Kevin wrote:
> > Ok. Thanks for the suggestion. But what about this: Dell has a
> > utility partition and some programs for doing exhaustive testing of
> > all the hardware in the server. If I run the most thorough set of
> > tests available in this utility partition and I get a clean bill of
> > health, is that a reliable indication that there are no hardware
> > problems? Or does memtest86 do testing that's more exhaustive than
> > most such utility suites?
>
> I think the Dell suite would be more extensive.

Thanks for saying so, Chris.

>
> > If the utility partition testing says all is well (I've done it
> > several times in the last month or so, though maybe not the most
> > extensive tests), what's the next place to look for an explanation
> > of why this MCE is happening in Gentoo but not in SuSE?
>
> Are you sure that it isn't MCE *causing* these problems? Have you
> tried turning it off and seeing if you still have the same kinds of
> problems?

I'm not sure I understand what you mean by that. The first time I got a
kernel panic and MCE, I believe that the kernel I was running had no
configured capability to deal with MCE errors (though I'm not sure of
that). I had never seen an MCE before, but after this first time, with
any other kernels I built, I searched through the .config file options
for handlers of MCE errors and built them into the kernel where they
were available. IIRC, then when I got a kernel panic with those
kernels, I had some more information (apparently generated by the
kernel) on the console than I did with the first MCE. I add this
information in case it relates to your question or point here, but I'm
really not sure what you mean by, "Have you tried turning it off..."
Where do I turn it off? Do you mean the .config file parameter in the
kernel configuration process that builds (or not) a handler for the MCE
errors? Or do you mean something else?

Honestly, I'm thinking that I may have somehow built some software
(during the stage 1 installation process) that is causing these
problems, but I followed the Gentoo Handbook for doing a stage 1
installation pretty rigidly, so I'm not sure what I might have done to
cause that. When I did the bootstrap.sh and emerge system, I was
running the kernel that I booted from the boot CD (2004.0 I think, and
probably even the smp kernel that was on that CD---IIRC, the 2004.1
boot CD has some problems that prevent the use of the smp kernel on
that CD).

In fact, now that I think of it, I'm pretty sure I didn't get any MCE
kernel panics until after I finished emerge system and other tasks and
then rebooted my new Gentoo system. Perhaps this helps isolate the
cause of the problems. While I was doing the bootstrap.sh and emerge
system, it's definitely true that I was stressing the system out with
lots of compile jobs (which is what has been triggering my MCEs), but
I'm pretty sure I did not get any MCE failures during those steps.
Does this help someone figure out what's going on in my case?

Are there some compiler flags or other configurable settings that, if
set to certain values during the bootstrap.sh or emerge system steps,
could end up generating software (perhaps when I built my own gcc?)
that would cause these MCEs to be thrown?

Like I said in my PS in my first post, I have this vague memory of
seeing something that said, such-and-such is not smp safe. Have no
clue what that might have been now, though, or even if it's an accurate
memory. Some of this work was done in the wee hours...

Thanks for the replies and any other suggestions.

-Kevin

--
gentoo-dev@g.o mailing list