Gentoo Archives: gentoo-dev

From: Josh Glover <jmglov@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels
Date: Wed, 12 May 2004 02:42:30
Message-Id: 20040512024226.GB16857%jmglov@jmglov.net
In Reply to: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels by Kevin
1 Quoth Kevin (Tue 2004-05-11 02:07:58PM -0400):
2
3 > In summary, my problem is this: of those that I've tried, I can't get any
4 > Gentoo kernel to handle SMP operation during major CPU activity (like
5 > emerging packages) for more than about 5 or 10 minutes. Invariably,
6 > during such activity, I get a kernel panic---most often with words on the
7 > console about Machine Check Exception 000000...004 (this number from
8 > memory so it may be off).
9
10 [...]
11
12 > The only way that I can get reliable, stable operation with a Gentoo
13 > kernel and distribution is if I build a kernel without support for SMP.
14
15 [...]
16
17 > The machine is a Dell PowerEdge1600SC with a PERC-3/SC SCSI RAID
18 > controller (using AMI megaraid2 driver) and a LSI Logic Corp controller
19 > (using Fusion MPT base driver) for the SCSI DAT and with dual 2.4GHz Xeon
20 > processors, each having a 512KB L2 Cache.
21
22 For comparison, I am running Gentoo with a 2.6.5 SMP kernel on a Dell PowerEdge 400SC:
23
24 : jmglov@jglover; uname -a
25 Linux jglover 2.6.5-gentoo-r1 #1 SMP Fri Apr 30 17:37:18 EDT 2004 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
26
27 > Output from /proc/cpuinfo is:
28
29 <snip>
30
31 : jmglov@jglover; cat /proc/cpuinfo
32 processor : 0
33 vendor_id : GenuineIntel
34 cpu family : 15
35 model : 2
36 model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
37 stepping : 9
38 cpu MHz : 2395.027
39 cache size : 512 KB
40 physical id : 0
41 siblings : 2
42 fdiv_bug : no
43 hlt_bug : no
44 f00f_bug : no
45 coma_bug : no
46 fpu : yes
47 fpu_exception : yes
48 cpuid level : 2
49 wp : yes
50 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
51 bogomips : 4718.59
52
53 processor : 1
54 vendor_id : GenuineIntel
55 cpu family : 15
56 model : 2
57 model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
58 stepping : 9
59 cpu MHz : 2395.027
60 cache size : 512 KB
61 physical id : 0
62 siblings : 2
63 fdiv_bug : no
64 hlt_bug : no
65 f00f_bug : no
66 coma_bug : no
67 fpu : yes
68 fpu_exception : yes
69 cpuid level : 2
70 wp : yes
71 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
72 bogomips : 4767.74
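Both logical processors above report physical id 0 and siblings 2, which is what one hyper-threaded package looks like, as opposed to two separate CPUs. A minimal sketch for summarising that from cpuinfo-style text follows. The function name cpu_topology is made up for illustration, and it assumes the kernel exports a "physical id" field, which not every kernel does.

```shell
# Summarise logical processors vs. physical packages from cpuinfo-style text.
# Reads the file given as $1 (default: /proc/cpuinfo).
# cpu_topology is a hypothetical helper name, not an existing tool.
cpu_topology() {
    awk -F: '
        # Each "processor" line is one logical CPU as seen by the kernel.
        /^processor[ \t]*:/   { logical++ }
        # Distinct "physical id" values correspond to distinct packages.
        /^physical id[ \t]*:/ { gsub(/[ \t]/, "", $2); pkgs[$2] = 1 }
        END {
            n = 0
            for (id in pkgs) n++
            printf "logical=%d packages=%d\n", logical, n
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Calling cpu_topology with no argument reads /proc/cpuinfo on the running box. If the kernel does not print "physical id" at all, the package count comes out as 0, so treat that as "unknown" rather than "none".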
73
74 > I'm a recent Gentoo convert. I think it's an excellent improvement on the
75 > traditional Linux distros, and I'd really like to use it on my server,
76 > but as long as this problem with SMP is present, I just can't.
77
78 I really do not think it is a Gentoo issue. I have run Gentoo on quite a
79 few SMP boxen over the past several years, and never had problems like
80 you describe. Sounds like a hardware issue to me, unless you are using
81 (or were using) some really bogus CFLAGS.
82
83 > PS. FWIW, I'll add that I have a very vague memory while watching text fly
84 > up the screen during bootstrap.sh or emerge system (this was a stage 1
85 > install before I installed SuSE over it) of seeing some warning about
86 > something being unsafe with SMP. Do I need to have some setting or other
87 > turned off for some parts of a stage 1 install with a dual CPU system?
88
89 Nope.
90
91 Quoth Kevin (Tue, 11 May 2004 15:38:35 -0400):
92
93 > Ok. Thanks for the suggestion. But what about this: Dell has a utility
94 > partition and some programs for doing exhaustive testing of all the
95 > hardware in the server. If I run the most thorough set of tests
96 > available in this utility partition and I get a clean bill of health,
97 > is that a reliable indication that there are no hardware problems?
98
99 Nope. Tragically, it usually works the other way around: hardware test
100 suites are unlikely to give you a false positive, but if your hardware
101 passes, that does not mean you are safe. Your issue might be heat-
102 related, and your CPUs have to heat up for quite some time before they
103 choke. Combine this with some other issue (maybe you optimised a bit
104 aggressively when building your kernel?), and you have a tricky issue
105 for a hardware tester to catch.
106
107 Quoth Kevin (Tue, 11 May 2004 17:31:32 -0400):
108
109 > Honestly, I'm thinking that I may have somehow built some software
110 > (during the stage 1 installation process) that is causing these
111 > problems, but I followed the Gentoo Handbook for doing a stage 1
112 > installation pretty rigidly, so I'm not sure what I might have done to
113 > cause that.
114
115 Why did you do a Stage 1, just out of curiosity? I recommend doing at
116 least one Stage 1 install for newcomers to Gentoo, just for educational
117 purposes, but after that, go Stage 3 and use as many binary packages as
118 you can! There exists a Stage 3 tarball for your architecture--the
119 Pentium4 one, so why not use that, just to make sure your base system
120 is solid?
121
122 > When I did the bootstrap.sh and emerge system, I was
123 > running the kernel that I booted from the boot CD (2004.0 I think, and
124 > probably even the smp kernel that was on that CD---IIRC, the 2004.1
125 > boot CD has some problems that prevent the use of the smp kernel on
126 > that CD).
127
128 I don't remember that, but I cannot say for certain that I have tried
129 the 2004.1 universal x86 CD with the SMP kernel.
130
131 > Are there some compiler flags or other configurable settings that, if
132 > set to certain values during the bootstrap.sh or emerge system steps,
133 > could end up generating software (perhaps when I built my own gcc?)
134 > that would cause these MCEs to be thrown?
135
136 I dunno, why don't you post your CFLAGS and MAKEOPTS from your make.conf
137 here?
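One way to pull just those settings out of make.conf for posting is sketched below. It assumes the file lives at /etc/make.conf and that the variables are set as plain uncommented assignments. The function name show_build_flags is made up for illustration, and CHOST and CXXFLAGS are included only because they are usually set alongside CFLAGS.

```shell
# Print the build-related settings from a Portage make.conf.
# The path defaults to /etc/make.conf; pass another path as $1.
# show_build_flags is a hypothetical helper name, not part of Portage.
show_build_flags() {
    conf="${1:-/etc/make.conf}"
    # Match uncommented assignments only; tolerate leading whitespace.
    grep -E '^[[:space:]]*(CHOST|CFLAGS|CXXFLAGS|MAKEOPTS)=' "$conf"
}
```

Run it as show_build_flags with no argument to read the default path. Commented-out lines are skipped on purpose, so only the values that were actually in effect show up.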
138
139 Quoth Kevin (Tue, 11 May 2004 17:37:39 -0400):
140
141 > On Tuesday 11 May 2004 15:38, Paul de Vrieze wrote:
142 >
143 >> What if you take the kernel from SUSE,
144 >
145 > I haven't tried installing Gentoo with my SuSE kernel running. Huh...
146 > what a concept. With all the modularity of those default distro
147 > kernels, would that even work? Maybe I'd need the kernel, the
148 > System.map, and the /lib/modules/`uname -r` directory?
149
150 Yes, you can install Gentoo while running *any* kernel. As long as you
151 can chroot, you can install Gentoo. See my Faketoo for an example:
152
153 http://forums.gentoo.org/viewtopic.php?p=1082580
154
155 Note that I do not actually build a kernel and set up the bootloader and
156 so forth, since I do not need to boot my jailed Gentoo installation--it
157 is just for ebuild development. However, nothing is stopping *you* from
158 doing it. :)
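A rough sketch of the chroot step, for the record. It assumes a stage tarball is already unpacked under /mnt/gentoo (the usual Handbook location, but any directory works) and that you are root. The function name enter_gentoo_chroot and the DRY_RUN switch are made up for illustration. By default it only prints the commands, so nothing is mounted until you ask for it.

```shell
# Enter a Gentoo tree unpacked under $1 (default: /mnt/gentoo) from any host kernel.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
# enter_gentoo_chroot and DRY_RUN are hypothetical names for this sketch.
enter_gentoo_chroot() {
    root="${1:-/mnt/gentoo}"
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }
    # Name resolution inside the jail needs the host's resolver config.
    run cp -L /etc/resolv.conf "$root/etc/resolv.conf"
    # Many tools inside the jail expect /proc to be mounted.
    run mount -t proc none "$root/proc"
    # From here on, the host kernel runs the Gentoo userland.
    run chroot "$root" /bin/bash
}
```

To actually enter the jail, run DRY_RUN=0 enter_gentoo_chroot as root. Once inside, the rest of the install proceeds as in the Handbook, and the kernel and bootloader steps can be done or skipped depending on whether the installation ever needs to boot on its own.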
159
160 --
161 Josh Glover
162
163 GPG keyID 0xDE8A3103 (C3E4 FA9E 1E07 BBDB 6D8B 07AB 2BF1 67A1 DE8A 3103)
164 gpg --keyserver pgp.mit.edu --recv-keys DE8A3103
