Gentoo Archives: gentoo-dev

From: Kevin <gentoo-dev@××××××.biz>
To: Gentoo Dev <gentoo-dev@l.g.o>
Subject: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels
Date: Tue, 11 May 2004 18:08:42
Message-Id: 200405111407.58909.gentoo-dev@gnosys.biz
1 Hi All-
2
3 I'm writing here before filing a bug because I may be missing something
4 important, and because I'm not sure what details a bug report would need:
5 I can't tell whether the problem lies with the Gentoo kernels, with gcc,
6 or with something else. If I am missing something, though, I'm not the
7 only Gentoo user who is, so I think that's unlikely. In March I saw a
8 thread on lkml from somebody else running Gentoo in extremely similar
9 (though not identical) circumstances. He thought it was a kernel bug,
10 but I don't think so:
11 see
12 http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=ISO-8859-1&threadm=1yyJD-8mD-11%
13 40gated-at.bofh.it&rnum=6&prev=/groups%3Fq%3Dgroup:linux.kernel%2Bsmp%
14 2Bgentoo%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DISO-8859-1%26sa%3DG%
15 26scoring%3Dd
16
17 or search for "group:linux.kernel smp gentoo" on google groups,
18
19 or see the lkml thread "SMP + Hyperthreading / Asus PCDL Deluxe / Kernel
20 2.4.x 2.6.x / Crash/Freeze".
21
22 Instead, I think the most likely explanation for my problem is a bug in
23 some Gentoo code somewhere, perhaps related to building kernels, but
24 maybe not... Maybe related to building gcc itself? Not sure.
25
26 In summary, my problem is this: I can't get any Gentoo kernel that I've
27 tried to handle SMP operation during heavy CPU activity (like emerging
28 packages) for more than about 5 or 10 minutes. Invariably, during such
29 activity, I get a kernel panic, most often with a console message about
30 Machine Check Exception 000000...004 (this number is from memory, so it
31 may be off).
32
33 The only way that I can get reliable, stable operation with a Gentoo
34 kernel and distribution is if I build a kernel without support for SMP.
35 This is stable with or without hyperthreading enabled in CMOS. Over the
36 last week or so, I've tried running kernels with the CMOS setting for
37 hyperthreading disabled and enabled, with support for SMP enabled and
38 disabled, in all combinations and for the latest stable ebuilds of the
39 following Gentoo kernels: vanilla-sources, gentoo-sources,
40 gentoo-dev-sources, and gs-sources (actually, I couldn't even get that
41 one to build; see bug #48973 and the thread "gs-sources problems:
42 device mapper: dm.o has undeclared identifiers"). Going by memory, I
43 tried kernel versions 2.4.25 (vanilla?), 2.4.26 (gentoo?), and 2.6.5
44 (gentoo-dev?).
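
Since the tests above turn on whether a given kernel was built with SMP support, it can help to confirm what each build actually had set. The following is only a minimal sketch: the sample text is hypothetical (it is not the configuration used in these tests), and it assumes the standard kernel option names CONFIG_SMP and CONFIG_X86_MCE. On a real system the text would come from the kernel's .config file, conventionally /usr/src/linux/.config on Gentoo.

```python
def config_options(config_text, names):
    """Map each option name to its value ('y', 'm', ...) or None if unset."""
    values = dict.fromkeys(names)
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as '# CONFIG_FOO is not set'.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in values:
            values[key] = value
    return values


# Hypothetical excerpt of a uniprocessor kernel configuration.
sample = """\
CONFIG_X86=y
# CONFIG_SMP is not set
CONFIG_X86_MCE=y
"""

result = config_options(sample, ["CONFIG_SMP", "CONFIG_X86_MCE"])
for name, value in result.items():
    print(name, "=", value if value is not None else "not set")
```

Running the same check against the .config of each kernel tested would show which builds differ only in CONFIG_SMP.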
45
46 In all of the above circumstances, when running a kernel with support for
47 SMP and when emerging packages (some pretty small ones, so only about 3
48 or 5 minutes of compiling), the machine would lock with a kernel panic
49 and need a hard reset.
50
51 The machine is a Dell PowerEdge 1600SC with a PERC-3/SC SCSI RAID
52 controller (using the AMI megaraid2 driver), an LSI Logic Corp controller
53 (using the Fusion MPT base driver) for the SCSI DAT, and dual 2.4GHz Xeon
54 processors, each having a 512KB L2 cache.
55
56 Output from /proc/cpuinfo is:
57 =======
58 processor : 0
59 vendor_id : GenuineIntel
60 cpu family : 15
61 model : 2
62 model name : Intel(R) Xeon(TM) CPU 2.40GHz
63 stepping : 7
64 cpu MHz : 2392.127
65 cache size : 512 KB
66 fdiv_bug : no
67 hlt_bug : no
68 f00f_bug : no
69 coma_bug : no
70 fpu : yes
71 fpu_exception : yes
72 cpuid level : 2
73 wp : yes
74 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
75 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
76 bogomips : 4771.02
77
78 processor : 1
79 vendor_id : GenuineIntel
80 cpu family : 15
81 model : 2
82 model name : Intel(R) Xeon(TM) CPU 2.40GHz
83 stepping : 7
84 cpu MHz : 2392.127
85 cache size : 512 KB
86 fdiv_bug : no
87 hlt_bug : no
88 f00f_bug : no
89 coma_bug : no
90 fpu : yes
91 fpu_exception : yes
92 cpuid level : 2
93 wp : yes
94 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
95 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
96 bogomips : 4771.02
97
98 processor : 2
99 vendor_id : GenuineIntel
100 cpu family : 15
101 model : 2
102 model name : Intel(R) Xeon(TM) CPU 2.40GHz
103 stepping : 9
104 cpu MHz : 2392.127
105 cache size : 512 KB
106 fdiv_bug : no
107 hlt_bug : no
108 f00f_bug : no
109 coma_bug : no
110 fpu : yes
111 fpu_exception : yes
112 cpuid level : 2
113 wp : yes
114 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
115 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
116 bogomips : 4771.02
117
118 processor : 3
119 vendor_id : GenuineIntel
120 cpu family : 15
121 model : 2
122 model name : Intel(R) Xeon(TM) CPU 2.40GHz
123 stepping : 9
124 cpu MHz : 2392.127
125 cache size : 512 KB
126 fdiv_bug : no
127 hlt_bug : no
128 f00f_bug : no
129 coma_bug : no
130 fpu : yes
131 fpu_exception : yes
132 cpuid level : 2
133 wp : yes
134 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
135 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
136 bogomips : 4771.02
137 =======
138
139 I wrote some details about this problem in gentoo-user under the thread
140 "2004.1 and SMP Problems", but I have done a lot more testing since then.
141
142 The reason that I think this is a Gentoo thing and not a kernel thing is
143 that today I just finished installing SuSE9 on this same machine with
144 CMOS hyperthreading setting enabled and the CPUs have been wailing away
145 for hours doing simultaneous builds of several different source tarballs
146 (bind9, kde3.2.2, mysql 4.0.18), and I haven't seen even a single
147 problem. During these tests, I was running the SuSE kernel
148 2.4.21-215-smp4G.
149
150 In SuSE, the output of /proc/cpuinfo is close to, but not exactly the
151 same as, the above. There are some differences in the flags and a couple
152 of other things (use diff for specifics).
153
154 SuSE /proc/cpuinfo:
155
156 =======
157 processor : 0
158 vendor_id : GenuineIntel
159 cpu family : 15
160 model : 2
161 model name : Intel(R) Xeon(TM) CPU 2.40GHz
162 stepping : 7
163 cpu MHz : 2392.795
164 cache size : 512 KB
165 physical id : 0
166 siblings : 2
167 fdiv_bug : no
168 hlt_bug : no
169 f00f_bug : no
170 coma_bug : no
171 fpu : yes
172 fpu_exception : yes
173 cpuid level : 2
174 wp : yes
175 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
176 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
177 bogomips : 4718.59
178
179 processor : 1
180 vendor_id : GenuineIntel
181 cpu family : 15
182 model : 2
183 model name : Intel(R) Xeon(TM) CPU 2.40GHz
184 stepping : 7
185 cpu MHz : 2392.795
186 cache size : 512 KB
187 physical id : 0
188 siblings : 2
189 fdiv_bug : no
190 hlt_bug : no
191 f00f_bug : no
192 coma_bug : no
193 fpu : yes
194 fpu_exception : yes
195 cpuid level : 2
196 wp : yes
197 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
198 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
199 bogomips : 4767.74
200
201 processor : 2
202 vendor_id : GenuineIntel
203 cpu family : 15
204 model : 2
205 model name : Intel(R) Xeon(TM) CPU 2.40GHz
206 stepping : 9
207 cpu MHz : 2392.795
208 cache size : 512 KB
209 physical id : 2
210 siblings : 2
211 fdiv_bug : no
212 hlt_bug : no
213 f00f_bug : no
214 coma_bug : no
215 fpu : yes
216 fpu_exception : yes
217 cpuid level : 2
218 wp : yes
219 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
220 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
221 bogomips : 4767.74
222
223 processor : 3
224 vendor_id : GenuineIntel
225 cpu family : 15
226 model : 2
227 model name : Intel(R) Xeon(TM) CPU 2.40GHz
228 stepping : 9
229 cpu MHz : 2392.795
230 cache size : 512 KB
231 physical id : 2
232 siblings : 2
233 fdiv_bug : no
234 hlt_bug : no
235 f00f_bug : no
236 coma_bug : no
237 fpu : yes
238 fpu_exception : yes
239 cpuid level : 2
240 wp : yes
241 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
242 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
243 bogomips : 4767.74
244 =======
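
The "use diff for specifics" comparison of the two /proc/cpuinfo listings above can also be done field by field, which reports flag differences by name. This is a minimal sketch, not a definitive tool: the helper names parse_cpuinfo and diff_fields are made up for illustration, and the sample text is an abbreviated copy of the processor 0 stanzas quoted in this message, including the mail client's wrapped flags line.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor stanza."""
    cpus = []
    current = None
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "processor":
                current = {}
                cpus.append(current)
            if current is not None:
                current[key] = value
                last_key = key
        elif current is not None and last_key is not None:
            # A line without a colon is treated as a wrapped continuation
            # of the previous field (here, the long flags line).
            current[last_key] += " " + line.strip()
    return cpus


def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ.

    For 'flags' the tuple holds (flags only in a, flags only in b).
    """
    report = {}
    for key in sorted(set(a) | set(b)):
        if key == "flags":
            fa = set(a.get(key, "").split())
            fb = set(b.get(key, "").split())
            if fa != fb:
                report[key] = (sorted(fa - fb), sorted(fb - fa))
        elif a.get(key) != b.get(key):
            report[key] = (a.get(key), b.get(key))
    return report


GENTOO = """\
processor : 0
stepping : 7
cpu MHz : 2392.127
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02
"""

SUSE = """\
processor : 0
stepping : 7
cpu MHz : 2392.795
physical id : 0
siblings : 2
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4718.59
"""

report = diff_fields(parse_cpuinfo(GENTOO)[0], parse_cpuinfo(SUSE)[0])
for field, (gentoo_value, suse_value) in report.items():
    print(field, "| Gentoo:", gentoo_value, "| SuSE:", suse_value)
```

To compare complete outputs, the same functions could be fed the full saved text from each system and the stanzas compared pairwise.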
245
246 Gentoo emerge info output:
247 =======
248 System uname: 2.4.25-gentoo-r2 i686 Intel(R) Xeon(TM) CPU 2.40GHz
249 Gentoo Base System version 1.4.9
250 distcc 2.13 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632)
251 [enabled]
252 ccache version 2.3 [enabled]
253 Autoconf: sys-devel/autoconf-2.58-r1
254 Automake: sys-devel/automake-1.8.3
255 ACCEPT_KEYWORDS="x86"
256 AUTOCLEAN="yes"
257 CFLAGS="-O3 -march=pentium4 -pipe -fomit-frame-pointer"
258 CHOST="i686-pc-linux-gnu"
259 COMPILER="gcc3"
260 CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
261 CONFIG_PROTECT_MASK="/etc/afs/C /etc/afs/afsws /etc/gconf /etc/terminfo /etc/env.d"
262 CXXFLAGS="-O3 -march=pentium4 -pipe -fomit-frame-pointer"
263 DISTDIR="/usr/portage/distfiles"
264 FEATURES="autoaddcvs ccache distcc sandbox"
265 GENTOO_MIRRORS="http://128.213.5.34/gentoo/
266 http://mirror.datapipe.net/gentoo
267 ftp://mirrors.sec.informatik.tu-darmstadt.de/gentoo/
268 http://gentoo.eliteitminds.com"
269 MAKEOPTS="-j3"
270 PKGDIR="/usr/portage/packages"
271 PORTAGE_TMPDIR="/var/tmp"
272 PORTDIR="/usr/portage"
273 PORTDIR_OVERLAY=""
274 SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
275 USE="X Xaw3d acl acpi afs alsa apache2 apm arts avi berkdb bonobo caps
276 crypt
277 cups doc emacs emacs-w3 encode esd ethereal evo firebird flac foomaticdb
278 gdbm
279 gif gnome gpm gstreamer gtk gtk2 gtkhtml guile hardened icq imagemagick
280 imap
281 imlib innodb ipv6 jabber jack java jikes jpeg kde kerberos krb4 ldap
282 libg++
283 libwww mad mcal mikmod motif mozilla mpeg mysql ncurses nls odbc oggvorbis
284 opengl oss pam pda pdflib perl plotutils png ppds prelude python qt
285 quicktime
286 readline ruby samba sasl sdl slang slp spell sse ssl svga tcltk tcpd tetex
287 tiff
288 truetype unicode usb vhosts x86 xinerama xml2 xmms xv zeo zlib"
289 =======
290
291 I'm a recent Gentoo convert. I think it's an excellent improvement on the
292 traditional Linux distros, and I'd really like to use it on my server,
293 but as long as this problem with SMP is present, I just can't.
294
295 If anyone has any suggestions on what I might be doing wrong and how I can
296 get a stable Gentoo system with full support for SMP (ideally with a 2.4.x
297 kernel, since I need OpenAFS, and OpenAFS doesn't work with 2.6.x right
298 now and, according to its developers, won't in the near future), I would
299 really appreciate getting your thoughts.
300
301 Thanks in advance.
302
303 --
304 -Kevin
305
306 PS. FWIW, I'll add that I have a very vague memory while watching text fly
307 up the screen during bootstrap.sh or emerge system (this was a stage 1
308 install before I installed SuSE over it) of seeing some warning about
309 something being unsafe with SMP. Do I need to have some setting or other
310 turned off for some parts of a stage 1 install with a dual CPU system?
311
312
313 --
314 gentoo-dev@g.o mailing list
