Hi All-

I'm writing here before filing a bug report for two reasons. I may be missing something important, and I'm not sure what details a bug report should include, because I can't tell whether the problem lies with the Gentoo kernels, with gcc, or with something else. I doubt I'm simply missing something, though, because I'm not the only Gentoo user seeing this. In March I saw a thread on lkml from someone else running Gentoo under extremely similar (though not identical) circumstances. He thought it was a kernel bug, but I don't think so. See

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=ISO-8859-1&threadm=1yyJD-8mD-11%40gated-at.bofh.it&rnum=6&prev=/groups%3Fq%3Dgroup:linux.kernel%2Bsmp%2Bgentoo%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DISO-8859-1%26sa%3DG%26scoring%3Dd

or search for "group:linux.kernel smp gentoo" on Google Groups,

or see the lkml thread "SMP + Hyperthreading / Asus PCDL Deluxe / Kernel 2.4.x 2.6.x / Crash/Freeze".

Instead, I suspect the most likely explanation is a bug in some Gentoo code somewhere, perhaps related to building kernels, or perhaps to building gcc itself. I'm not sure.

In summary: of the Gentoo kernels I've tried, none can sustain SMP operation under heavy CPU load (such as emerging packages) for more than about 5 or 10 minutes. Invariably, under such load, I get a kernel panic, most often with a console message about a Machine Check Exception 000000...004 (I'm quoting that number from memory, so it may be off).

The only way I can get reliable, stable operation with a Gentoo kernel and distribution is to build the kernel without SMP support. That kernel is stable whether hyperthreading is enabled or disabled in CMOS. Over the last week or so, I've tried every combination of the CMOS hyperthreading setting (enabled/disabled) and kernel SMP support (enabled/disabled) with the latest stable ebuilds of the following Gentoo kernels: vanilla-sources, gentoo-sources, gentoo-dev-sources, and gs-sources (actually, I couldn't even get gs-sources to build; see bug #48973 and the thread here, "gs-sources problems: device mapper: dm.o has undeclared identifiers"). Going by memory, I tried kernel versions 2.4.25 (vanilla?), 2.4.26 (gentoo?), and 2.6.5 (gentoo-dev?).

In all of the above cases, when running a kernel with SMP support and emerging packages (even fairly small ones needing only about 3 to 5 minutes of compiling), the machine locked up with a kernel panic and needed a hard reset.

The machine is a Dell PowerEdge 1600SC with a PERC-3/SC SCSI RAID controller (using the AMI megaraid2 driver), an LSI Logic Corp controller (using the Fusion MPT base driver) for the SCSI DAT drive, and dual 2.4GHz Xeon processors, each with 512KB of L2 cache.

Output from /proc/cpuinfo is:
=======
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.127
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.127
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2392.127
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2392.127
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02
=======

I wrote some details about this problem on gentoo-user under the thread "2004.1 and SMP Problems", but I have done a lot more testing since then.

The reason I think this is a Gentoo problem and not a kernel problem is that today I finished installing SuSE 9 on this same machine, with the CMOS hyperthreading setting enabled, and the CPUs have been wailing away for hours doing simultaneous builds of several different source tarballs (bind9, KDE 3.2.2, MySQL 4.0.18) without a single problem. During these tests I was running the SuSE kernel 2.4.21-215-smp4G.

Under SuSE, /proc/cpuinfo is close to, but not exactly the same as, the output above. The SuSE kernel reports "physical id" and "siblings" fields that the Gentoo kernel doesn't, its flags lack "pbe" and "cid", and the cpu MHz and bogomips values differ slightly (use diff for the specifics).

SuSE /proc/cpuinfo:

=======
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.795
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4718.59

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.795
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2392.795
cache size : 512 KB
physical id : 2
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2392.795
cache size : 512 KB
physical id : 2
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74
=======
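
To pin down exactly where the two /proc/cpuinfo listings above differ, something like the following Python sketch could be used instead of a plain line-by-line diff. It parses one processor record into a field/value map, then compares field names, flag sets, and values. This is only an illustration under stated assumptions: the sample strings are trimmed copies of processor 0 from each listing (the identical *_bug, fpu, cpuid and wp lines are omitted, and the wrapped flags line is joined), and parse_record and compare are hypothetical helpers, not part of any existing tool.

```python
# Sketch: compare one processor record from two /proc/cpuinfo dumps.
# ASSUMPTION: the samples are trimmed copies of processor 0 from the Gentoo
# and SuSE listings above; in practice you would read each saved dump from a file.

GENTOO_CPU0 = """\
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.127
cache size : 512 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4771.02
"""

SUSE_CPU0 = """\
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.795
cache size : 512 KB
physical id : 0
siblings : 2
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4718.59
"""


def parse_record(text):
    """Turn one cpuinfo record ("key : value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # values like "model name" contain no colon
        fields[key.strip()] = value.strip()
    return fields


def compare(first, second):
    """Report field names, flags, and values that differ between two records."""
    flags_first = set(first.get("flags", "").split())
    flags_second = set(second.get("flags", "").split())
    shared = sorted(set(first) & set(second))
    return {
        "fields_only_in_first": sorted(set(first) - set(second)),
        "fields_only_in_second": sorted(set(second) - set(first)),
        "flags_only_in_first": sorted(flags_first - flags_second),
        "flags_only_in_second": sorted(flags_second - flags_first),
        # flags were compared as sets above, so skip them here
        "changed": {k: (first[k], second[k]) for k in shared
                    if k != "flags" and first[k] != second[k]},
    }


if __name__ == "__main__":
    result = compare(parse_record(GENTOO_CPU0), parse_record(SUSE_CPU0))
    for name, value in result.items():
        print(f"{name}: {value}")
```

Comparing flags as sets rather than as text means a difference in flag order or line wrapping is not reported as a change; to check another processor, substitute the corresponding records.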

Gentoo emerge info output:
=======
System uname: 2.4.25-gentoo-r2 i686 Intel(R) Xeon(TM) CPU 2.40GHz
Gentoo Base System version 1.4.9
distcc 2.13 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.58-r1
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=pentium4 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/afs/C /etc/afs/afsws /etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium4 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distcc sandbox"
GENTOO_MIRRORS="http://128.213.5.34/gentoo/ http://mirror.datapipe.net/gentoo ftp://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://gentoo.eliteitminds.com"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X Xaw3d acl acpi afs alsa apache2 apm arts avi berkdb bonobo caps crypt cups doc emacs emacs-w3 encode esd ethereal evo firebird flac foomaticdb gdbm gif gnome gpm gstreamer gtk gtk2 gtkhtml guile hardened icq imagemagick imap imlib innodb ipv6 jabber jack java jikes jpeg kde kerberos krb4 ldap libg++ libwww mad mcal mikmod motif mozilla mpeg mysql ncurses nls odbc oggvorbis opengl oss pam pda pdflib perl plotutils png ppds prelude python qt quicktime readline ruby samba sasl sdl slang slp spell sse ssl svga tcltk tcpd tetex tiff truetype unicode usb vhosts x86 xinerama xml2 xmms xv zeo zlib"
=======

I'm a recent Gentoo convert. I think it's an excellent improvement on the traditional Linux distributions, and I'd really like to use it on my server, but as long as this SMP problem persists, I just can't.

If anyone has suggestions on what I might be doing wrong and how I can get a stable Gentoo system with full SMP support (ideally with a 2.4.x kernel, since I need OpenAFS, which doesn't work with 2.6.x right now and, according to the OpenAFS developers, won't in the near future either), I would really appreciate your thoughts.

Thanks in advance.

--
-Kevin

PS. FWIW, I have a very vague memory, from watching text fly up the screen during bootstrap.sh or emerge system (this was a stage 1 install, before I installed SuSE over it), of seeing a warning that something was unsafe with SMP. Do I need to turn some setting off for certain parts of a stage 1 install on a dual-CPU system?

--
gentoo-dev@g.o mailing list