Quoth Kevin (Tue 2004-05-11 02:07:58PM -0400):

> In summary, my problem is this: of those that I've tried, I can't get any
> Gentoo kernel to handle SMP operation during major CPU activity (like
> emerging packages) for more than about 5 or 10 minutes. Invariably,
> during such activity, I get a kernel panic---most often with words on the
> console about Machine Check Exception 000000...004 (this number from
> memory so it may be off).

[...]

> The only way that I can get reliable, stable operation with a Gentoo
> kernel and distribution is if I build a kernel without support for SMP.

[...]

> The machine is a Dell PowerEdge1600SC with a PERC-3/SC SCSI RAID
> controller (using AMI megaraid2 driver) and a LSI Logic Corp controller
> (using Fusion MPT base driver) for the SCSI DAT and with dual 2.4GHz Xeon
> processors, each having a 512KB L2 Cache.

Running Gentoo with a 2.6.5 SMP kernel on a Dell PowerEdge 400SC:

: jmglov@jglover; uname -a
Linux jglover 2.6.5-gentoo-r1 #1 SMP Fri Apr 30 17:37:18 EDT 2004 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

> Output from /proc/cpuinfo is:

<snip>

: jmglov@jglover; cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping : 9
cpu MHz : 2395.027
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4718.59

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping : 9
cpu MHz : 2395.027
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4767.74
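
Both entries above report physical id 0 and siblings 2, which suggests two logical processors on one Hyper-Threaded package rather than two separate chips. A minimal sketch for counting the two is below. The function name cpu_topology is made up for illustration, and it assumes the kernel exposes a "physical id" field, which not every kernel or architecture does.

```shell
# Count logical processors and distinct physical packages in a
# cpuinfo-style file. Defaults to /proc/cpuinfo; pass another path
# to inspect a saved copy.
# cpu_topology is a hypothetical helper name, not a standard tool.
cpu_topology() {
    awk -F':' '
        {
            # Split "key : value" and trim surrounding whitespace.
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        # Every "processor" stanza is one logical CPU.
        key == "processor" { logical++ }
        # Distinct "physical id" values are distinct packages.
        # If the field is absent, packages stays at 0.
        key == "physical id" {
            if (!(val in seen)) { seen[val] = 1; packages++ }
        }
        END { printf "logical=%d packages=%d\n", logical, packages }
    ' "${1:-/proc/cpuinfo}"
}
```

Run as `cpu_topology` on the live system or `cpu_topology saved-cpuinfo.txt` on a copy. For the output pasted above it would print `logical=2 packages=1`; a true dual-CPU box with Hyper-Threading off should show the two numbers equal.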

> I'm a recent Gentoo convert. I think it's an excellent improvement on the
> traditional Linux distros, and I'd really like to use it on my server,
> but as long as this problem with SMP is present, I just can't.

I really do not think it is a Gentoo issue. I have run Gentoo on quite a
few SMP boxen over the past several years, and never had problems like
you describe. Sounds like a hardware issue to me, unless you are using
(or were using) some really bogus CFLAGS.

> PS. FWIW, I'll add that I have a very vague memory while watching text fly
> up the screen during bootstrap.sh or emerge system (this was a stage 1
> install before I installed SuSE over it) of seeing some warning about
> something being unsafe with SMP. Do I need to have some setting or other
> turned off for some parts of a stage 1 install with a dual CPU system?

Nope.

Quoth Kevin (Tue, 11 May 2004 15:38:35 -0400):

> Ok. Thanks for the suggestion. But what about this: Dell has a utility
> partition and some programs for doing exhaustive testing of all the
> hardware in the server. If I run the most thorough set of tests
> available in this utility partition and I get a clean bill of health,
> is that a reliable indication that there are no hardware problems?

Nope. Tragically, it usually works the other way around: hardware test
suites are unlikely to give you a false positive, so a failure means
something, but if your hardware passes, that does not mean you are safe.
Your issue might be heat-related, in which case your CPUs have to heat
up for quite some time before they choke. Combine this with some other
issue (maybe you optimised a bit too aggressively when building your
kernel?), and you have a tricky issue for a hardware tester to catch.

Quoth Kevin (Tue, 11 May 2004 17:31:32 -0400):

> Honestly, I'm thinking that I may have somehow built some software
> (during the stage 1 installation process) that is causing these
> problems, but I followed the Gentoo Handbook for doing a stage 1
> installation pretty rigidly, so I'm not sure what I might have done to
> cause that.

Why did you do a Stage 1, just out of curiosity? I recommend doing at
least one Stage 1 install for newcomers to Gentoo, just for educational
purposes, but after that, go Stage 3 and use as many binary packages as
you can! There is a Stage 3 tarball for your architecture (the Pentium4
one), so why not use that, just to make sure your base system is solid?

> When I did the bootstrap.sh and emerge system, I was
> running the kernel that I booted from the boot CD (2004.0 I think, and
> probably even the smp kernel that was on that CD---IIRC, the 2004.1
> boot CD has some problems that prevent the use of the smp kernel on
> that CD).

I don't remember that, but I cannot say for certain that I have tried
the 2004.1 universal x86 CD with the SMP kernel.

> Are there some compiler flags or other configurable settings that, if
> set to certain values during the bootstrap.sh or emerge system steps,
> could end up generating software (perhaps when I built my own gcc?)
> that would cause these MCEs to be thrown?

I dunno, why don't you post your CFLAGS and MAKEOPTS from your make.conf
here?
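
Something along these lines would pull out the relevant lines for posting. This is only a sketch: it assumes make.conf lives at /etc/make.conf and that each setting sits on a single line. The function name show_build_flags is made up here, and CHOST and CXXFLAGS are extras I added because they usually matter alongside CFLAGS.

```shell
# Print the build-related settings from a Portage make.conf.
# show_build_flags is a hypothetical helper name; the default path
# /etc/make.conf is an assumption about where the file lives.
show_build_flags() {
    # Match only active assignments; commented-out lines start with '#'
    # and are skipped because the pattern is anchored to the variable name.
    grep -E '^[[:space:]]*(CHOST|CFLAGS|CXXFLAGS|MAKEOPTS)=' \
        "${1:-/etc/make.conf}"
}
```

Call it as `show_build_flags`, or as `show_build_flags /path/to/make.conf` for a copy of the file. Values continued across several lines will be cut off after the first line, so check the file by hand if the output looks truncated.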

Quoth Kevin (Tue, 11 May 2004 17:37:39 -0400):

> On Tuesday 11 May 2004 15:38, Paul de Vrieze wrote:
>
>> What if you take the kernel from SUSE,
>
> I haven't tried installing Gentoo with my SuSE kernel running. Huh...
> what a concept. With all the modularity of those default distro
> kernels, would that even work? Maybe I'd need the kernel, the
> System.map, and the /lib/modules/`uname -r` directory?

Yes, you can install Gentoo while running *any* kernel. As long as you
can chroot, you can install Gentoo. See my Faketoo for an example:

http://forums.gentoo.org/viewtopic.php?p=1082580

Note that I do not actually build a kernel, set up the bootloader, and
so forth, since I do not need to boot my jailed Gentoo installation; it
is just for ebuild development. However, nothing is stopping *you* from
doing it. :)
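
For the chroot step itself, a rough sketch follows. It is not taken from the Faketoo post; it assumes a stage tarball is already unpacked under /mnt/gentoo, and the names ROOT, DRY_RUN, run, and enter_gentoo_chroot are all made up for illustration. By default it only prints the commands, so nothing is touched until DRY_RUN is set to 0 and it is run as root.

```shell
# Sketch: enter an unpacked Gentoo stage tree from whatever kernel is
# currently running. ROOT and DRY_RUN are illustrative variable names,
# not Portage settings; /mnt/gentoo is an assumed mount point.
ROOT="${ROOT:-/mnt/gentoo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (the default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_gentoo_chroot() {
    # The chroot needs /proc from the running kernel; any distro's
    # kernel will do.
    run mount -t proc none "$ROOT/proc"
    # Copy resolver settings so downloads work from inside the chroot.
    run cp -L /etc/resolv.conf "$ROOT/etc/resolv.conf"
    # Start a login shell inside the new root (needs root privileges).
    run chroot "$ROOT" /bin/bash --login
}
```

Running `enter_gentoo_chroot` as-is lists the three commands; `DRY_RUN=0` makes it perform them. Nothing here depends on which kernel is booted, which is the point: the host kernel only has to provide chroot and /proc.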

--
Josh Glover

GPG keyID 0xDE8A3103 (C3E4 FA9E 1E07 BBDB 6D8B 07AB 2BF1 67A1 DE8A 3103)
gpg --keyserver pgp.mit.edu --recv-keys DE8A3103