Hello,

I've just installed Gentoo on a dual-processor machine and I'm running
into the following problem: when I start emerge, it randomly stops and one
of the following things happens:
- the machine freezes completely, so that I cannot switch to another
  console or do anything else
- if I already have multiple ssh sessions open, sometimes one of the
  sessions remains alive, but invoking any command freezes that session.
  Any attempt to kill a process has no effect.
- a soft lockup is detected on at least one CPU.

I'm running out of ideas about what to try next, so I thought I would ask
for help.
Here is what I have checked and tried so far:
- configured and built the kernel with SMP and NUMA support.
  Triple-checked this.
- both processors are detected and initialized at boot. ACPI is used for
  SMP configuration information.
- processor temperatures: CPU0 32 C, CPU1 31 C, system 39 C. The system is
  located in a room at a steady 20 C.
- disabled CPU#1 using
    echo 0 > /sys/devices/system/cpu/cpu1/online
  In this case, everything seems to work fine. This is the only way I can
  compile or emerge anything.
- using MAKEOPTS="-j3". Tried "-j2" as well, but the same problem occurs.
- checked whether there are SMP-specific USE flags; the only one I could
  find was for gimp.
- experimented with different preemption models. The problem occurs with
  all of them.
- disabled APM and enabled ACPI 2.0 support. After I did this, I got
  "kernel panic - killing interrupt handler ...."

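The CPU#1 workaround from the list above can be wrapped in a small helper, shown here as a minimal sketch. The function name set_cpu_online and its optional third argument (an alternative sysfs root, there only so the function can be tried against a scratch directory) are illustrative assumptions and not part of any standard tool; the path of the "online" file is the one used above.

```shell
#!/bin/sh
# set_cpu_online CPU STATE [SYSFS_ROOT]
# Hypothetical helper (not a standard tool): writes STATE (0 = offline,
# 1 = online) to the CPU's sysfs "online" file and prints the value read back.
# SYSFS_ROOT defaults to the real sysfs location; it is an assumption added
# only so the function can be exercised against a scratch directory.
set_cpu_online() {
    cpu=$1
    state=$2
    root=${3:-/sys/devices/system/cpu}
    file="$root/cpu$cpu/online"

    # Only 0 and 1 are meaningful values for the "online" file.
    case $state in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac

    # The file is missing or read-only if we are not root or the CPU
    # cannot be taken offline.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "$state" > "$file" || return 1
    cat "$file"
}
```

With this, "set_cpu_online 1 0" before an emerge and "set_cpu_online 1 1" afterwards would correspond to the manual echo above.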
The system has:
- 2 AMD Opteron 252 processors
- 5 disks: 1 IDE Maxtor 6B200R0 and 4 SCSI Maxtor 6L300S0
- an ATAPI 48X DVD-ROM DVD-R CD-R/RW drive (probably irrelevant)
- Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit
  Ethernet (rev 02)
- RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid]
  Serial ATA Controller (rev 02)
- FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000
  Controller (PHY/Link)
- VGA compatible controller: ATI Technologies Inc RV350 AP [Radeon 9600]
- 05:00.1 Display controller: ATI Technologies Inc RV350 AP [Radeon 9600]
  (Secondary)

The kernel version is 2.6.17, built from gentoo-sources.

Any idea what might be causing this problem? Bad kernel configuration? Bad
system configuration? Kernel bug? Portage bug? Defective processor? Problem
with disk access?
I'm including a snapshot of the information I could retrieve while the
system remained somewhat responsive after the problem occurred.

Kind regards,
Vesna


odin ~ # uname -a
Linux odin 2.6.17-gentoo-r8 #7 SMP PREEMPT Tue Oct 31 12:10:14 EST 2006 x86_64 AMD Opteron(tm) Processor 252 GNU/Linux

top - 22:42:38 up 7:57, 3 users, load average: 7.99, 7.71, 5.39
Tasks: 64 total, 8 running, 56 sleeping, 0 stopped, 0 zombie
Cpu0  :  0.0% us, 100.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  0.0% us,   0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:  6929848k total,  140192k used, 6789656k free,   15272k buffers
Swap: 5004236k total,       0k used, 5004236k free,   65328k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 9811 root      16   0     0    0    0 R  100  0.0  18:27.43 emerge
    1 root      16   0  2608  572  488 S    0  0.0   0:00.46 init
    2 root      RT   0     0    0    0 R    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 R    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 R    0  0.0   0:00.00 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 events/1
   10 root      19  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   16 root      10  -5     0    0    0 R    0  0.0   0:00.00 kblockd/0
   17 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   18 root      14  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  103 root      10  -5     0    0    0 S    0  0.0   0:00.02 kseriod
  166 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  167 root      15   0     0    0    0 S    0  0.0   0:00.00 pdflush
  168 root      18   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  169 root      15   0     0    0    0 S    0  0.0   0:00.00 kswapd1
  170 root      14  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  171 root      10  -5     0    0    0 S    0  0.0   0:00.00 aio/1


top - 23:17:53 up 8:32, 3 users, load average: 11.99, 11.92, 10.81
Tasks: 65 total, 8 running, 57 sleeping, 0 stopped, 0 zombie
Cpu0  :  0.0% us, 100.0% sy,  0.0% ni,  0.0% id,   0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  0.0% us,   0.0% sy,  0.0% ni,  0.0% id, 100.0% wa,  0.0% hi,  0.0% si
Mem:  6929848k total,  142444k used, 6787404k free,   15300k buffers
Swap: 5004236k total,       0k used, 5004236k free,   65560k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 9811 root      16   0     0    0    0 R  100  0.0  53:43.13 emerge
    1 root      16   0  2608  572  488 S    0  0.0   0:00.46 init
    2 root      RT   0     0    0    0 R    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 R    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 R    0  0.0   0:00.00 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 events/1
   10 root      19  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   16 root      10  -5     0    0    0 R    0  0.0   0:00.00 kblockd/0
   17 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   18 root      14  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  103 root      10  -5     0    0    0 S    0  0.0   0:00.02 kseriod
  166 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  167 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush
  168 root      18   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  169 root      15   0     0    0    0 S    0  0.0   0:00.00 kswapd1
  170 root      14  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  171 root      10  -5     0    0    0 S    0  0.0   0:00.00 aio/1


F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  0  76   0 -      652 -      ?        00:00:00 init
1 R     0     2     1  0 -40   - -        0 -      ?        00:00:00 migration/0
1 S     0     3     1  0  94  19 -        0 ksofti ?        00:00:00 ksoftirqd/0
5 R     0     4     1  0 -40   - -        0 -      ?        00:00:00 watchdog/0
1 S     0     5     1  0 -40   - -        0 migrat ?        00:00:00 migration/1
1 S     0     6     1  0  94  19 -        0 ksofti ?        00:00:00 ksoftirqd/1
5 S     0     7     1  0 -40   - -        0 watchd ?        00:00:00 watchdog/1
5 R     0     8     1  0  70  -5 -        0 -      ?        00:00:00 events/0
1 S     0     9     1  0  70  -5 -        0 worker ?        00:00:00 events/1
1 S     0    10     1  0  79  -5 -        0 worker ?        00:00:00 khelper
1 S     0    11     1  0  70  -5 -        0 worker ?        00:00:00 kthread
1 R     0    16    11  0  70  -5 -        0 -      ?        00:00:00 kblockd/0
1 S     0    17    11  0  70  -5 -        0 worker ?        00:00:00 kblockd/1
1 S     0    18    11  0  74  -5 -        0 worker ?        00:00:00 kacpid
1 S     0   103    11  0  70  -5 -        0 serio_ ?        00:00:00 kseriod
1 S     0   166    11  0  80   0 -        0 pdflus ?        00:00:00 pdflush
1 S     0   167    11  0  75   0 -        0 pdflus ?        00:00:00 pdflush
1 S     0   168     1  0  78   0 -        0 kswapd ?        00:00:00 kswapd0
1 S     0   169     1  0  75   0 -        0 kswapd ?        00:00:00 kswapd1
1 S     0   170    11  0  74  -5 -        0 worker ?        00:00:00 aio/0
1 S     0   171    11  0  70  -5 -        0 worker ?        00:00:00 aio/1
1 S     0   770    11  0  70  -5 -        0 worker ?        00:00:00 kpsmoused
1 S     0   818    11  0  70  -5 -        0 worker ?        00:00:00 ata/0
1 S     0   819    11  0  71  -5 -        0 worker ?        00:00:00 ata/1
1 S     0   821    11  0  71  -5 -        0 scsi_e ?        00:00:00 scsi_eh_0
1 S     0   822    11  0  71  -5 -        0 scsi_e ?        00:00:00 scsi_eh_1
1 S     0   823    11  0  71  -5 -        0 scsi_e ?        00:00:00 scsi_eh_2
1 S     0   824    11  0  70  -5 -        0 scsi_e ?        00:00:00 scsi_eh_3
1 S     0   850     1  0  75   0 -        0 -      ?        00:00:00 khpsbpkt
1 S     0   854     1  0  76   0 -        0 -      ?        00:00:00 knodemgrd_0
1 S     0   862    11  0  70  -5 -        0 kjourn ?        00:00:00 kjournald
5 S     0   973     1  0  78  -4 -     1764 -      ?        00:00:00 udevd
1 S     0  2119    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2123    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2129    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2134    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2139    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2144    11  0  70  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2149    11  0  71  -5 -        0 kjourn ?        00:00:00 kjournald
1 S     0  2157    11  0  72  -5 -        0 hub_th ?        00:00:00 khubd
5 S   111  4246     1  0  76   0 -     2248 -      ?        00:00:00 portmap
5 S     0  4314     1  0  84   0 -     5305 -      ?        00:00:00 ypbind
5 S 65534  4384     1  0  84   0 -     1463 -      ?        00:00:00 rpc.statd
1 S     0  4391    11  0  71  -5 -        0 worker ?        00:00:00 rpciod/0
1 S     0  4392    11  0  71  -5 -        0 worker ?        00:00:00 rpciod/1
1 S     0  4393     1  0  85   0 -        0 -      ?        00:00:00 lockd
1 S     0  4394     1  0  76   0 -     1462 -      ?        00:00:00 mount
1 S     0  4453     1  0  76   0 -     1461 -      ?        00:00:00 mount
5 S     0  4514     1  0  76   0 -     4294 -      ?        00:00:00 sshd
0 S     0  4585     1  0  77   0 -      917 -      tty1     00:00:00 agetty
0 S     0  4586     1  0  76   0 -      917 -      tty2     00:00:00 agetty
0 S     0  4587     1  0  76   0 -      917 -      tty3     00:00:00 agetty
0 S     0  4588     1  0  76   0 -      917 -      tty4     00:00:00 agetty
0 S     0  4589     1  0  76   0 -      916 -      tty5     00:00:00 agetty
0 S     0  4590     1  0  76   0 -      916 -      tty6     00:00:00 agetty
4 S     0 14875  4514  0  75   0 -     7073 -      ?        00:00:00 sshd
4 S     0 14878 14875  0  75   0 -     2548 wait   pts/0    00:00:00 bash
0 D     0 26815     1  0  77   0 -        0 exit   pts/0    00:00:00 cc1
4 R     0  9811 14878 86  76   0 -        0 -      pts/0    00:02:59 emerge
4 S     0 17651  4514  0  75   0 -     7036 -      ?        00:00:00 sshd
4 S     0 17654 17651  0  75   0 -     2547 wait   pts/1    00:00:00 bash
0 R     0 17661 17654  0  77   0 -     1019 -      pts/1    00:00:00 ps


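In the process listing above, cc1 (PID 26815) is in state D (uninterruptible sleep); a task in that state does not act on signals until it wakes up, which may be related to kill having no effect. A small filter along the following lines picks such tasks out of a long listing. It is only a sketch: the function name blocked_tasks is made up for illustration, and it assumes the input has the column order shown above (as printed by "ps -el").

```shell
#!/bin/sh
# blocked_tasks: read a process listing on stdin and print "PID WCHAN CMD"
# for every task whose state column is D (uninterruptible sleep).
# Assumed column order (as in the listing above):
#   F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
# so field 2 is the state, field 4 the PID, field 11 the wait channel,
# and the last field the command name.
blocked_tasks() {
    awk '$2 == "D" { print $4, $11, $NF }'
}
```

It would be used as "ps -el | blocked_tasks"; for the listing above it would report PID 26815, wait channel "exit", command cc1.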
odin ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 252
stepping        : 1
cpu MHz         : 2592.234
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 5189.92
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 252
stepping        : 1
cpu MHz         : 2592.234
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 5184.39
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp