Gentoo Archives: gentoo-amd64

From: Vesna Petrovic <vesna.petrovic@×××××.com>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Problem with emerge on a dual-processor machine
Date: Tue, 31 Oct 2006 18:51:31
Message-Id: 60bedadc0610311046q23d77c5fu671c7330a2f09a14@mail.gmail.com
1 Hello,
2
3 I've just installed Gentoo on a dual-processor machine and now I'm running
4 into the following problem - when I start emerge, it randomly stops and one
5 of the following things happens:
6 - the machine freezes completely so that I cannot switch to another
7 console or do anything
8 - if I already have multiple ssh sessions open, sometimes one of the
9 sessions remains alive, but invoking any command freezes that session. Any
10 attempt to kill a process has no effect.
11 - soft lockup detected on at least one cpu.
12
13 I'm running out of ideas about what to try next, so I thought I would ask for
14 help.
15 Here is what I checked and tried so far:
16 - configured and built the kernel with SMP and NUMA support.
17 Triple-checked this.
18 - both processors are detected and initialized at boot. ACPI is used for
19 SMP configuration information
20 - processor temperatures: CPU0 32 C, CPU1 31 C, system 39 C. System is
21 located in a room with a steady 20 C
22 - disabled CPU#1 using
23 echo 0 > /sys/devices/system/cpu/cpu1/online
24 In this case, everything seems to work fine. This is the only way to
25 compile or emerge anything.
26 - using MAKEOPTS="-j3". Tried with "-j2", but the same problem occurs.
27 - checked if there are SMP specific USE flags, and the only one I could
28 find was for gimp.
29 - experimented with different preemption models. The problem occurs with
30 all of them.
31 - disabled APM and enabled ACPI 2.0 support. After I did this, I got
32 "kernel panic - killing interrupt handler ...."
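The CPU#1 workaround from the list above can be wrapped in a small helper so that the CPU is easy to take offline before a build and to bring back afterwards. This is only a minimal sketch: it assumes the kernel was built with CPU hotplug support (CONFIG_HOTPLUG_CPU), which is what provides the "online" file, and the function names and the SYSFS_CPU_ROOT override are made up for illustration.

```shell
#!/bin/sh
# Sketch: take a CPU offline or bring it back through sysfs.
# SYSFS_CPU_ROOT is a hypothetical override so the functions can be tried
# against a scratch directory; on a real system the default path applies.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# set_cpu_online CPU STATE   (STATE: 1 = online, 0 = offline)
set_cpu_online() {
    cpu="$1"
    state="$2"
    ctl="$SYSFS_CPU_ROOT/cpu$cpu/online"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    # Without hotplug support (or for a CPU that cannot be offlined)
    # the control file does not exist.
    if [ ! -e "$ctl" ]; then
        echo "no hotplug control file: $ctl" >&2
        return 1
    fi
    echo "$state" > "$ctl"
}

# get_cpu_online CPU   (prints 0 or 1)
get_cpu_online() {
    cat "$SYSFS_CPU_ROOT/cpu$1/online"
}
```

Run as root, "set_cpu_online 1 0" should have the same effect as the echo command above, and "set_cpu_online 1 1" brings the CPU back.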
33
34 The system has 2 AMD Opteron 252 processors, 5 disks - 1 IDE Maxtor
35 6B200R0 and 4 SCSI Maxtor 6L300S0, and a probably irrelevant ATAPI 48X DVD-ROM
36 DVD-R CD-R/RW drive, Ethernet controller: Broadcom Corporation NetXtreme
37 BCM5703X Gigabit Ethernet (rev 02), RAID bus controller: Silicon Image, Inc.
38 SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02), FireWire (IEEE
39 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link),
40 VGA compatible controller: ATI Technologies Inc RV350 AP [Radeon 9600]
41 05:00.1, Display controller: ATI Technologies Inc RV350 AP [Radeon 9600]
42 (Secondary).
43 Kernel version is 2.6.17 built using gentoo-sources.
44
45 Any idea what might be causing this problem? Bad kernel configuration? Bad
46 system configuration? Kernel bug? Portage bug? Defective processor? Problem
47 with disk access?
48 I'm including a snapshot of the info I could retrieve from the system when
49 the system remained somewhat responsive after the problem occurred.
50
51 Kind regards,
52 Vesna
53
54
55 odin ~ # uname -a
56 Linux odin 2.6.17-gentoo-r8 #7 SMP PREEMPT Tue Oct 31 12:10:14 EST 2006
57 x86_64 AMD Opteron(tm) Processor 252 GNU/Linux
58
59 top - 22:42:38 up 7:57, 3 users, load average: 7.99, 7.71, 5.39
60 Tasks: 64 total, 8 running, 56 sleeping, 0 stopped, 0 zombie
61 Cpu0 : 0.0% us, 100.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
62 Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
63 Mem: 6929848k total, 140192k used, 6789656k free, 15272k buffers
64 Swap: 5004236k total, 0k used, 5004236k free, 65328k cached
65
66   PID USER PR  NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
68  9811 root 16   0    0    0    0 R  100  0.0 18:27.43 emerge
69     1 root 16   0 2608  572  488 S    0  0.0  0:00.46 init
70     2 root RT   0    0    0    0 R    0  0.0  0:00.00 migration/0
72     3 root 34  19    0    0    0 S    0  0.0  0:00.00 ksoftirqd/0
74     4 root RT   0    0    0    0 R    0  0.0  0:00.00 watchdog/0
75     5 root RT   0    0    0    0 S    0  0.0  0:00.00 migration/1
77     6 root 34  19    0    0    0 S    0  0.0  0:00.00 ksoftirqd/1
79     7 root RT   0    0    0    0 S    0  0.0  0:00.00 watchdog/1
80     8 root 10  -5    0    0    0 R    0  0.0  0:00.00 events/0
81     9 root 10  -5    0    0    0 S    0  0.0  0:00.00 events/1
82    10 root 19  -5    0    0    0 S    0  0.0  0:00.00 khelper
83    11 root 10  -5    0    0    0 S    0  0.0  0:00.00 kthread
84    16 root 10  -5    0    0    0 R    0  0.0  0:00.00 kblockd/0
85    17 root 10  -5    0    0    0 S    0  0.0  0:00.00 kblockd/1
86    18 root 14  -5    0    0    0 S    0  0.0  0:00.00 kacpid
87   103 root 10  -5    0    0    0 S    0  0.0  0:00.02 kseriod
88   166 root 20   0    0    0    0 S    0  0.0  0:00.00 pdflush
89   167 root 15   0    0    0    0 S    0  0.0  0:00.00 pdflush
90   168 root 18   0    0    0    0 S    0  0.0  0:00.00 kswapd0
91   169 root 15   0    0    0    0 S    0  0.0  0:00.00 kswapd1
92   170 root 14  -5    0    0    0 S    0  0.0  0:00.00 aio/0
93   171 root 10  -5    0    0    0 S    0  0.0  0:00.00 aio/1
94
95
96 top - 23:17:53 up 8:32, 3 users, load average: 11.99, 11.92, 10.81
97 Tasks: 65 total, 8 running, 57 sleeping, 0 stopped, 0 zombie
98 Cpu0 : 0.0% us, 100.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
99 Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 100.0% wa, 0.0% hi, 0.0% si
100 Mem: 6929848k total, 142444k used, 6787404k free, 15300k buffers
101 Swap: 5004236k total, 0k used, 5004236k free, 65560k cached
102
103   PID USER PR  NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
105  9811 root 16   0    0    0    0 R  100  0.0 53:43.13 emerge
106     1 root 16   0 2608  572  488 S    0  0.0  0:00.46 init
107     2 root RT   0    0    0    0 R    0  0.0  0:00.00 migration/0
109     3 root 34  19    0    0    0 S    0  0.0  0:00.00 ksoftirqd/0
111     4 root RT   0    0    0    0 R    0  0.0  0:00.00 watchdog/0
112     5 root RT   0    0    0    0 S    0  0.0  0:00.00 migration/1
114     6 root 34  19    0    0    0 S    0  0.0  0:00.00 ksoftirqd/1
116     7 root RT   0    0    0    0 S    0  0.0  0:00.00 watchdog/1
117     8 root 10  -5    0    0    0 R    0  0.0  0:00.00 events/0
118     9 root 10  -5    0    0    0 S    0  0.0  0:00.00 events/1
119    10 root 19  -5    0    0    0 S    0  0.0  0:00.00 khelper
120    11 root 10  -5    0    0    0 S    0  0.0  0:00.00 kthread
121    16 root 10  -5    0    0    0 R    0  0.0  0:00.00 kblockd/0
122    17 root 10  -5    0    0    0 S    0  0.0  0:00.00 kblockd/1
123    18 root 14  -5    0    0    0 S    0  0.0  0:00.00 kacpid
124   103 root 10  -5    0    0    0 S    0  0.0  0:00.02 kseriod
125   166 root 20   0    0    0    0 S    0  0.0  0:00.00 pdflush
126   167 root 15   0    0    0    0 D    0  0.0  0:00.00 pdflush
127   168 root 18   0    0    0    0 S    0  0.0  0:00.00 kswapd0
128   169 root 15   0    0    0    0 S    0  0.0  0:00.00 kswapd1
129   170 root 14  -5    0    0    0 S    0  0.0  0:00.00 aio/0
130   171 root 10  -5    0    0    0 S    0  0.0  0:00.00 aio/1
131
132
133 F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY   TIME     CMD
134 4 S     0     1     0  0  76   0 -      652 -      ?     00:00:00 init
135 1 R     0     2     1  0 -40   - -        0 -      ?     00:00:00 migration/0
137 1 S     0     3     1  0  94  19 -        0 ksofti ?     00:00:00 ksoftirqd/0
139 5 R     0     4     1  0 -40   - -        0 -      ?     00:00:00 watchdog/0
141 1 S     0     5     1  0 -40   - -        0 migrat ?     00:00:00 migration/1
143 1 S     0     6     1  0  94  19 -        0 ksofti ?     00:00:00 ksoftirqd/1
145 5 S     0     7     1  0 -40   - -        0 watchd ?     00:00:00 watchdog/1
147 5 R     0     8     1  0  70  -5 -        0 -      ?     00:00:00 events/0
148 1 S     0     9     1  0  70  -5 -        0 worker ?     00:00:00 events/1
149 1 S     0    10     1  0  79  -5 -        0 worker ?     00:00:00 khelper
150 1 S     0    11     1  0  70  -5 -        0 worker ?     00:00:00 kthread
151 1 R     0    16    11  0  70  -5 -        0 -      ?     00:00:00 kblockd/0
152 1 S     0    17    11  0  70  -5 -        0 worker ?     00:00:00 kblockd/1
153 1 S     0    18    11  0  74  -5 -        0 worker ?     00:00:00 kacpid
154 1 S     0   103    11  0  70  -5 -        0 serio_ ?     00:00:00 kseriod
155 1 S     0   166    11  0  80   0 -        0 pdflus ?     00:00:00 pdflush
156 1 S     0   167    11  0  75   0 -        0 pdflus ?     00:00:00 pdflush
157 1 S     0   168     1  0  78   0 -        0 kswapd ?     00:00:00 kswapd0
158 1 S     0   169     1  0  75   0 -        0 kswapd ?     00:00:00 kswapd1
159 1 S     0   170    11  0  74  -5 -        0 worker ?     00:00:00 aio/0
160 1 S     0   171    11  0  70  -5 -        0 worker ?     00:00:00 aio/1
161 1 S     0   770    11  0  70  -5 -        0 worker ?     00:00:00 kpsmoused
162 1 S     0   818    11  0  70  -5 -        0 worker ?     00:00:00 ata/0
163 1 S     0   819    11  0  71  -5 -        0 worker ?     00:00:00 ata/1
164 1 S     0   821    11  0  71  -5 -        0 scsi_e ?     00:00:00 scsi_eh_0
165 1 S     0   822    11  0  71  -5 -        0 scsi_e ?     00:00:00 scsi_eh_1
166 1 S     0   823    11  0  71  -5 -        0 scsi_e ?     00:00:00 scsi_eh_2
167 1 S     0   824    11  0  70  -5 -        0 scsi_e ?     00:00:00 scsi_eh_3
168 1 S     0   850     1  0  75   0 -        0 -      ?     00:00:00 khpsbpkt
169 1 S     0   854     1  0  76   0 -        0 -      ?     00:00:00 knodemgrd_0
171 1 S     0   862    11  0  70  -5 -        0 kjourn ?     00:00:00 kjournald
172 5 S     0   973     1  0  78  -4 -     1764 -      ?     00:00:00 udevd
173 1 S     0  2119    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
174 1 S     0  2123    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
175 1 S     0  2129    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
176 1 S     0  2134    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
177 1 S     0  2139    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
178 1 S     0  2144    11  0  70  -5 -        0 kjourn ?     00:00:00 kjournald
179 1 S     0  2149    11  0  71  -5 -        0 kjourn ?     00:00:00 kjournald
180 1 S     0  2157    11  0  72  -5 -        0 hub_th ?     00:00:00 khubd
181 5 S   111  4246     1  0  76   0 -     2248 -      ?     00:00:00 portmap
182 5 S     0  4314     1  0  84   0 -     5305 -      ?     00:00:00 ypbind
183 5 S 65534  4384     1  0  84   0 -     1463 -      ?     00:00:00 rpc.statd
184 1 S     0  4391    11  0  71  -5 -        0 worker ?     00:00:00 rpciod/0
185 1 S     0  4392    11  0  71  -5 -        0 worker ?     00:00:00 rpciod/1
186 1 S     0  4393     1  0  85   0 -        0 -      ?     00:00:00 lockd
187 1 S     0  4394     1  0  76   0 -     1462 -      ?     00:00:00 mount
188 1 S     0  4453     1  0  76   0 -     1461 -      ?     00:00:00 mount
189 5 S     0  4514     1  0  76   0 -     4294 -      ?     00:00:00 sshd
190 0 S     0  4585     1  0  77   0 -      917 -      tty1  00:00:00 agetty
191 0 S     0  4586     1  0  76   0 -      917 -      tty2  00:00:00 agetty
192 0 S     0  4587     1  0  76   0 -      917 -      tty3  00:00:00 agetty
193 0 S     0  4588     1  0  76   0 -      917 -      tty4  00:00:00 agetty
194 0 S     0  4589     1  0  76   0 -      916 -      tty5  00:00:00 agetty
195 0 S     0  4590     1  0  76   0 -      916 -      tty6  00:00:00 agetty
196 4 S     0 14875  4514  0  75   0 -     7073 -      ?     00:00:00 sshd
197 4 S     0 14878 14875  0  75   0 -     2548 wait   pts/0 00:00:00 bash
198 0 D     0 26815     1  0  77   0 -        0 exit   pts/0 00:00:00 cc1
199 4 R     0  9811 14878 86  76   0 -        0 -      pts/0 00:02:59 emerge
200 4 S     0 17651  4514  0  75   0 -     7036 -      ?     00:00:00 sshd
201 4 S     0 17654 17651  0  75   0 -     2547 wait   pts/1 00:00:00 bash
202 0 R     0 17661 17654  0  77   0 -     1019 -      pts/1 00:00:00 ps
203
204
205 odin ~ # cat /proc/cpuinfo
206 processor : 0
207 vendor_id : AuthenticAMD
208 cpu family : 15
209 model : 37
210 model name : AMD Opteron(tm) Processor 252
211 stepping : 1
212 cpu MHz : 2592.234
213 cache size : 1024 KB
214 fpu : yes
215 fpu_exception : yes
216 cpuid level : 1
217 wp : yes
218 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
219 cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
220 3dnowext 3dnow pni lahf_lm
221 bogomips : 5189.92
222 TLB size : 1024 4K pages
232 clflush size : 64
233 cache_alignment : 64
234 address sizes : 40 bits physical, 48 bits virtual
235 power management: ts fid vid ttp
236
237 processor : 1
238 vendor_id : AuthenticAMD
239 cpu family : 15
240 model : 37
241 model name : AMD Opteron(tm) Processor 252
242 stepping : 1
243 cpu MHz : 2592.234
244 cache size : 1024 KB
245 fpu : yes
246 fpu_exception : yes
247 cpuid level : 1
248 wp : yes
249 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
250 cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
251 3dnowext 3dnow pni lahf_lm
252 bogomips : 5184.39
253 TLB size : 1024 4K pages
254 clflush size : 64
255 cache_alignment : 64
256 address sizes : 40 bits physical, 48 bits virtual
257 power management: ts fid vid ttp
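
A snapshot like the one above could also be collected in one pass and appended to a log file, which helps when the machine only stays responsive for a short time. This is a rough sketch, not what was actually run here: the function names are made up, and the exact options (batch-mode "top -b -n 1", long-format "ps -el") are assumptions, since the original commands are not shown. Batch-mode top may also not print the per-CPU lines seen above.

```shell
#!/bin/sh
# Sketch: append a diagnostic snapshot to a log file.

# run_section FILE CMD [ARGS...]: write a header line, then the command output.
run_section() {
    out="$1"
    shift
    {
        printf '=== %s ===\n' "$*"
        # Keep going even if one command is missing or fails.
        "$@" 2>&1 || printf '(command failed or unavailable: %s)\n' "$1"
        printf '\n'
    } >> "$out"
}

# snapshot FILE: collect the same kinds of output shown in this message.
snapshot() {
    out="$1"
    run_section "$out" date
    run_section "$out" uname -a
    run_section "$out" top -b -n 1
    run_section "$out" ps -el
    run_section "$out" cat /proc/cpuinfo
}
```

For example, "snapshot /root/lockup.log" (the path is only an illustration) could be run from a second ssh session while emerge is running, and repeated to compare states over time.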

Replies