On 24 Nov, Stefan G. Weichinger wrote:
> Stefan G. Weichinger wrote:
>> Stefan G. Weichinger wrote:
>>
>>> Since then no crashes, but I would have to test clicking some more stuff
>>> to really believe ...
>>
>> As always, after hitting SEND ... one more crash ...
>
> Sometimes it crashes after clicking opera, sometimes after clicking
> thunderbird, so far never when clicking/starting a gnome-terminal.
>
> I am still looking for a pattern or an error-message somewhere ...
>

This reminds me of a problem we had just recently.
Have you got a multi-core CPU?
If yes, read on.

We have 6 machines here running identical Gentoo systems
(differing only in hostname and IP address), each with an
AMD Phenom II quad-core CPU and identical motherboards.
One of them had the same random crashes you reported.
I tortured its memory by running up to 3 memtester processes
overnight: not a single fault. Our dealer replaced the motherboard,
again with no change. Then I suspected the CPU itself, although it
had survived a burnK7 run of several hours.

After the CPU was replaced, the crashes were gone.
I suspect a cache coherence problem. The usual memory tests
assign a given window of physical memory to a given core,
even when several of them run in parallel. Under typical Linux
usage, however, the scheduler moves a given thread from one core
to another quite frequently.
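
To illustrate the difference, here is a minimal Python sketch of a memory test that does not keep each window on one core: it writes a byte pattern while pinned to one CPU, then forces a migration to each other CPU and reads the buffer back there. It assumes Linux (`os.sched_setaffinity` is Linux-only); the function name `migrating_pattern_test` and the pattern scheme are made up for this example, and it is not meant as a replacement for memtester or burnK7.

```python
import os


def migrating_pattern_test(size=1 << 20, rounds=4):
    """Write a pattern on one CPU, then re-read it after migrating to each other CPU.

    Hypothetical helper for illustration. Returns a list of
    (writer_cpu, reader_cpu, round) tuples for every read-back that did not
    match; an empty list means no fault was observed.
    """
    original = os.sched_getaffinity(0)      # CPUs this process may run on
    cpus = sorted(original)
    buf = bytearray(size)
    failures = []
    try:
        for r in range(rounds):
            for writer in cpus:
                os.sched_setaffinity(0, {writer})          # do the writes on `writer`
                pattern = (writer * 31 + r * 7) % 255 + 1  # 1..255, never 0
                buf[:] = bytes([pattern]) * size
                readers = [c for c in cpus if c != writer] or [writer]
                for reader in readers:
                    os.sched_setaffinity(0, {reader})      # kernel moves us to `reader`
                    if buf.count(pattern) != size:         # any mismatch is a hardware fault
                        failures.append((writer, reader, r))
    finally:
        os.sched_setaffinity(0, original)                  # undo the pinning
    return failures
```

On a suspect machine one would run it repeatedly (for example in a loop overnight); a non-empty result would point at a specific pair of cores. A clean run proves little, since this touches memory far more slowly and regularly than real applications like opera or thunderbird do.
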

Now, the Phenom II has 4 cores, each with its own private caches
(including a 512 KB L2 cache), plus a 6 MB L3 cache shared by all 4 cores.
In the BIOS one can choose whether all 4 cores use this shared cache
or only a single core does.
When a core writes to memory, all other cores must be informed that
any copy of that data in their private caches is now invalid. If this
doesn't happen, or happens a bit too late, a core will fetch stale (old)
memory contents, which may result in a crash.
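
The write-then-invalidate handoff can also be exercised with two processes running at the same time: a writer pinned to one CPU fills a shared buffer, and a reader pinned to another CPU checks it as soon as it is told the data is ready. From the second round on, the reader's own caches still hold the previous pattern, so a missed or late invalidation would show up as a mismatch. Again this is only a sketch under assumptions: Linux, `os.fork`, an anonymous shared `mmap`, and hypothetical names (`cross_core_handoff_test`, `_reader`). The pipes only carry the "data is ready" signal; since the reader looks at the buffer only after the writer's pipe write, a mismatch means stale or faulty memory, not a race in the test itself.

```python
import mmap
import os


def _reader(shared, size, rounds, cpu, ready_fd, ack_fd):
    """Child side (hypothetical): runs on `cpu` and checks every buffer handed over."""
    os.sched_setaffinity(0, {cpu})
    bad = 0
    for _ in range(rounds):
        msg = os.read(ready_fd, 1)           # blocks until the writer has finished
        if not msg:                          # writer closed the pipe early
            raise EOFError("writer went away")
        if shared[:].count(msg[0]) != size:  # any stale byte counts as a fault
            bad += 1
        os.write(ack_fd, b"k")               # writer may now overwrite the buffer
    return bad


def cross_core_handoff_test(size=1 << 16, rounds=100):
    """Return the number of rounds in which the reader saw wrong data (0 = none)."""
    cpus = sorted(os.sched_getaffinity(0))
    writer_cpu, reader_cpu = cpus[0], cpus[-1]  # identical if only one CPU is usable
    shared = mmap.mmap(-1, size)                # anonymous MAP_SHARED, kept across fork
    ready_r, ready_w = os.pipe()                # writer -> reader: pattern byte
    ack_r, ack_w = os.pipe()                    # reader -> writer: round checked

    pid = os.fork()
    if pid == 0:                                # child process: the reader
        os.close(ready_w)
        os.close(ack_r)
        code = 255                              # reported if the reader itself fails
        try:
            code = min(_reader(shared, size, rounds, reader_cpu, ready_r, ack_w), 254)
        finally:
            os._exit(code)

    os.close(ready_r)
    os.close(ack_w)
    original = os.sched_getaffinity(0)
    try:
        os.sched_setaffinity(0, {writer_cpu})
        for k in range(rounds):
            pattern = (k * 37) % 255 + 1        # 1..255, changes every round
            shared[:] = bytes([pattern]) * size
            os.write(ready_w, bytes([pattern]))
            os.read(ack_r, 1)                   # wait for the reader before reusing
    finally:
        os.sched_setaffinity(0, original)
        os.close(ready_w)                       # unblocks the reader if we bailed out
        os.close(ack_r)
        _, status = os.waitpid(pid, 0)
        shared.close()
    return os.WEXITSTATUS(status)
```

The reader reports the number of bad rounds through its exit status (capped at 254; 255 means the reader itself failed).
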

So, if you can, set the BIOS switch so that only a single core
can use the shared cache. If the problem then disappears,
the CPU is most likely broken.

I hope you can solve your problem,
Helmut.

-- 
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany