-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Recalling IRC discussions, I guess this is of general interest.
If you haven't seen it before, read the second note first. :)

Truth in publishing ethics compels me to note that I have made a
correction to the U2 failure report.

If you have no idea what this is about, or if you have seen it many times
already, just ignore it.

Regards,
Ferris

- --
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (sparc, devrel)

Date: Sat, 22 Oct 2005 08:53:55 +0000 (UTC)
From: Ferris McCormick <fmccor@g.o>
To: squash@g.o, weeve@g.o
Cc: sparc@g.o
Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)

- --[PinePGP]--------------------------------------------------[begin]--
So, to finish the story duplicated below:
1. Disk involved (/dev/sda) in this test is a standard SUN-branded
   18GB disk, Vendor: SEAGATE Model: ST318203LSUN18G Rev: 034A;
   second disk on the system is the same.
2. To summarize my crashme results with this kernel:
   a. U60(2x300), U60(2x450) --- pretty much the same, as described
      in the original note, duplicated below.
   b. U2(2x400) --- much worse. This system could not make it through
      the first untar in pass 1.
3. Problem is SCSI disk I/O. I suspect increased CPU utilization might
   make it less likely to show up, because if the CPUs are busy doing
   other things, they can't hit the disk as hard (observation from
   emerge --sync) --- this is speculation.
4. For the record, U2(2x400) and U60(2x450) are both completely stable
   under kernel 2.4.31-sparc-r2; actually, U2 perhaps more so.

This raises a question: Jason stated that a SUNESP patch made his U2
do much better. Is this patch in kernel 2.6.14-rc3-git-gb4d1b825? If
not, I would like to apply it and retest U2(2x400) on Monday. Clearly,
it would simplify the situation if case 2(b) -- the U2 failure -- could be
eliminated. A sample size of 1 is not all that useful, but if I recall
correctly (and I might be rewriting history based on current status), for
me the problem on a running system first came to light on that U2; it
seems to me, at least, that the U2 is more prone to failure.

So, if there is a U2-specific patch which is not in the kernel, that would
be significant. We might be looking at two SCSI-related problems which
result in the same symptom. Answering that seems to me to be important.

Sorry (not very, really) to include another copy of my first note.

Thoughts, comments, suggestions, etc. to list please, not to me
personally.

Regards,

- --
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (sparc, devrel)

- ---------- Forwarded message ----------
Date: Sat, 22 Oct 2005 01:38:38 +0000 (UTC)
From: Ferris McCormick <fmccor@g.o>
To: squash@g.o, weeve@g.o
Cc: sparc@g.o
Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450)

I ran crashme on this system (as identified by 'uname -a') Friday evening:

Linux fer-de-lance 2.6.14-rc3-git-gb4d1b825 #1 SMP Fri Oct 21 23:20:37 UTC 2005 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux

gb4d1b825 is davem's current git.

This is a U60(2x300), /proc/cpuinfo thus:
==============================
fmccor@fer-de-lance ~ $ cat /proc/cpuinfo
cpu             : TI UltraSparc II (BlackBird)
fpu             : UltraSparc II integrated FPU
promlib         : Version 3 Revision 31
prom            : 3.31.0
type            : sun4u
ncpus probed    : 2
ncpus active    : 2
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0Bogo        : 589.82
Cpu0ClkTck      : 0000000011a53054
Cpu2Bogo        : 589.82
Cpu2ClkTck      : 0000000011a53054
MMU Type        : Spitfire
State:
CPU0: online
CPU2: online
=================================
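
As a cross-check on the "2x300" label: the CpuNClkTck fields above look
like the CPU clock in Hz, printed in hexadecimal. That reading is an
assumption about how sparc64 reports the value, not something stated in
this note. A minimal sketch of the conversion, where clktck_to_mhz and
cpuinfo_clocks are hypothetical helper names:

```shell
#!/bin/sh
# Convert a ClkTck value (hex digits, assumed to be a frequency in Hz)
# to whole MHz. Hypothetical helper, not part of crashme.
clktck_to_mhz() {
    # $1: hex digits without a 0x prefix, e.g. 0000000011a53054
    echo $(( 0x$1 / 1000000 ))
}

# Read cpuinfo-style "key : value" lines on stdin and print
# "<field> <MHz>" for every CpuNClkTck line.
cpuinfo_clocks() {
    while read -r key sep value; do
        case "$key" in
            Cpu*ClkTck) echo "$key $(clktck_to_mhz "$value")" ;;
        esac
    done
}
```

Run as 'cpuinfo_clocks < /proc/cpuinfo'. For the value shown above,
0x11a53054 is 296038484, so each CPU would report 296, which fits a
nominal 300 MHz part.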
On this system, crashme died beginning pass 4 (as opposed to pass 3 on
2x450). I modified crashme.sh to keep a log file; here it is.

=================================
Fri Oct 21 23:41:32 UTC 2005
2.6.14-rc3-git-gb4d1b825
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 1 completed
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 2 completed
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 3 completed
Copying /usr/portage to /CRASH/crash.
====================================

The log does not show it, but /usr/portage and /CRASH/crash are on
the same partition (/dev/sda4).
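
crashme.sh itself is not reproduced in this note, so what follows is only
a sketch of the cycle the log implies (copy, tar, remove, untar, remove),
not the actual script. The function names, the tarball name, the argument
layout and the sync call are all assumptions; only the log messages are
taken from the log above.

```shell
#!/bin/sh
# Sketch of the copy/tar/untar cycle implied by the log.
# NOT the real crashme.sh: stress_pass, run_stress and the "<name>.tar"
# file are hypothetical; only the log strings come from the note.

# One pass: copy SRC into WORK, tar it, remove it, untar it, remove it.
stress_pass() {
    src=$1 work=$2 log=$3
    name=$(basename "$src")

    echo "Copying $src to $work." >> "$log"
    cp -Rp "$src" "$work/" || return 1

    echo "Create tarfile" >> "$log"
    tar -cf "$work/$name.tar" -C "$work" "$name" || return 1

    echo "Removing $name" >> "$log"
    rm -rf "${work:?}/$name"        # :? aborts if work is empty

    echo "Untar" >> "$log"
    tar -xf "$work/$name.tar" -C "$work" || return 1

    echo "Removing $name" >> "$log"
    rm -rf "${work:?}/$name" "$work/$name.tar"
}

# Repeat the pass COUNT times, logging each completed run.
run_stress() {
    src=$1 work=$2 log=$3 count=$4
    mkdir -p "$work" || return 1
    { date -u; uname -r; } >> "$log"    # header: timestamp and kernel
    run=1
    while [ "$run" -le "$count" ]; do
        stress_pass "$src" "$work" "$log" || return 1
        echo "Run $run completed" >> "$log"
        sync    # assumption: push the log to disk before the next pass
        run=$((run + 1))
    done
}
```

A call such as 'run_stress /usr/portage /CRASH/crash /var/tmp/crashme.log 10'
would mirror the setup described here; the log path and the pass count
are placeholders. Keeping source and work directory on one partition, as
in this test, means every read and write of the cycle goes through the
same disk.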

So, crashme will kill (some) (2x300) systems if they are sensitive to the
problem. However, fer-de-lance (2x300) is much more robust running an
'emerge --sync' than antaresia (2x450) is. That might be because the CPUs
are slower, and so can't drive the disks as hard.

Hope this is useful,
Regards,
Ferris

- --
Ferris McCormick (P44646, MI) <fmccor@g.o>
Developer, Gentoo Linux (sparc, devrel)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDWkg8Qa6M3+I///cRAkODAKCIVOZdWsa0rLFh+P13uy6j3VO5NQCbBs3t
NO5RIaCds27WpDuxpFhyUh4=
=qOKp
-----END PGP SIGNATURE-----
--
gentoo-sparc@g.o mailing list