Gentoo Archives: gentoo-sparc

From: Ferris McCormick <fmccor@g.o>
To: gentoo-sparc@l.g.o
Subject: [gentoo-sparc] crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)
Date: Sat, 22 Oct 2005 14:10:21
Message-Id: Pine.LNX.4.64.0510221404420.16076@terciopelo.krait.us
1 -----BEGIN PGP SIGNED MESSAGE-----
2 Hash: SHA1
3
4 Recalling IRC discussions, I guess this is of general interest.
5 If you haven't seen it before, read the second note first. :)
6
7 Truth in publishing ethics compels me to note that I have made a
8 correction to the U2 failure report.
9
10 If you have no idea what this is about, or if you have seen it many times
11 already, just ignore it.
12
13 Regards,
14 Ferris
15
16 - --
17 Ferris McCormick (P44646, MI) <fmccor@g.o>
18 Developer, Gentoo Linux (sparc, devrel)
19
20 Date: Sat, 22 Oct 2005 08:53:55 +0000 (UTC)
21 From: Ferris McCormick <fmccor@g.o>
22 To: squash@g.o, weeve@g.o
23 Cc: sparc@g.o
24 Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)
25
26 - --[PinePGP]--------------------------------------------------[begin]--
27 So, to finish the story duplicated below:
28 1. Disk involved (/dev/sda) in this test is a standard SUN-branded
29 18GB disk, Vendor: SEAGATE Model: ST318203LSUN18G Rev: 034A;
30 second disk on the system is the same.
31 2. To summarize my crashme results with this kernel:
32 a. U60(2x300), U60(2x450) --- pretty much the same, as described
33 in the original note, duplicated below.
34 b. U2(2x400) --- much worse. This system could not make it through
35 the first untar in pass 1.
36 3. Problem is SCSI disk I/O. I suspect increased CPU utilization might
37 make it less likely to show up, because if the CPUs are busy doing
38 other things, they can't hit the disk as hard (observation from
39 emerge --sync) --- this is speculation.
40 4. For the record, U2(2x400), U60(2x450) are both completely stable
41 under kernel 2.4.31-sparc-r2; actually, U2 perhaps more so.
42
43 This raises a question: Jason stated that a SUNESP patch made his U2
44 do much better. Is this patch in kernel 2.6.14-rc3-git-gb4d1b825? If
45 not, I would like to apply it and retest U2(2x400) on Monday. Clearly,
46 it would simplify the situation if case 2(b) -- the U2 failure -- could be
47 eliminated. A sample size of 1 is not all that useful, but if I recall
48 correctly (and I might be rewriting history based on current status), for
49 me the problem on a running system first came to light on that U2; it
50 seems to me, at least, that the U2 is more prone to failure.
51
52 So, if there is a U2-specific patch which is not in the kernel, that would
53 be significant. We might be looking at 2 SCSI-related problems which
54 result in the same symptom. Answering that seems to me to be important.
55
56 Sorry (not very, really) to include another copy of my first note.
57
58 Thoughts, comments, suggestions, etc. to list please, not to me
59 personally.
60
61 Regards,
62
63 - --
64 Ferris McCormick (P44646, MI) <fmccor@g.o>
65 Developer, Gentoo Linux (sparc, devrel)
66
67 - ---------- Forwarded message ----------
68 Date: Sat, 22 Oct 2005 01:38:38 +0000 (UTC)
69 From: Ferris McCormick <fmccor@g.o>
70 To: squash@g.o, weeve@g.o
71 Cc: sparc@g.o
72 Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450)
73
74 I ran crashme on this system (as identified by 'uname -a') Friday evening:
75
76 Linux fer-de-lance 2.6.14-rc3-git-gb4d1b825 #1 SMP Fri Oct 21 23:20:37 UTC
77 2005 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux
78
79 gb4d1b825 is davem's current git.
80
81 This is a U60(2x300), /proc/cpuinfo thus:
82 ==============================
83 fmccor@fer-de-lance ~ $ cat /proc/cpuinfo
84 cpu : TI UltraSparc II (BlackBird)
85 fpu : UltraSparc II integrated FPU
86 promlib : Version 3 Revision 31
87 prom : 3.31.0
88 type : sun4u
89 ncpus probed : 2
90 ncpus active : 2
91 D$ parity tl1 : 0
92 I$ parity tl1 : 0
93 Cpu0Bogo : 589.82
94 Cpu0ClkTck : 0000000011a53054
95 Cpu2Bogo : 589.82
96 Cpu2ClkTck : 0000000011a53054
97 MMU Type : Spitfire
98 State:
99 CPU0: online
100 CPU2: online
101 =================================
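As a cross-check on the (2x300) label, the Cpu0ClkTck value above can be read as the CPU clock in Hz, written in hexadecimal. That reading is an assumption about how the sparc64 kernel reports the field, and the helper name clktck_to_mhz is made up for illustration. A minimal sketch:

```shell
#!/bin/sh
# Convert a CpuNClkTck value from /proc/cpuinfo to whole MHz.
# Assumption: the field is the CPU clock in Hz, printed as hex without "0x".
clktck_to_mhz() {
    # $1: hex digits, e.g. 0000000011a53054
    echo $(( 0x$1 / 1000000 ))
}

# Value taken from the cpuinfo listing above.
clktck_to_mhz 0000000011a53054
```

Under that assumption the result comes out just under 300 MHz, which matches a nominal 300 MHz part.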
102 On this system, crashme died at the start of pass 4 (as opposed to pass 3
103 on 2x450). I modified crashme.sh to keep a log file; here it is.
104
105 =================================
106 Fri Oct 21 23:41:32 UTC 2005
107 2.6.14-rc3-git-gb4d1b825
108 Copying /usr/portage to /CRASH/crash.
109 Create tarfile
110 Removing portage
111 Untar
112 Removing portage
113 Run 1 completed
114 Copying /usr/portage to /CRASH/crash.
115 Create tarfile
116 Removing portage
117 Untar
118 Removing portage
119 Run 2 completed
120 Copying /usr/portage to /CRASH/crash.
121 Create tarfile
122 Removing portage
123 Untar
124 Removing portage
125 Run 3 completed
126 Copying /usr/portage to /CRASH/crash.
127 ====================================
128
129 The log does not show it, but /usr/portage and /CRASH/crash are on
130 the same partition (/dev/sda4).
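
For readers without crashme.sh to hand, the log above suggests a loop roughly like the following. This is a sketch reconstructed from the log messages only, not the actual script: the function name crashme_run, its arguments, and the exact cp/tar/rm invocations are all assumptions.

```shell
#!/bin/sh
# Hypothetical reconstruction of the logged crashme.sh loop (not the original).
# Usage: crashme_run SRC WORK LOG PASSES
crashme_run() {
    src=$1; work=$2; log=$3; passes=$4
    [ -d "$src" ] || return 1
    name=$(basename "$src")
    # Refuse names that would make the rm -rf below dangerous.
    case $name in ''|/|.|..) return 1 ;; esac
    mkdir -p "$work" || return 1
    { date -u; uname -r; } >> "$log"
    i=1
    while [ "$i" -le "$passes" ]; do
        # Each step is logged before it starts, so the last log line
        # names the step that was running when the machine died.
        echo "Copying $src to $work." >> "$log"
        cp -Rp "$src" "$work/" || return 1
        echo "Create tarfile" >> "$log"
        tar -C "$work" -cf "$work/$name.tar" "$name" || return 1
        echo "Removing $name" >> "$log"
        rm -rf "$work/$name"
        echo "Untar" >> "$log"
        tar -C "$work" -xf "$work/$name.tar" || return 1
        echo "Removing $name" >> "$log"
        rm -rf "$work/$name" "$work/$name.tar"
        echo "Run $i completed" >> "$log"
        i=$((i + 1))
    done
}

# Example with the paths from the log (the log path and pass count are made up):
# crashme_run /usr/portage /CRASH/crash /root/crashme.log 10
```

Nothing under SRC is modified; all removals happen inside WORK. Keeping SRC and WORK on the same partition, as in the run above, concentrates the reads and writes on one disk.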
131
132 So, crashme will kill (some) (2x300) systems if they are sensitive to the
133 problem. However, fer-de-lance (2x300) is much more robust running an
134 'emerge --sync' than antaresia (2x450) is. That might be because the CPUs
135 are slower, and so can't drive the disks as hard.
136
137 Hope this is useful,
138 Regards,
139 Ferris
140
141 - --
142 Ferris McCormick (P44646, MI) <fmccor@g.o>
143 Developer, Gentoo Linux (sparc, devrel)
144
145 -----BEGIN PGP SIGNATURE-----
146 Version: GnuPG v1.4.1 (GNU/Linux)
147
148 iD8DBQFDWkg8Qa6M3+I///cRAkODAKCIVOZdWsa0rLFh+P13uy6j3VO5NQCbBs3t
149 NO5RIaCds27WpDuxpFhyUh4=
150 =qOKp
151 -----END PGP SIGNATURE-----
152 --
153 gentoo-sparc@g.o mailing list
