Gentoo Archives: gentoo-user

From: felix@×××××××.com
To: gentoo-user@l.g.o
Subject: [gentoo-user] 3.7.1 SATA errors
Date: Sun, 23 Dec 2012 19:25:31
Message-Id: 20121223192335.GC5230@crowfix.com
1 A few weeks ago I had a scare when a reboot panicked the kernel with a complaint that it could not find the root device (/dev/sde), and further reboots couldn't even see the USB keyboard. Leaving the system powered off overnight "fixed" the problem and the system has been working fine ever since.
2
3 I have since had some time to explore this and found that it is related to the kernel version: 3.6.10 works fine, while 3.7.1 fails. If I reset during the 3.7.1 boot while it is spewing its error messages, but before the kernel ultimately panics, I can reboot with 3.6.10. If 3.7.1 goes all the way to the panic, I have to power off and wait a few minutes before a 3.6.10 reboot is successful. This is repeatable, but I haven't bothered to see how long the system must be off; "a few minutes" is enough.
4
5 This is a ~amd64 system, dual Opterons, Tyan S2882, Thunder K8S Pro. The dmesg times here start around 30 seconds because it spends 15 seconds on each of two SCSI hosts probing for nonexistent drives. udev etc are all frozen pre-systemd nonsense. Disks are two SSDs, two 4T drives, two 300G drives, and one 320G IDE/PATA drive; the main board is so old that there are only three boot options: IDE, DVD, network.
6
7 There are two error messages during the 3.7.1 boot, repeated for all SATA drives:
8
9 ata5.00: qc timeout (cmd 0x2f)
10 ata5.00: failed to set xfermode (err_mask=0x40)
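
For anyone who wants to read those two numbers rather than just search for them, here is a minimal sketch that decodes a libata err_mask and looks up an ATA command opcode. The bit names are as I recall them from enum ata_completion_errors in include/linux/libata.h, and the opcode table is a small hand-picked subset; treat both as assumptions and check them against the kernel tree you are actually running. The helper names (decode_err_mask, describe_command) are made up for this example.

```python
from typing import List

# Bit positions as recalled from enum ata_completion_errors in
# include/linux/libata.h. Assumption: verify against your own kernel source.
AC_ERR_BITS = {
    0: "AC_ERR_DEV (device reported error)",
    1: "AC_ERR_HSM (host state machine violation)",
    2: "AC_ERR_TIMEOUT (timeout)",
    3: "AC_ERR_MEDIA (media error)",
    4: "AC_ERR_ATA_BUS (ATA bus error)",
    5: "AC_ERR_HOST_BUS (host bus error)",
    6: "AC_ERR_SYSTEM (system error)",
    7: "AC_ERR_INVALID (invalid argument)",
    8: "AC_ERR_OTHER (unknown)",
    9: "AC_ERR_NODEV_HINT (polling device detection hint)",
    10: "AC_ERR_NCQ (marker for offending NCQ qc)",
}

# A small, hand-picked subset of ATA command opcodes (not a complete table).
ATA_COMMANDS = {
    0x2F: "READ LOG EXT",
    0xEC: "IDENTIFY DEVICE",
    0xEF: "SET FEATURES",
}


def decode_err_mask(mask: int) -> List[str]:
    """Return the names of all bits that are set in a libata err_mask."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]


def describe_command(opcode: int) -> str:
    """Look up an opcode in the small table above."""
    return ATA_COMMANDS.get(opcode, "not in this table")


if __name__ == "__main__":
    print(decode_err_mask(0x40))   # the err_mask from the xfermode message
    print(describe_command(0x2F))  # the cmd from the qc timeout message
```

Keep in mind the values were transcribed from photos of the screen, so a mistyped digit is possible.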
11
12 Google does not enlighten me. One suggestion was to change the SATA cable, but the problem definitely appears with the change from 3.6.10 to 3.7.1, not with any hardware change.
13
14 So here are some details ... You can see everything at https://www.dropbox.com/sh/o8j80rps3agvvcf/FBjJLcykRS
15
16 I am willing to try reasonable config changes for a new reboot attempt, but it is my main home server, not an experimental toy :-)
17
18 ================ dmesg differences
19
20 I took some pictures during the boot process and transcribed the results. The 3.6.10 dmesg matches the pictures, but of course I can't get a 3.7.1 dmesg because that kernel never finishes booting.
21
22 Both 3.6.10 and 3.7.1 appear to be the same up to this point:
23
24 ata13.00: ATA-8: WDC WD3200AAJB-00J3A0, 01.03E01, max UDMA/133
25 ata13.00: 625142448 sectors, multi 16: LBA48
26 ata13.00: configured for UDMA/133
27 ata1: SATA link down (SStatus 0 SControl 300)
28 ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
29 ata9.00: ATA-9: M4-CT512M4SD2, 000F, max UDMA/100
30 ata9.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 0/32)
31 ata9.00: configured for UDMA/100
32 ata2: SATA link down (SStatus 0 SControl 300)
33 ata3: SATA link down (SStatus 0 SControl 300)
34 ata4: SATA link down (SStatus 0 SControl 300)
35 ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
36 ata5.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
37 ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
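
The SStatus and SControl values in the link messages above are printed in hex and pack three 4-bit fields each. A rough sketch of how they split, assuming the usual SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11); the function names and the wording of the descriptions are mine, not libata's:

```python
from typing import Dict


def split_sata_reg(hex_text: str) -> Dict[str, int]:
    """Split an SStatus/SControl value (hex text as printed in dmesg) into fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # bits 0-3: device detection
        "spd": (value >> 4) & 0xF,  # bits 4-7: speed (negotiated, or allowed limit)
        "ipm": (value >> 8) & 0xF,  # bits 8-11: interface power management
    }


# Descriptions assume the standard SATA encoding of the SPD field.
SPEED = {0: "none", 1: "Gen1, 1.5 Gbps", 2: "Gen2, 3.0 Gbps", 3: "Gen3, 6.0 Gbps"}


def describe_sstatus(hex_text: str) -> str:
    """SStatus reports what the link actually negotiated."""
    f = split_sata_reg(hex_text)
    link = "device present, phy up" if f["det"] == 3 else "no established link"
    speed = SPEED.get(f["spd"], "unknown")
    return "DET=%d (%s), SPD=%d (%s), IPM=%d" % (f["det"], link, f["spd"], speed, f["ipm"])


def describe_scontrol(hex_text: str) -> str:
    """SControl holds what the host requested, including any speed limit."""
    f = split_sata_reg(hex_text)
    limit = "no speed limit" if f["spd"] == 0 else "limit: " + SPEED.get(f["spd"], "unknown")
    return "DET=%d, SPD=%d (%s), IPM=%d" % (f["det"], f["spd"], limit, f["ipm"])


if __name__ == "__main__":
    print(describe_sstatus("113"))
    print(describe_scontrol("300"))
    print(describe_scontrol("310"))
```

Read this way, SControl 300 means no speed limit was requested and SControl 310 means the host limited the link to Gen1, while SStatus 113 says the link itself came up at 1.5 Gbps in both cases.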
38
39 Around here 3.6.10 begins scrolling so fast that I could not get any pictures, so this is from the 3.6.10 dmesg, where it diverges from 3.7.1:
40
41 ata5.00: configured for UDMA/133
42 scsi 6:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
43 sd 6:0:0:0: [sda] 586114704 512-byte logical blocks: (300 GB/279 GiB)
44 sd 6:0:0:0: [sda] Write Protect is off
45 sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
46 sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
47 sda:
48 sd 6:0:0:0: [sda] Attached SCSI disk
49 ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
50 ata6.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
51 ata6.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
52 ata6.00: configured for UDMA/133
53 scsi 7:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
54 sd 7:0:0:0: [sdb] 586114704 512-byte logical blocks: (300 GB/279 GiB)
55 sd 7:0:0:0: [sdb] Write Protect is off
56 sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
57 sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
58 sdb: unknown partition table
59 sd 7:0:0:0: [sdb] Attached SCSI disk
60 .... and on and on until it boots. (The unknown partition table is an LVM volume.)
61
62 But 3.7.1 pokes along slowly enough while generating its errors that I did get some pictures to transcribe, and this is where it diverges from 3.6.10:
63
64 ata5.00: qc timeout (cmd 0x2f)
65 ata5.00: failed to set xfermode (err_mask=0x40)
66 ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
67 ata5.00: qc timeout (cmd 0x2f)
68 ata5.00: failed to set xfermode (err_mask=0x40)
69 ata5: limiting SATA link speed to 1.5 Gbps
70 ata5.00: limiting speed to UDMA/133:PIO3
71 ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
72 ata5.00: qc timeout (cmd 0x2f)
73 ata5.00: failed to set xfermode (err_mask=0x40)
74 ata5.00: disabled
75 ata5: hard resetting link
76 ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
77 ata5: EH complete
78 ... for all ATA drives until it eventually panics because the root device, /dev/sde, is not found.
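
The comparison above was done by eye. If both logs were available as text (a saved dmesg on one side, a transcription on the other), the first point of divergence could be found mechanically. A minimal sketch, assuming the only difference in formatting is the optional "[ seconds.micros]" timestamp prefix that dmesg adds; the function names are hypothetical and the timestamps in the sample are made up for illustration:

```python
import re
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

# Matches a leading dmesg timestamp such as "[   31.000000] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line: str) -> str:
    """Drop the timestamp prefix and trailing whitespace so lines compare equal."""
    return _TIMESTAMP.sub("", line.rstrip())


def first_divergence(
    good: Iterable[str], bad: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, good_line, bad_line) at the first difference, or None.

    A side that has run out of lines is reported as None.
    """
    for i, (a, b) in enumerate(zip_longest(good, bad)):
        na = None if a is None else normalize(a)
        nb = None if b is None else normalize(b)
        if na != nb:
            return i, na, nb
    return None


if __name__ == "__main__":
    # Timestamps here are illustrative only, not taken from the real log.
    good_log = [
        "[   31.000000] ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
        "[   31.100000] ata5.00: configured for UDMA/133",
    ]
    bad_log = [
        "ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
        "ata5.00: qc timeout (cmd 0x2f)",
    ]
    print(first_divergence(good_log, bad_log))
```

This only works while the two kernels print the same lines in the same order; message order between different ports can vary from boot to boot, so a per-port comparison may be needed in practice.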
79
80
81 ================ 3.6.10 ---> 3.7.1 conf changes
82
83 I rebuilt the 3.7.1 kernel and logged all the new config items.
84
85 Cputime accounting
86 > 1. Simple tick based cputime accounting (TICK_CPU_ACCOUNTING) (NEW)
87 2. Fine granularity task level IRQ time accounting (IRQ_TIME_ACCOUNTING)
88 choice[1-2]:
89
90 Consider userspace as in RCU extended quiescent state (RCU_USER_QS) [N/y/?] (NEW)
91
92 Module signature verification (MODULE_SIG) [N/y/?] (NEW)
93
94 Supervisor Mode Access Prevention (X86_SMAP) [Y/n/?] (NEW) n
95
96 Legacy cpb sysfs knob support for AMD CPUs (X86_ACPI_CPUFREQ_CPB) [Y/n/?] (NEW)
97
98 Enable core dump support (COREDUMP) [Y/n/?] (NEW)
99
100 Packet: sockets monitoring interface (PACKET_DIAG) [N/m/y/?] (NEW) m
101
102 IPv4 NAT (NF_NAT_IPV4) [N/m/?] (NEW) m
103
104 OMAP OCP2SCP DRIVER (OMAP_OCP2SCP) [N/m/y/?] (NEW) m
105
106 Calxeda Highbank SATA support (SATA_HIGHBANK) [N/m/y/?] (NEW) m
107
108 Virtual eXtensible Local Area Network (VXLAN) (VXLAN) [N/m/y/?] (NEW) m
109
110 Solarflare SFC9000-family PTP support (SFC_PTP) [Y/n/?] (NEW)
111
112 Microchip MRF24J40 transceiver driver (IEEE802154_MRF24J40) [N/m/?] (NEW) m
113
114 8250/16550 PNP device support (SERIAL_8250_PNP) [Y/n/?] (NEW)
115
116 MAX310X support (SERIAL_MAX310X) [N/y/?] (NEW)
117
118 SCCNXP serial port support (SERIAL_SCCNXP) [N/m/y/?] (NEW) m
119
120 TPM HW Random Number Generator support (HW_RANDOM_TPM) [M/n/?] (NEW)
121
122 TPM Interface Specification 1.2 Interface (I2C - Infineon) (TCG_TIS_I2C_INFINEON) [N/m/?] (NEW) m
123
124 NXP SC18IS602/602B/603 I2C to SPI bridge (SPI_SC18IS602) [N/m/y/?] (NEW) m
125
126 Dialog DA9052 GPIO (GPIO_DA9052) [N/m/y/?] (NEW) m
127
128 TWL6040 GPO (GPIO_TWL6040) [N/m/y/?] (NEW) m
129
130 OMAP HDQ driver (HDQ_MASTER_OMAP) [N/m/?] (NEW) m
131
132 Marvell 88PM860x battery driver (BATTERY_88PM860X) [N/m/y/?] (NEW) m
133
134 Dialog DA9052 Battery (BATTERY_DA9052) [N/m/y/?] (NEW) m
135
136 Marvell 88PM860x Charger driver (CHARGER_88PM860X) [N/m/?] (NEW) m
137
138 Analog Devices ADT7410 (SENSORS_ADT7410) [N/m/?] (NEW) m
139
140 Maxim MAX197 and compatibles (SENSORS_MAX197) [N/m/?] (NEW) m
141
142 generic cpu cooling support (CPU_THERMAL) [N/y/?] (NEW)
143
144 Support for the SMSC ECE1099 series chips (MFD_SMSC) [N/y/?] (NEW)
145
146 Dialog Semiconductor DA9055 PMIC Support (MFD_DA9055) [N/y/?] (NEW)
147
148 Texas Instruments LP8788 Power Management Unit Driver (MFD_LP8788) [N/y/?] (NEW)
149
150 Maxim Semiconductor MAX8907 PMIC Support (MFD_MAX8907) [N/m/y/?] (NEW) m
151
152 Fairchild FAN53555 Regulator (REGULATOR_FAN53555) [N/m/y/?] (NEW) m
153
154 Maxim 8907 voltage regulator (REGULATOR_MAX8907) [N/m/?] (NEW) m
155
156 TechnoTrend USB IR Receiver (IR_TTUSBIR) [N/m/?] (NEW) m
157
158 Media USB Adapters (MEDIA_USB_SUPPORT) [N/y/?] (NEW) y
159
160 STK1160 USB video capture support (VIDEO_STK1160) [N/m/?] (NEW) m
161
162 STK1160 AC97 codec support (VIDEO_STK1160_AC97) [N/y/?] (NEW) y
163
164 Support for various USB DVB devices v2 (DVB_USB_V2) [N/m/?] (NEW) m
165
166 Enable debug for the B2C2 FlexCop drivers (DVB_B2C2_FLEXCOP_USB_DEBUG) [N/y/?] (NEW)
167
168 Media PCI Adapters (MEDIA_PCI_SUPPORT) [N/y/?] (NEW)
169
170 Media test drivers (V4L_TEST_DRIVERS) [N/y] (NEW)
171
172 ISA and parallel port devices (MEDIA_PARPORT_SUPPORT) [N/y/?] (NEW)
173
174 Autoselect tuners and i2c modules to build (MEDIA_SUBDRV_AUTOSELECT) [Y/n/?] (NEW)
175
176 Maximum debug level (NOUVEAU_DEBUG) [5] (NEW)
177
178 Default debug level (NOUVEAU_DEBUG_DEFAULT) [3] (NEW)
179
180 Backlight Driver for LM3630 (BACKLIGHT_LM3630) [N/m/y/?] (NEW) m
181
182 Backlight Driver for LM3639 (BACKLIGHT_LM3639) [N/m/y/?] (NEW) m
183
184 TPS65217 Backlight (BACKLIGHT_TPS65217) [N/m/?] (NEW) m
185
186 Default time-out for HD-audio power-save mode (SND_HDA_POWER_SAVE_DEFAULT) [0] (NEW)
187
188 CIR via RC class (HID_PICOLCD_CIR) [N/y/?] (NEW)
189
190 Sony PS3 BD Remote Control (HID_PS3REMOTE) [N/m/?] (NEW) m
191
192 HID Sensors framework support (HID_SENSOR_HUB) [N/m/?] (NEW) m
193
194 ZTE USB serial driver (USB_SERIAL_ZTE) [N/m/?] (NEW) m
195
196 OMAP USB2 PHY Driver (OMAP_USB2) [N/m/y/?] (NEW) m
197
198 LED support for LM3642 Chip (LEDS_LM3642) [N/m/y/?] (NEW) m
199
200 LED support for LM355x Chips, LM3554 and LM3556 (LEDS_LM355x) [N/m/y/?] (NEW) m
201
202 LED CPU Trigger (LEDS_TRIGGER_CPU) [N/y/?] (NEW)
203
204 Dynamic compression of swap pages and clean pagecache pages (ZCACHE2) [N/y/?] (NEW)
205
206 Silicom devices (NET_VENDOR_SILICOM) [Y/n/?] (NEW)
207
208 Silicom BypassCTL library support (SBYPASS) [N/m/?] (NEW) m
209
210 Silicom BypassCTL net support (BPCTL) [N/m/?] (NEW) m
211
212 Cambridge Electronic Design 1401 USB support (CED1401) [N/m/?] (NEW) m
213
214 Digi Realport driver (DGRP) [N/m/y/?] (NEW) m
215
216 STE-Modem remoteproc support (STE_MODEM_RPROC) [N/m/y/?] (NEW) m
217
218 SMB2 network file system support (EXPERIMENTAL) (CIFS_SMB2) [N/y/?] (NEW)
219
220 RCU debugging: preemptible RCU race provocation (PROVE_RCU_DELAY) [N/y/?] (NEW)
221
222 Red-Black tree test (RBTREE_TEST) [N/m/?] (NEW) m
223
224 Interval tree test (INTERVAL_TREE_TEST) [N/m/?] (NEW) m
225
226 CAST5 (CAST-128) cipher algorithm (x86_64/AVX) (CRYPTO_CAST5_AVX_X86_64) [N/m/y/?] (NEW) m
227
228 CAST6 (CAST-256) cipher algorithm (x86_64/AVX) (CRYPTO_CAST6_AVX_X86_64) [N/m/y/?] (NEW) m
229
230 Asymmetric (public-key cryptographic) key type (ASYMMETRIC_KEY_TYPE) [N/m/y/?] (NEW) m
231
232 Asymmetric public-key crypto algorithm subtype (ASYMMETRIC_PUBLIC_KEY_SUBTYPE) [N/m/?] (NEW) m
233
234 RSA public-key algorithm (PUBLIC_KEY_ALGO_RSA) [N/m/?] (NEW) m
235
236 X.509 certificate parser (X509_CERTIFICATE_PARSER) [N/m/?] (NEW) m
237
238 --
239 ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
240 Felix Finch: scarecrow repairman & rocket surgeon / felix@×××××××.com
241 GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
242 I've found a solution to Fermat's Last Theorem but I see I've run out of room o

Replies

Subject Author
[gentoo-user] Re: 3.7.1 SATA errors Nikos Chantziaras <realnc@×××××.com>
Re: [gentoo-user] 3.7.1 SATA errors Bruce Hill <daddy@×××××××××××××××××××××.com>
[gentoo-user] Re: 3.7.1 SATA errors felix@×××××××.com
Re: [gentoo-user] 3.7.1 SATA errors Florian Philipp <lists@×××××××××××.net>
Re: [gentoo-user] 3.7.1 SATA errors Paul Hartman <paul.hartman+gentoo@×××××.com>
[gentoo-user] Re: 3.7.1 SATA errors -- Bisect done felix@×××××××.com