Gentoo Archives: gentoo-user

From: Paul Hartman <paul.hartman+gentoo@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Kernel2.6.33: ATA failed command: READ FPDMA QUEUED, hard resetting link
Date: Fri, 26 Mar 2010 20:17:55
Message-Id: 58965d8a1003261317j24856b5cied7c5bf4b83ebf50@mail.gmail.com
Hi,

While setting up and testing my new system (after wasting nearly a month
on bad RAM modules), I got this error today:

[48055.741389] ata3.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[48055.741393] ata3.00: failed command: READ FPDMA QUEUED
[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 tag 1 ncq 147456 in
[48055.741400] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[48055.741402] ata3.00: status: { DRDY }
[48055.741405] ata3: hard resetting link
[48056.198746] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[48056.210514] ata3.00: configured for UDMA/133
[48056.210518] ata3.00: device reported invalid CHS sector 0
[48056.210523] ata3: EH complete

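For anyone trying to read the "cmd" line in that log, here is a minimal sketch of how its taskfile bytes could be decoded. It assumes libata's usual print order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte logical sectors. The name decode_ncq_cmd is made up for illustration. For NCQ commands such as READ FPDMA QUEUED, the sector count travels in the feature registers and the tag sits in bits 7:3 of the count register.

```python
def decode_ncq_cmd(taskfile, sector_size=512):
    """Decode an NCQ taskfile string as printed on libata's 'cmd' line.

    Assumed field order (hypothetical helper, not part of any library):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, device = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )

    # NCQ: the 16-bit sector count lives in FEATURE (low) and HOB FEATURE (high).
    # A count of 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536

    # 48-bit LBA assembled from the six LBA bytes, most significant first.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )

    return {
        "command": int(command, 16),
        "device": int(device, 16),
        "tag": nsect >> 3,          # NCQ tag is in bits 7:3 of the count register
        "sectors": sectors,
        "bytes": sectors * sector_size,  # assumes 512-byte logical sectors
        "lba": lba,
    }


if __name__ == "__main__":
    info = decode_ncq_cmd("60/20:08:38:15:03/01:00:18:00:00/40")
    print(info)
```

As a consistency check, the decoded tag (1) and byte count (288 sectors, 147456 bytes) agree with the "tag 1 ncq 147456 in" suffix that the kernel printed on the same line.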
I really don't understand what it means, but the "timeout", "hard
resetting link" and "invalid CHS sector 0" look scary to me...

Initial bootup messages for this device were:
Mar 25 22:02:32 [kernel] [ 4.496102] ata3: SATA max UDMA/133 abar m2048@0xfbffc000 port 0xfbffc200 irq 34
Mar 25 22:02:32 [kernel] [ 8.519169] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 25 22:02:32 [kernel] [ 8.536681] ata3.00: ATA-8: SAMSUNG HD203WI, 1AN10002, max UDMA/133
Mar 25 22:02:32 [kernel] [ 8.548388] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 25 22:02:32 [kernel] [ 8.566100] ata3.00: configured for UDMA/133

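Two of the numbers in these boot messages can be unpacked with a little arithmetic. The sketch below assumes the SStatus value is printed in hex with the standard SATA field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that the drive uses 512-byte logical sectors. The helper names are invented for this example.

```python
# Field meanings per the SATA SStatus register layout (only common values listed).
DET = {
    0x0: "no device detected",
    0x1: "device present, no PHY communication",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(value):
    """Split an SStatus register value into its DET/SPD/IPM fields."""
    return {
        "det": DET.get(value & 0xF, "reserved"),
        "spd": SPD.get((value >> 4) & 0xF, "reserved"),
        "ipm": IPM.get((value >> 8) & 0xF, "reserved"),
    }


def capacity_bytes(sectors, sector_size=512):
    """Drive capacity in bytes; 512-byte sectors are an assumption here."""
    return sectors * sector_size


if __name__ == "__main__":
    print(decode_sstatus(0x123))
    size = capacity_bytes(3907029168)
    print(size, "bytes =", round(size / 1e12, 2), "TB")
```

Under those assumptions, SStatus 123 decodes to an established link at 3.0 Gbps in the active power state, which matches the "SATA link up 3.0 Gbps" text, and 3907029168 sectors works out to roughly 2 TB.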
That disk is part of an md RAID5 array, but I was at work when the error
happened, so I didn't see whether the RAID repaired itself or whatever
else would happen in this case (I don't have mdadm monitoring configured
yet). Right now all RAID disks are up and healthy.

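Until mdadm monitoring is configured, a quick way to see whether an array has dropped a member is to look at the "[n/m] [UU_]" status in /proc/mdstat. The following is only a rough sketch: the sample text is invented (it is not from this system), the function name is hypothetical, and the parsing assumes the usual two-line-per-array layout.

```python
import re

# Made-up sample for illustration; on a real system read /proc/mdstat instead.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      9767572480 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid5 sdi1[2] sdh1[1] sdg1[0](F)
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[6/6] [UUUUUU]": expected/active counts, one flag per disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

To check a live system, pass the contents of /proc/mdstat (for example via open("/proc/mdstat").read()) instead of the sample string. This only shows the current state, so it cannot tell whether an array was degraded and recovered earlier.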
I googled it, but most of the results are pastebin snippets. I'm using
kernel 2.6.33 and the ahci driver for the SATA controllers.

The libata documentation, in the section about timeout errors, says:
"Most often this is due to an unrelated interrupt subsystem bug (try
booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to
deliver an interrupt when we were expecting one from the hardware."

I really don't know the potential implications of disabling MSI or
APIC, but in /proc/interrupts I do see AHCI listed in both MSI and
APIC rows, so at least I know both are active right now.

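To make the /proc/interrupts check repeatable (for instance before and after trying 'pci=nomsi'), something like the sketch below could list which interrupt controller type each ahci row uses. The sample text is invented apart from IRQ 34, which appears in the boot messages. The column layout is assumed to be "IRQ: per-CPU counts, controller type, device names", and the function name is hypothetical.

```python
# Invented sample for illustration; counts and the IRQ 19 row are made up.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
 19:      51234          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb3
 34:     987654          0   PCI-MSI-edge      ahci
 35:       1200          0   PCI-MSI-edge      eth0
"""


def ahci_interrupts(text):
    """Return (irq, controller_type) pairs for rows that list the ahci driver."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # skip the CPU header line and blank lines
        irq = fields[0].rstrip(":")
        rest = fields[1:]
        # Drop the per-CPU interrupt counts.
        while rest and rest[0].isdigit():
            rest = rest[1:]
        if len(rest) < 2:
            continue
        controller = rest[0]
        devices = [name.strip() for name in " ".join(rest[1:]).split(",")]
        if "ahci" in devices:
            rows.append((irq, controller))
    return rows


if __name__ == "__main__":
    for irq, controller in ahci_interrupts(SAMPLE_INTERRUPTS):
        print("IRQ", irq, "->", controller)
```

On a real system, feed it the contents of /proc/interrupts. If booting with 'pci=nomsi' works as intended, the PCI-MSI rows for ahci should no longer appear in the result.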
Temperatures in my system are good; hddtemp says the drive in question
is at 21 °C right now.

Another possibility is that I need to increase a voltage on the
motherboard, since it is running 6 HDDs and 1 DVD-ROM. I'll have to
research which voltage is related to this. (X58 motherboard)

Thanks in advance if anyone has any knowledge about this; otherwise I'll
go into trial-and-hopefully-no-error mode. :)

Paul
