Gentoo Archives: gentoo-user

From: Tim Igoe <tim@×××××××.uk>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] dying hard drive?
Date: Fri, 13 Jan 2006 22:14:03
Message-Id: 43C824E2.3060003@igoe.me.uk
In Reply to: [gentoo-user] dying hard drive? by matthew.garman@gmail.com
matthew.garman@×××××.com wrote:

>I keep getting hard drive errors in my kernel log/dmesg that have me
>worried. From /var/log/kernel/current:
>
>Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> - Last output repeated 7 times -
>Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
>Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
>Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
>
>
>
>
That's exactly the same message I noticed less than an hour before my Maxtor
DiamondMax 9 packed in, just before Christmas. Annoyingly, the main data
partition wouldn't mount, but everything else seemed intact. I managed to
recover all my data from the drive using dd once I had a new drive.
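
A minimal sketch of that kind of dd recovery, not necessarily the exact
command used here. It assumes the failing partition is unmounted and the
new drive has room for an image file; image_drive is a made-up helper name,
and the device and paths in the comments are placeholders.

```shell
#!/bin/sh
# Sketch only: copy a failing drive or partition to an image file with dd.
# conv=noerror keeps dd going after a read error; conv=sync pads each short
# or failed read with zeros so later data stays at the right offset.
# image_drive is a hypothetical helper name; the paths below are examples.
image_drive() {
    src=$1    # e.g. /dev/hda1 (the failing partition, not mounted)
    dest=$2   # e.g. /mnt/newdisk/hda1.img (a file on the new drive)
    dd if="$src" of="$dest" bs=4096 conv=noerror,sync
}

# Example (as root, with the new drive mounted at /mnt/newdisk):
#   image_drive /dev/hda1 /mnt/newdisk/hda1.img
#   fsck.ext3 -f /mnt/newdisk/hda1.img    # check the copy, not the dying disk
#   mount -o loop,ro /mnt/newdisk/hda1.img /mnt/recovered
```

With conv=noerror,sync an unreadable block ends up as zeros in the image,
so a smaller bs loses less data per bad sector at the cost of a slower copy.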

I'd recommend backing up anything that's essential on the drive and
preparing for it to give up the ghost.

>The drive is a 160 GB PATA Samsung. It's about two or three years
>old, running 24x7 (although lightly). The drive has three
>partitions, all are ext3.
>
>When I started seeing the above messages, I ran
>
> fsck.ext3 -f -v -c -c /dev/hda?
>
>on all three partitions. Note that the "-c" flag includes the bad
>blocks check.
>
>I also ran
>
> smartctl -t long /dev/hda
>
>On the drive. Apparently, an error was found (details below). I'm
>not sure if this drive is actually dying, though, as the following
>article (by the smartmontools author) suggests that one or two
>errors on a drive is nothing to worry about. Also, the SMART
>overall-health self-assessment test comes back as PASSED.
>
> http://www.linuxjournal.com/article/6983
>
>But the constant kernel messages, along with the error in the "long"
>SMART test, concern me. At this point, I'm not really sure what my
>next steps should be, so I'm looking for any suggestions or advice.
>
>Thanks!
>Matt
>
>
>
># smartctl -a /dev/hda
>
>smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
>Home page is http://smartmontools.sourceforge.net/
>
>=== START OF INFORMATION SECTION ===
>Device Model:     SAMSUNG SP1614N
>Serial Number:    0642J1FW903226
>Firmware Version: TM100-24
>User Capacity:    160,041,885,696 bytes
>Device is:        In smartctl database [for details use: -P show]
>ATA Version is:   7
>ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
>Local Time is:    Fri Jan 13 15:24:27 2006 CST
>SMART support is: Available - device has SMART capability.
>SMART support is: Enabled
>
>=== START OF READ SMART DATA SECTION ===
>SMART overall-health self-assessment test result: PASSED
>
>General SMART Values:
>Offline data collection status:  (0x00) Offline data collection activity
>                                        was never started.
>                                        Auto Offline Data Collection: Disabled.
>Self-test execution status:      ( 245) Self-test routine in progress...
>                                        50% of test remaining.
>Total time to complete Offline
>data collection:                 (5760) seconds.
>Offline data collection
>capabilities:                    (0x1b) SMART execute Offline immediate.
>                                        Auto Offline data collection on/off support.
>                                        Suspend Offline collection upon new
>                                        command.
>                                        Offline surface scan supported.
>                                        Self-test supported.
>                                        No Conveyance Self-test supported.
>                                        No Selective Self-test supported.
>SMART capabilities:            (0x0003) Saves SMART data before entering
>                                        power-saving mode.
>                                        Supports SMART auto save timer.
>Error logging capability:        (0x01) Error logging supported.
>                                        No General Purpose Logging support.
>Short self-test routine
>recommended polling time:        (   1) minutes.
>Extended self-test routine
>recommended polling time:        (  96) minutes.
>
>SMART Attributes Data Structure revision number: 16
>Vendor Specific SMART Attributes with Thresholds:
>ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
>  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always       -       6528
>  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       73
>  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
>  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
>  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
>  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always       -       11505h+32m
> 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
> 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50
>194 Temperature_Celsius     0x0022   163   127   000    Old_age   Always       -       25
>195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always       -       265460048
>196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       2
>197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
>198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline      -       2
>199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always       -       0
>200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
>201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       0
>
>SMART Error Log Version: 1
>ATA Error Count: 1
>        CR = Command Register [HEX]
>        FR = Features Register [HEX]
>        SC = Sector Count Register [HEX]
>        SN = Sector Number Register [HEX]
>        CL = Cylinder Low Register [HEX]
>        CH = Cylinder High Register [HEX]
>        DH = Device/Head Register [HEX]
>        DC = Device Command Register [HEX]
>        ER = Error register [HEX]
>        ST = Status register [HEX]
>Powered_Up_Time is measured from power on, and printed as
>DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
>SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
>Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
>  When the command that caused the error occurred, the device was active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  04 51 00 01 00 00 a0  Error: ABRT
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  b1 c0 00 01 00 00 a0 00      00:00:07.688  DEVICE CONFIGURATION RESTORE
>  ec 00 03 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE
>  91 00 3f 01 00 00 af 00      00:00:07.688  INITIALIZE DEVICE PARAMETERS [OBS-6]
>  10 00 00 01 00 00 a0 00      00:00:07.688  RECALIBRATE [OBS-4]
>  ec 00 01 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE
>
>SMART Self-test log structure revision number 1
>Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
># 1  Extended offline    Completed: read failure        00%            11486  262886799
># 2  Short offline       Completed without error        00%            11483  -
>
>Device does not support Selective Self Tests/Logging
>smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
>Home page is http://smartmontools.sourceforge.net/
>
>
>
># smartctl -l selftest /dev/hda
>
>=== START OF READ SMART DATA SECTION ===
>SMART Self-test log structure revision number 1
>Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
># 1  Extended offline    Completed: read failure        00%            11486  262886799
># 2  Short offline       Completed without error        00%            11483  -
>
>
>
>
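One way to keep an eye on the sector counters in that attribute table is to
filter the smartctl output for them. A minimal sketch, assuming the usual
"smartctl -A" row layout with the raw value in the last column;
flag_bad_sectors is a made-up helper name, and the choice of attributes to
watch is an assumption rather than any official list.

```shell
#!/bin/sh
# Sketch only: read "smartctl -A" style attribute rows on stdin and print
# the sector-health counters whose raw value (last column) is above zero.
# flag_bad_sectors is a hypothetical helper name; the attribute names are
# taken from the table quoted above.
flag_bad_sectors() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ && $NF + 0 > 0 { print $2, $NF }'
}

# Example (as root):
#   smartctl -A /dev/hda | flag_bad_sectors
```

Fed the attribute rows quoted above (without the quote markers), this prints
Reallocated_Event_Count 2 and Offline_Uncorrectable 2. If those numbers keep
climbing between runs, that fits a drive on its way out.
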
Tim

--
gentoo-user@g.o mailing list