Gentoo Archives: gentoo-user

From: matthew.garman@×××××.com
To: gentoo-user <gentoo-user@l.g.o>
Subject: [gentoo-user] dying hard drive?
Date: Fri, 13 Jan 2006 21:45:25
Message-Id: 20060113213946.GA27890@sewage.sewage.fake
I keep getting hard drive errors in my kernel log/dmesg that have me
worried. From /var/log/kernel/current:

Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
- Last output repeated 7 times -
Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

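The status and error values in the kernel messages above are ATA register bit fields. The LBAsect value appears to be built from the two 24-bit halves printed as high and low. Below is a minimal Python sketch for decoding them. The status-bit names follow the labels in the log, plus the remaining ATA status bits. The error-bit table is a partial, assumed subset. The (high << 24) | low combination is also an assumption, although the logged numbers do satisfy it (12 * 2^24 + 5369622 = 206696214).

```python
# Decode the ATA status/error bytes and the LBA fields from the
# "hda: dma_intr:" kernel messages. Bit labels follow the kernel log text.

# ATA status register bits (status=0x59 in the log).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Assumed, partial table of ATA error register bits (error=0x40 in the log).
ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the name of every bit in `table` that is set in `value`."""
    return [name for mask, name in table if value & mask]


def lba48(high, low):
    """Combine two 24-bit halves into one sector number (assumed layout)."""
    return (high << 24) | low


if __name__ == "__main__":
    print(decode(0x59, STATUS_BITS))  # the four labels shown in the log
    print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
    print(lba48(12, 5369622))         # 206696214, the LBAsect value
```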

The drive is a 160 GB PATA Samsung. It's about two or three years
old, running 24x7 (although lightly). The drive has three
partitions, all ext3.

When I started seeing the above messages, I ran

fsck.ext3 -f -v -c -c /dev/hda?

on all three partitions. Note that the "-c" flag makes e2fsck run a
badblocks scan, and giving it twice makes that scan a non-destructive
read-write test.

I also ran

smartctl -t long /dev/hda

on the drive. Apparently, an error was found (details below). I'm
not sure if this drive is actually dying, though, as the following
article (by the smartmontools author) suggests that one or two
errors on a drive are nothing to worry about. Also, the SMART
overall-health self-assessment test comes back as PASSED.

http://www.linuxjournal.com/article/6983

But the constant kernel messages, along with the error in the "long"
SMART test, concern me. At this point, I'm not really sure what my
next steps should be, so I'm looking for any suggestions or advice.

Thanks!
Matt


# smartctl -a /dev/hda

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SP1614N
Serial Number:    0642J1FW903226
Firmware Version: TM100-24
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Jan 13 15:24:27 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                 (5760) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:          ( 1) minutes.
Extended self-test routine
recommended polling time:         ( 96) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always       -       6528
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always       -       11505h+32m
 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   163   127   000    Old_age   Always       -       25
195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always       -       265460048
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline      -       2
199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 00 01 00 00 a0  Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
b1 c0 00 01 00 00 a0 00      00:00:07.688  DEVICE CONFIGURATION RESTORE
ec 00 03 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE
91 00 3f 01 00 00 af 00      00:00:07.688  INITIALIZE DEVICE PARAMETERS [OBS-6]
10 00 00 01 00 00 a0 00      00:00:07.688  RECALIBRATE [OBS-4]
ec 00 01 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%     11486         262886799
# 2  Short offline       Completed without error       00%     11483         -

Device does not support Selective Self Tests/Logging
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

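To check the attribute table programmatically, you can split the rows of `smartctl -a` output on whitespace. The Python sketch below assumes the ten-column attribute layout shown above. It uses a hypothetical watchlist (IDs 5, 196, 197 and 198: the reallocation, pending and offline-uncorrectable counters) to report any watched attribute whose raw value is nonzero. Which attributes deserve attention is a judgment call; smartctl does not decide it.

```python
# Pull the vendor attribute rows out of `smartctl -a` text and report
# watched attributes with a nonzero raw value.

# Hypothetical watchlist: reallocation, pending and uncorrectable counters.
WATCHLIST = {5, 196, 197, 198}

# A few attribute rows copied from the output above, plus one row from the
# error log that must NOT be mistaken for an attribute.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline      -       2
10 00 00 01 00 00 a0 00      00:00:07.688  RECALIBRATE [OBS-4]
"""


def parse_attributes(text):
    """Map attribute ID -> (name, raw value string)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(maxsplit=9)
        # Attribute rows: numeric ID, ten columns, and a hex FLAG column.
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def flagged(attrs, watchlist=WATCHLIST):
    """Return (id, name, raw) for watched attributes whose raw value is > 0."""
    hits = []
    for attr_id in sorted(watchlist):
        if attr_id not in attrs:
            continue
        name, raw = attrs[attr_id]
        first = raw.split()[0]  # keep only the leading number of the raw field
        if first.isdigit() and int(first) > 0:
            hits.append((attr_id, name, int(first)))
    return hits


if __name__ == "__main__":
    for attr_id, name, raw in flagged(parse_attributes(SAMPLE)):
        print(f"{attr_id:3d} {name}: raw value {raw}")
```

Against the sample rows, it reports attributes 196 and 198, each with a raw value of 2.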

# smartctl -l selftest /dev/hda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%     11486         262886799
# 2  Short offline       Completed without error       00%     11483         -

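The self-test log is where the short and long tests differ. The Python sketch below extracts each entry, including the LBA_of_first_error column. The regular expression is an assumption based on the rows shown above. It does not try to separate the test description from its status, so both land in one field.

```python
import re

# One self-test log row, e.g.
# "# 1  Extended offline    Completed: read failure       00%     11486         262886799"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%     11486         262886799
# 2  Short offline       Completed without error       00%     11483         -
"""


def parse_selftest_log(text):
    """Return one dict per self-test log entry."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or unrelated line
        num, test_and_status, remaining, hours, lba = m.groups()
        entries.append({
            "num": int(num),
            "test_and_status": test_and_status,
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            # "-" means the test did not report a failing LBA.
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return entries


def failed_entries(entries):
    """Entries that reported an LBA_of_first_error."""
    return [e for e in entries if e["first_error_lba"] is not None]


if __name__ == "__main__":
    for e in failed_entries(parse_selftest_log(SAMPLE_LOG)):
        print(f"test #{e['num']} ({e['test_and_status']}) failed at "
              f"LBA {e['first_error_lba']} after {e['lifetime_hours']} hours")
```

Against the log above, only test #1 (the extended test) is reported, with LBA 262886799.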

--
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
--
gentoo-user@g.o mailing list
