I keep getting hard drive errors in my kernel log/dmesg that have me
worried. From /var/log/kernel/current:

Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
- Last output repeated 7 times -
Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

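For reference, the status= and error= values in those messages are bit
fields from the ATA status and error registers. Below is a minimal
sketch of decoding them, assuming the standard ATA bit assignments. The
table and function names are my own, and the label strings are chosen
to match the wording in the log above rather than taken from any tool's
documentation.

```python
# Status register bits (assumed standard ATA layout), highest bit first.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Error register bits (assumed standard ATA layout); labels are illustrative.
ERROR_BITS = {
    0x80: "BadSector/BadCRC",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",   # ABRT: command aborted
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    print("status=0x59:", decode(0x59, STATUS_BITS))
    print("error=0x40: ", decode(0x40, ERROR_BITS))
```

Read this way, status=0x59 says the drive is ready but flagged an error,
and error=0x40 says the data at that sector could not be read back
correctly.
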
The drive is a 160 GB PATA Samsung. It's about two or three years
old and has been running 24x7 (although lightly loaded). The drive
has three partitions, all ext3.

When I started seeing the above messages, I ran

fsck.ext3 -f -v -c -c /dev/hda?

on all three partitions. Note that the "-c" flag runs the bad blocks
check; given twice, it uses a non-destructive read-write test.

I also ran

smartctl -t long /dev/hda

on the drive. Apparently, an error was found (details below). I'm
not sure if this drive is actually dying, though, as the following
article (by the smartmontools author) suggests that one or two
errors on a drive are nothing to worry about. Also, the SMART
overall-health self-assessment test comes back as PASSED.

http://www.linuxjournal.com/article/6983

But the constant kernel messages, along with the error in the "long"
SMART test, concern me. At this point, I'm not really sure what my
next steps should be, so I'm looking for any suggestions or advice.

Thanks!
Matt

# smartctl -a /dev/hda

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SP1614N
Serial Number:    0642J1FW903226
Firmware Version: TM100-24
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Jan 13 15:24:27 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                 (5760) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  96) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always       -       6528
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always       -       11505h+32m
 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   163   127   000    Old_age   Always       -       25
195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always       -       265460048
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline      -       2
199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       0

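The PASSED result is at least consistent with this table: smartctl
treats an attribute as failed only when its normalized VALUE is at or
below THRESH, and no row here is close to that. The sketch below checks
that rule against a few rows copied from the table and separately lists
sector-related counters with a nonzero RAW_VALUE. The choice of which
IDs to watch (5, 196, 197, 198) is my assumption, not something the
drive or smartctl reports, and the helper names are made up for
illustration.

```python
# A few rows copied from the attribute table above.
ATTRIBUTE_ROWS = """\
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline      -       2
"""

# ASSUMPTION: sector-health counters worth watching even while VALUE > THRESH.
WATCHED_IDS = {5, 196, 197, 198}


def parse_row(line):
    """Split one whitespace-separated attribute row into the fields used here."""
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalized VALUE
        "thresh": int(f[5]),  # THRESH
        "raw": f[9],          # RAW_VALUE, kept as text
    }


def parse_table(text):
    return [parse_row(line) for line in text.splitlines() if line.strip()]


def failed_attributes(rows):
    """Attributes whose normalized value is at or below the threshold."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


def nonzero_watched(rows):
    """Watched counters whose raw value is not zero."""
    return [r["name"] for r in rows
            if r["id"] in WATCHED_IDS and r["raw"] != "0"]


if __name__ == "__main__":
    rows = parse_table(ATTRIBUTE_ROWS)
    print("failed:", failed_attributes(rows))
    print("nonzero watched counters:", nonzero_watched(rows))
```

So nothing has crossed a threshold, but two of the sector counters are
already nonzero, which is the part that the overall PASSED line hides.
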
SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 01 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b1 c0 00 01 00 00 a0 00      00:00:07.688  DEVICE CONFIGURATION RESTORE
  ec 00 03 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE
  91 00 3f 01 00 00 af 00      00:00:07.688  INITIALIZE DEVICE PARAMETERS [OBS-6]
  10 00 00 01 00 00 a0 00      00:00:07.688  RECALIBRATE [OBS-4]
  ec 00 01 01 00 00 a0 00      00:00:07.688  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%     11486         262886799
# 2  Short offline       Completed without error       00%     11483         -

Device does not support Selective Self Tests/Logging
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/


# smartctl -l selftest /dev/hda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%     11486         262886799
# 2  Short offline       Completed without error       00%     11483         -

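Two different sectors show up above: LBAsect=206696214 in the kernel
log and LBA_of_first_error 262886799 in the self-test log. To see which
partition and filesystem block each one falls in, the usual conversion
is b = (L - S) * 512 / B, rounded down, where L is the failing LBA, S
is the partition's starting sector and B is the filesystem block size.
Here is a rough sketch of that arithmetic. The partition layout in it
is HYPOTHETICAL (the real one would come from "fdisk -lu /dev/hda"),
the 4096-byte block size is an assumption, and 512-byte sectors are
assumed throughout.

```python
SECTOR_SIZE = 512  # bytes; assumed for this drive

# Derived from the "User Capacity" line in the smartctl output.
TOTAL_SECTORS = 160_041_885_696 // SECTOR_SIZE

# HYPOTHETICAL layout: (name, first sector, last sector).
# Replace with the real values from the partition table.
PARTITIONS = [
    ("hda1", 63, 104_857_599),
    ("hda2", 104_857_600, 209_715_199),
    ("hda3", 209_715_200, TOTAL_SECTORS - 1),
]


def locate(lba, partitions=PARTITIONS, fs_block_size=4096):
    """Map a disk LBA to (partition name, filesystem block number).

    Returns None if the LBA is outside every listed partition.
    """
    for name, start, end in partitions:
        if start <= lba <= end:
            # b = (L - S) * 512 / B, truncated to an integer
            fs_block = (lba - start) * SECTOR_SIZE // fs_block_size
            return name, fs_block
    return None


if __name__ == "__main__":
    for lba in (206_696_214, 262_886_799):
        print(lba, "->", locate(lba))
```

With the real partition table filled in, the resulting block number is
the kind of value that ext2/ext3 tools such as debugfs work with when
checking whether a block belongs to a file.
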
--
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
--
gentoo-user@g.o mailing list