matthew.garman@×××××.com wrote:

>I keep getting hard drive errors in my kernel log/dmesg that have me
>worried. From /var/log/kernel/current:
>
>Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> - Last output repeated 7 times -
>Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
>Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
>Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
>

Exactly the same message I noticed less than an hour before my Maxtor
DiamondMax 9 packed in just before Christmas. Annoyingly, my drive wouldn't
mount the main data partition, but everything else seemed intact. I
managed to recover all my data from the drive using dd once I had a new
drive.
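
For anyone in the same position, here is a minimal sketch of that kind of dd copy. The device name, image path, and block size are hypothetical placeholders, and the function name rescue_image is made up for this example.

```shell
# Copy a failing partition to an image file, continuing past unreadable blocks.
# Arguments: $1 = source device or file, $2 = destination image (both placeholders).
rescue_image() {
    src=$1
    dst=$2
    # noerror: keep going after a read error instead of aborting
    # sync:    pad short or failed reads with zeros so later offsets stay aligned
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example (as root, with the failing partition unmounted):
#   rescue_image /dev/hda3 /mnt/newdrive/hda3.img
```

Because of the sync padding, the image can end up slightly longer than the source if the source size is not a multiple of the block size. The image can then be checked with fsck or loop-mounted read-only on the new drive.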

I'd recommend backing up anything that's essential on the drive and
preparing for it to give up the ghost.
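
When deciding what to copy first, it can help to know where the failing sector sits. As a rough sketch, the high and low fields in the quoted dma_intr line appear to be the upper bits and the lower 24 bits of the LBA. That split is an assumption, but it does reproduce the LBAsect value in the log. The function name lba_from_parts is hypothetical.

```shell
# Recombine the "high" and "low" fields from the kernel message into one LBA.
# Assumes "low" carries the lower 24 bits and "high" the bits above them.
lba_from_parts() {
    high=$1
    low=$2
    echo $(( high * 16777216 + low ))   # 16777216 = 2^24
}

lba_from_parts 12 5369622    # prints 206696214, matching LBAsect in the log
```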

>The drive is a 160 GB PATA Samsung. It's about two or three years
>old, running 24x7 (although lightly). The drive has three
>partitions, all are ext3.
>
>When I started seeing the above messages, I ran
>
> fsck.ext3 -f -v -c -c /dev/hda?
>
>on all three partitions. Note that the "-c" flag includes the bad
>blocks check.
>
>I also ran
>
> smartctl -t long /dev/hda
>
>On the drive. Apparently, an error was found (details below). I'm
>not sure if this drive is actually dying, though, as the following
>article (by the smartmontools author) suggests that one or two
>errors on a drive is nothing to worry about. Also, the SMART
>overall-health self-assessment test comes back as PASSED.
>
> http://www.linuxjournal.com/article/6983
>
>But the constant kernel messages, along with the error in the "long"
>SMART test, concern me. At this point, I'm not really sure what my
>next steps should be, so I'm looking for any suggestions or advice.
>
>Thanks!
>Matt
>
># smartctl -a /dev/hda
>
>smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
>Home page is http://smartmontools.sourceforge.net/
>
>=== START OF INFORMATION SECTION ===
>Device Model: SAMSUNG SP1614N
>Serial Number: 0642J1FW903226
>Firmware Version: TM100-24
>User Capacity: 160,041,885,696 bytes
>Device is: In smartctl database [for details use: -P show]
>ATA Version is: 7
>ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
>Local Time is: Fri Jan 13 15:24:27 2006 CST
>SMART support is: Available - device has SMART capability.
>SMART support is: Enabled
>
>=== START OF READ SMART DATA SECTION ===
>SMART overall-health self-assessment test result: PASSED
>
>General SMART Values:
>Offline data collection status: (0x00) Offline data collection activity
> was never started.
> Auto Offline Data Collection: Disabled.
>Self-test execution status: ( 245) Self-test routine in progress...
> 50% of test remaining.
>Total time to complete Offline
>data collection: (5760) seconds.
>Offline data collection
>capabilities: (0x1b) SMART execute Offline immediate.
> Auto Offline data collection on/off support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> No Selective Self-test supported.
>SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
>Error logging capability: (0x01) Error logging supported.
> No General Purpose Logging support.
>Short self-test routine
>recommended polling time: ( 1) minutes.
>Extended self-test routine
>recommended polling time: ( 96) minutes.
>
>SMART Attributes Data Structure revision number: 16
>Vendor Specific SMART Attributes with Thresholds:
>ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always   -           1
>  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always   -           6528
>  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           73
>  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always   -           0
>  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always   -           0
>  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline  -           0
>  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always   -           11505h+32m
> 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always   -           0
> 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           50
>194 Temperature_Celsius     0x0022   163   127   000    Old_age   Always   -           25
>195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always   -           265460048
>196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always   -           2
>197 Current_Pending_Sector  0x0033   253   253   010    Pre-fail  Always   -           0
>198 Offline_Uncorrectable   0x0031   100   100   010    Pre-fail  Offline  -           2
>199 UDMA_CRC_Error_Count    0x000b   100   100   051    Pre-fail  Always   -           0
>200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always   -           0
>201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always   -           0
>
>SMART Error Log Version: 1
>ATA Error Count: 1
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
>Powered_Up_Time is measured from power on, and printed as
>DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
>SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
>Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 04 51 00 01 00 00 a0 Error: ABRT
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> b1 c0 00 01 00 00 a0 00 00:00:07.688 DEVICE CONFIGURATION RESTORE
> ec 00 03 01 00 00 a0 00 00:00:07.688 IDENTIFY DEVICE
> 91 00 3f 01 00 00 af 00 00:00:07.688 INITIALIZE DEVICE PARAMETERS [OBS-6]
> 10 00 00 01 00 00 a0 00 00:00:07.688 RECALIBRATE [OBS-4]
> ec 00 01 01 00 00 a0 00 00:00:07.688 IDENTIFY DEVICE
>
>SMART Self-test log structure revision number 1
>Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
># 1  Extended offline    Completed: read failure  00%        11486            262886799
># 2  Short offline       Completed without error  00%        11483            -
>
>Device does not support Selective Self Tests/Logging
>smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
>Home page is http://smartmontools.sourceforge.net/
>
># smartctl -l selftest /dev/hda
>
>=== START OF READ SMART DATA SECTION ===
>SMART Self-test log structure revision number 1
>Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
># 1  Extended offline    Completed: read failure  00%        11486            262886799
># 2  Short offline       Completed without error  00%        11483            -
>
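
The LBA_of_first_error in the self-test log above can be turned into a rough position on the disk. This is only a sketch: it assumes 512-byte sectors, which is not stated in the smartctl output, and the function names lba_to_bytes and lba_percent are made up for illustration. Mapping the LBA to a specific partition would also need the partition table, which isn't shown here.

```shell
# Byte offset of an LBA, assuming 512-byte sectors.
lba_to_bytes() {
    echo $(( $1 * 512 ))
}

# Position of an LBA as a whole-number percentage of the drive capacity.
# Arguments: $1 = LBA, $2 = capacity in bytes (from "User Capacity").
lba_percent() {
    lba=$1
    capacity_bytes=$2
    echo $(( lba * 512 * 100 / capacity_bytes ))
}

lba_to_bytes 262886799                  # prints 134598041088
lba_percent 262886799 160041885696      # prints 84
```

So, under that assumption, the first unreadable sector found by the extended test is roughly 84% of the way into the drive.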
Tim

--
gentoo-user@g.o mailing list