Howdy,

As some know, I recently moved a LOT of data around. That seems to have
stressed one of my drives. I got an email from SMART reporting an error.
Here's the info:

The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 1 Currently unreadable (pending) sectors


The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors


This is from smartctl.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always   -           23544426
  3 Spin_Up_Time            0x0003   087   086   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           50
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           4
  7 Seek_Error_Rate         0x000f   094   060   045    Pre-fail  Always   -           2694155454
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           24299 (121 195 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           35
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   086   000    Old_age   Always   -           14 14 14
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   061   059   040    Old_age   Always   -           39 (Min/Max 30/41)
191 G-Sense_Error_Rate      0x0032   092   092   000    Old_age   Always   -           17952
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           498
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always   -           1044
194 Temperature_Celsius     0x0022   039   041   000    Old_age   Always   -           39 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   031   001   000    Old_age   Always   -           23544426
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
203 Run_Out_Cancel          0x00b3   100   100   099    Pre-fail  Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           24215h+54m+57.249s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           18070332014
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           18343277504

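For anyone who would rather not eyeball that table, here is a rough Python
sketch that pulls the three bad-sector counters (#5, #197, #198) out of
text shaped like the smartctl attribute output. The function names, the
regular expression, and the choice of which attributes to watch are my own
assumptions for illustration; none of it comes from smartctl itself, and
the sample rows are just copied from the drive in question.

```python
import re

# Attribute IDs commonly watched for bad sectors (an assumption, not a complete list).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, type, updated,
# when_failed, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {id: fields} for every attribute row found in the text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(10)),  # only the leading integer of RAW_VALUE
            }
    return attrs


def bad_sector_report(attrs):
    """Return {id: raw count} for the watched attributes that are present."""
    return {i: attrs[i]["raw"] for i in WATCHED if i in attrs}


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           4
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
"""

if __name__ == "__main__":
    for attr_id, raw in bad_sector_report(parse_attributes(SAMPLE)).items():
        print(attr_id, WATCHED[attr_id], raw)
```

Keeping the raw counts from each run and comparing them later would show
whether #5 keeps climbing, which is the thing to watch.
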
In a nutshell, the concern is #5 up there. #198 was an issue until I ran
the long selftest. It moved to #5, plus 3 or 4 more got added it seems.
According to Google results, it should be fine for now. Still, a
replacement drive is on the way and I've unmounted the drives for that
LVM. They're still spinning and running a selftest, but nothing else
should be accessing them. This is also from the selftest.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  90%        24299            -
# 2  Short offline       Completed without error        00%        24298            -
# 3  Extended offline    Completed without error        00%        24291            -
# 4  Extended offline    Aborted by host                10%        24266            -
# 5  Short offline       Completed without error        00%        24218            -
# 6  Short offline       Completed without error        00%        24194            -
# 7  Short offline       Completed without error        00%        24171            -
# 8  Short offline       Completed without error        00%        24146            -

I aborted the one test because it was stuck on 10% for well over a day.
The whole test doesn't take that long, or shouldn't anyway. I restarted
it shortly after that. I might add, the test did take many hours longer
than it estimated, which from my past experience is quite odd. It's
usually pretty accurate. Still, it completed and shows it passed; it
just has a boo boo on it. I also did a file system check. It fixed a
couple of problems and a bunch of little things I see corrected often on
bootup. Something about the length of something. Seems trivial.

Given the low number, and that the drive shows it corrected that error
and then passed a short and a long test, is this drive "safe enough" to
keep in service? I have backups just in case, but I'm curious what
others know from experience. At least this isn't one of those nasty
messages that the drive will die within 24 hours. I got one of those
ages ago and it didn't miss by much. A little over 30 hours later, it
was a doorstop. It would spin but it couldn't even be seen by the BIOS.
Maybe drives are getting better and SMART is getting better as well.

Thoughts? Replace as soon as the drive arrives, or wait and see?

Dale

:-) :-)