Gentoo Archives: gentoo-user

From: thegeezer <thegeezer@×××××××××.net>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] smartctrl drive error @60%
Date: Wed, 25 Jun 2014 10:42:38
Message-Id: 53AAA791.4050506@thegeezer.net
In Reply to: Re: [gentoo-user] smartctrl drive error @60% by Dale
1 On 06/25/2014 08:49 AM, Dale wrote:
2 > thegeezer wrote:
3 >> this is pretty bad.
4 > Here is the output:
5 >
6 > root@fireball / # smartctl -a /dev/sdc
7 > smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.14.0-gentoo] (local build)
8 > Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
9 >
10 > === START OF INFORMATION SECTION ===
11 > Model Family: Seagate Barracuda 7200.14 (AF)
12 > Device Model: ST3000DM001-9YN166
13 > Serial Number: Z1F0PKT5
14 > LU WWN Device Id: 5 000c50 04d79e15c
15 > Firmware Version: CC4C
16 > User Capacity: 3,000,592,982,016 bytes [3.00 TB]
17 > Sector Sizes: 512 bytes logical, 4096 bytes physical
18 > Rotation Rate: 7200 rpm
19 > Device is: In smartctl database [for details use: -P show]
20 > ATA Version is: ATA8-ACS T13/1699-D revision 4
21 > SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
22 > Local Time is: Wed Jun 25 02:46:39 2014 CDT
23 >
24 > ==> WARNING: A firmware update for this drive is available,
25 > see the following Seagate web pages:
26 > http://knowledge.seagate.com/articles/en_US/FAQ/207931en
27 > http://knowledge.seagate.com/articles/en_US/FAQ/223651en
28
29 interesting - i have not seen that warning before, the firmware update might be worth a look
30
31 > SMART support is: Available - device has SMART capability.
32 > SMART support is: Enabled
33 >
34 > === START OF READ SMART DATA SECTION ===
35 > SMART overall-health self-assessment test result: PASSED
36 >
37 > General SMART Values:
38 > Offline data collection status: (0x00) Offline data collection activity
39 > was never started.
40 > Auto Offline Data Collection:
41 > Disabled.
42 > Self-test execution status: ( 118) The previous self-test completed
43 > having
44 > the read element of the test failed.
45 > Total time to complete Offline
46 > data collection: ( 584) seconds.
47 > Offline data collection
48 > capabilities: (0x73) SMART execute Offline immediate.
49 > Auto Offline data collection
50 > on/off support.
51 > Suspend Offline collection upon new
52 > command.
53 > No Offline surface scan supported.
54 > Self-test supported.
55 > Conveyance Self-test supported.
56 > Selective Self-test supported.
57 > SMART capabilities: (0x0003) Saves SMART data before entering
58 > power-saving mode.
59 > Supports SMART auto save timer.
60 > Error logging capability: (0x01) Error logging supported.
61 > General Purpose Logging supported.
62 > Short self-test routine
63 > recommended polling time: ( 1) minutes.
64 > Extended self-test routine
65 > recommended polling time: ( 340) minutes.
66 > Conveyance self-test routine
67 > recommended polling time: ( 2) minutes.
68 > SCT capabilities: (0x3085) SCT Status supported.
69 >
70 > SMART Attributes Data Structure revision number: 10
71 > Vendor Specific SMART Attributes with Thresholds:
72 > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
73 > UPDATED WHEN_FAILED RAW_VALUE
74 > 1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail
75 > Always - 234421760
76
77 you can happily ignore this error rate, it is usual for the raw value to be high
78 and there is hardware error correction for it
79
80 > 3 Spin_Up_Time 0x0003 092 092 000 Pre-fail
81 > Always - 0
82 > 4 Start_Stop_Count 0x0032 100 100 020 Old_age
83 > Always - 33
84
85 33 start/stop cycles seems very low, but further down we see the power-on time
86 is just under two years, which is itself on the lighter side of
87 the mtbf
88
89 > 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
90 > Always - 0
91
92 zero reallocated sectors suggests the spare area is still free for reallocation
93
94 > 7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail
95 > Always - 99909120
96 > 9 Power_On_Hours 0x0032 082 082 000 Old_age
97 > Always - 16379
98
99 16379 hours is almost two years of power-on time
100
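as a quick sanity check on the "almost two years" figure, here is a small sketch of the arithmetic. the function names are made up for illustration, and the 8766 hours per year figure is an assumption (365.25 days); only the 16379 hours and 34 power cycles come from the output above.

```python
# Rough arithmetic on the Power_On_Hours and Power_Cycle_Count raw values.
# Assumption: an average year is 365.25 days, i.e. 8766 hours.
HOURS_PER_YEAR = 24 * 365.25


def power_on_years(hours: int) -> float:
    """Convert a Power_On_Hours raw value into years of powered-on time."""
    return hours / HOURS_PER_YEAR


def hours_per_power_cycle(hours: int, cycles: int) -> float:
    """Average powered-on time between power cycles."""
    return hours / cycles


if __name__ == "__main__":
    hours = 16379   # attribute 9, from the output above
    cycles = 34     # attribute 12, from the output above
    print(f"power-on time: {power_on_years(hours):.2f} years")
    print(f"average uptime per power cycle: "
          f"{hours_per_power_cycle(hours, cycles) / 24:.1f} days")
```

so the drive has mostly been left running for weeks at a time, which fits the low start/stop count.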
101 > 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail
102 > Always - 0
103 > 12 Power_Cycle_Count 0x0032 100 100 020 Old_age
104 > Always - 34
105 > 183 Runtime_Bad_Block 0x0032 100 100 000 Old_age
106 > Always - 0
107 > 184 End-to-End_Error 0x0032 100 100 099 Old_age
108 > Always - 0
109 > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age
110 > Always - 0
111 > 188 Command_Timeout 0x0032 100 100 000 Old_age
112 > Always - 0 0 0
113 > 189 High_Fly_Writes 0x003a 100 100 000 Old_age
114 > Always - 0
115 > 190 Airflow_Temperature_Cel 0x0022 069 063 045 Old_age
116 > Always - 31 (Min/Max 26/33)
117 > 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age
118 > Always - 0
119 > 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age
120 > Always - 9
121 > 193 Load_Cycle_Count 0x0032 093 093 000 Old_age
122 > Always - 14284
123 > 194 Temperature_Celsius 0x0022 031 040 000 Old_age
124 > Always - 31 (0 17 0 0 0)
125 > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age
126 > Always - 104
127 attribute 197:
128 this says there are 104 pending sectors, i.e. bad blocks on the drive
129 that have not been reallocated yet
130
131 > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age
132 > Offline - 104
133
134 this says the same 104 sectors could not be corrected during offline testing.
135 entry 5 is still zero, most likely because pending sectors only get reallocated once they are written to
136
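if you want to pull out just the three sector attributes discussed here (5, 197 and 198), something like the sketch below could work. it assumes the usual one-attribute-per-line layout of `smartctl -A` (the mail client wrapped the lines above), and the helper names and the regex are my own, not part of smartmontools.

```python
import re

# Attributes worth watching for bad sectors (IDs as shown by smartctl).
WATCHED_IDS = (5, 197, 198)

# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Sample rows taken from the output above, unwrapped onto single lines.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -   0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -   104
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -   104
"""


def parse_attributes(text: str) -> dict:
    """Map attribute ID -> (name, leading integer of the raw value)."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(9)))
    return attrs


def flag_bad_sectors(attrs: dict) -> list:
    """Return the watched attribute IDs whose raw value is non-zero."""
    return [i for i in WATCHED_IDS if i in attrs and attrs[i][1] > 0]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for attr_id in flag_bad_sectors(attrs):
        name, raw = attrs[attr_id]
        print(f"attribute {attr_id} ({name}) has raw value {raw}")
```

on this drive that flags 197 and 198 but not 5, which is exactly the combination discussed above.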
137 > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age
138 > Always - 0
139 > 240 Head_Flying_Hours 0x0000 100 253 000 Old_age
140 > Offline - 15955h+37m+28.932s
141 > 241 Total_LBAs_Written 0x0000 100 253 000 Old_age
142 > Offline - 52221690631887
143 > 242 Total_LBAs_Read 0x0000 100 253 000 Old_age
144 > Offline - 74848968465606
145 >
146 > SMART Error Log Version: 1
147 > No Errors Logged
148 >
149 > SMART Self-test log structure revision number 1
150 > Num Test_Description Status Remaining
151 > LifeTime(hours) LBA_of_first_error
152 > # 1 Extended offline Completed: read failure 60%
153 > 16365 2905482560
154 > # 2 Extended offline Completed: read failure 60%
155 > 16352 2905482560
156 > # 3 Extended offline Completed without error 00%
157 > 8044 -
158 > # 4 Extended offline Completed without error 00%
159 > 3121 -
160 > # 5 Extended offline Completed without error 00%
161 > 1548 -
162 > # 6 Short offline Completed without error 00%
163 > 1141 -
164 > # 7 Extended offline Completed without error 00%
165 > 719 -
166 > # 8 Extended offline Completed without error 00%
167 > 525 -
168 > # 9 Short offline Completed without error 00%
169 > 516 -
170 > #10 Extended offline Completed without error 00%
171 > 18 -
172 > #11 Extended offline Completed without error 00%
173 > 5 -
174 > #12 Short offline Completed without error 00%
175 > 0 -
176 >
177 > SMART Selective self-test log data structure revision number 1
178 > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
179 > 1 0 0 Not_testing
180 > 2 0 0 Not_testing
181 > 3 0 0 Not_testing
182 > 4 0 0 Not_testing
183 > 5 0 0 Not_testing
184 > Selective self-test flags (0x0):
185 > After scanning selected spans, do NOT read-scan remainder of disk.
186 > If Selective self-test is pending on power-up, resume after 0 minute delay.
187 >
188 > root@fireball / #
189 >
190 > Does that help shed any light on this situation? If you need more
191 > info, just let me know. Off to newegg. BRB
192 >
193 > Dale
194 >
195 > :-) :-)
196 >
197
198 104 bad blocks is not a sign of a happy disk.
199 i would replace it urgently
200
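to see roughly where on the disk the failing block sits, the LBA_of_first_error from the self-test log can be turned into a byte offset. this is only a sketch: it assumes the LBA is counted in 512-byte logical sectors (as the "Sector Sizes" line suggests), and the function names are hypothetical.

```python
# Values copied from the smartctl output above.
LOGICAL_SECTOR_BYTES = 512
PHYSICAL_SECTOR_BYTES = 4096
CAPACITY_BYTES = 3_000_592_982_016
FIRST_ERROR_LBA = 2_905_482_560


def byte_offset(lba: int) -> int:
    """Byte offset of a logical block, assuming 512-byte logical sectors."""
    return lba * LOGICAL_SECTOR_BYTES


def fraction_of_disk(lba: int) -> float:
    """How far into the disk (0.0 to 1.0) the block lies."""
    return byte_offset(lba) / CAPACITY_BYTES


def physical_sector(lba: int) -> int:
    """Index of the 4096-byte physical sector containing the block."""
    return byte_offset(lba) // PHYSICAL_SECTOR_BYTES


if __name__ == "__main__":
    print(f"byte offset:     {byte_offset(FIRST_ERROR_LBA)}")
    print(f"position:        {fraction_of_disk(FIRST_ERROR_LBA):.1%} into the disk")
    print(f"physical sector: {physical_sector(FIRST_ERROR_LBA)}")
```

that puts the first unreadable block about halfway into the disk, which can help work out which partition is affected.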
201 also consider running smartd or a smart monitoring plugin for munin, as the
202 test log suggests your last test before these failures was at 8044 hours, roughly eleven months in
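
the gap is easy to see from the LifeTime(hours) column of the self-test log. a minimal sketch, with the hours typed in by hand from the log above and a made-up helper name:

```python
# LifeTime(hours) values from the self-test log above, newest first.
LIFETIMES = [16365, 16352, 8044, 3121, 1548, 1141, 719, 525, 516, 18, 5, 0]


def longest_gap(lifetimes):
    """Return (gap_hours, start_hour, end_hour) for the longest stretch
    of power-on time between two consecutive self-tests."""
    ordered = sorted(lifetimes)
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps)


if __name__ == "__main__":
    gap, start, end = longest_gap(LIFETIMES)
    print(f"longest gap: {gap} hours ({gap / 24:.0f} days), "
          f"between hour {start} and hour {end}")
```

the longest stretch without any test is the one right before the two failed runs, which is the sort of thing a scheduled test from smartd would avoid.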
