Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] smartctrl drive error @60%
Date: Tue, 01 Jul 2014 07:21:41
Message-Id: 53B2617C.9060004@gmail.com
In Reply to: Re: [gentoo-user] smartctrl drive error @60% by "J. Roeleveld"
J. Roeleveld wrote:
> On Tuesday, July 01, 2014 06:52:10 AM Mick wrote:
>> On Sunday 29 Jun 2014 13:05:04 Rich Freeman wrote:
>>> On Sun, Jun 29, 2014 at 12:44 AM, Dale <rdalek1967@×××××.com> wrote:
>>>> What if I copied data to the drive until it was just about full. I'm
>>>> thinking like maybe 90 or 95% or so. If I do that and run the test
>>>> every few days, would it then catch a error after a few weeks or so of
>>>> testing? I realize no one knows with 100% certainty...
>>> As you already said, nobody knows with 100% certainty.
>>>
>>> In the failures I've experienced I'd expect it to start catching
>>> errors within a few days. However, on those drives the relocated
>>> sector count never increases, which suggests that the firmware never
>>> relocated those sectors when overwritten, which seems brain-dead to
>>> me.
>>>
>>> If the drive relocates the sectors, then conceivably it could go quite
>>> a long time until having errors, probably in an entirely different set
>>> of sectors.
>>>
>>> Even if it doesn't relocate, the reliability of the bad sectors could
>>> be high or low.
>>>
>>> Rich
>> What triggers a relocation? I also have a drive which shows a sector
>> relocation pending, but for a few days now and after some tests that showed
>> no errors, it won't relocate it.
> I think a write to that sector should force a relocation.
>
> --
> Joost
>
>

I think you are right, Joost. I should have tried some of the fixes that
COULD be destructive, to see whether a) they fix it and b) the data
survives, other than the bad part at least. I forgot to do that and
really wasn't sure how to do it either. One person posted a lot of info
about it, but it was a bit deep for me. It would have required some
reading and, because of health issues, I can't tackle that much at one
time right now.

Here is what I did, tho. I got the new drive and rsynced the data from
the old drive to the new one. Then I removed the LVM stuff from the old
drive and used dd to erase the whole old drive, which took a while for
3TB. o_O After that, I ran the test. It came back fine. Check out this
snippet:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16499         -
# 2  Extended offline    Completed without error       00%     16498         -
# 3  Short offline       Completed without error       00%     16475         -
# 4  Extended offline    Completed without error       00%     16466         -
# 5  Extended offline    Aborted by host               90%     16461         -
# 6  Extended offline    Completed: read failure       60%     16451         2905482560
# 7  Extended offline    Completed: read failure       60%     16432         2905482560
# 8  Extended offline    Completed: read failure       60%     16427         2905482560
# 9  Extended offline    Completed: read failure       60%     16394         2905482560
#10  Extended offline    Completed: read failure       60%     16389         2905482560
#11  Short offline       Completed without error       00%     16380         -
#12  Extended offline    Completed: read failure       60%     16365         2905482560
#13  Extended offline    Completed: read failure       60%     16352         2905482560
#14  Extended offline    Completed without error       00%      8044         -
#15  Extended offline    Completed without error       00%      3121         -
#16  Extended offline    Completed without error       00%      1548         -
#17  Short offline       Completed without error       00%      1141         -
#18  Extended offline    Completed without error       00%       719         -
#19  Extended offline    Completed without error       00%       525         -
#20  Short offline       Completed without error       00%       516         -
#21  Extended offline    Completed without error       00%        18         -
7 of 7 failed self-tests are outdated by newer successful extended offline self-test # 2

Note the very last line of the log. You can still see all the failures,
but that line says they are outdated because the drive passed an
extended test after the bad ones. So, while I'm not holding my breath,
that is what SMART says. It may blow smoke and make horrible noises next
week, but right now it says it is OK.
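
For anyone who wants to check that summary line by script, here is a
minimal sketch in Python. It is not how smartctl itself computes the
line; it assumes a simplified rule (a failed test counts as outdated if
a successful extended test with a lower Num, meaning more recent, is in
the log). The names SelfTest, parse_log and summarize are made up for
illustration, and the sample rows are a subset of the log above, typed
one entry per line.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class SelfTest(NamedTuple):
    num: int      # position in the log; 1 is the most recent test
    kind: str     # e.g. "Extended offline"
    status: str   # e.g. "Completed: read failure"
    hours: int    # power-on hours when the test ran


# Num, description, status, remaining %, lifetime hours, LBA (or "-").
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+\d+%\s+(\d+)\s+\S+\s*$")


def parse_log(text: str) -> List[SelfTest]:
    """Turn one-entry-per-line self-test rows into SelfTest records."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            tests.append(
                SelfTest(int(m.group(1)), m.group(2), m.group(3), int(m.group(4)))
            )
    return tests


def summarize(tests: List[SelfTest]) -> Tuple[int, int, Optional[int]]:
    """Return (outdated failures, total failures, newest good extended test).

    Assumed rule, not smartctl's actual code: a failure is outdated when
    a successful extended test appears above it in the log.
    """
    failed = [t for t in tests if "failure" in t.status]
    good_ext = [
        t.num
        for t in tests
        if t.kind == "Extended offline" and t.status == "Completed without error"
    ]
    if not good_ext:
        return 0, len(failed), None
    newest_ok = min(good_ext)
    outdated = sum(1 for t in failed if t.num > newest_ok)
    return outdated, len(failed), newest_ok


SAMPLE = """\
# 1  Short offline       Completed without error       00%     16499         -
# 2  Extended offline    Completed without error       00%     16498         -
# 5  Extended offline    Aborted by host               90%     16461         -
# 6  Extended offline    Completed: read failure       60%     16451         2905482560
#13  Extended offline    Completed: read failure       60%     16352         2905482560
#14  Extended offline    Completed without error       00%      8044         -
"""

outdated, total, newest = summarize(parse_log(SAMPLE))
print(f"{outdated} of {total} failed self-tests are outdated by test # {newest}")
```

With only those six sample rows, both read failures sit below the
successful extended test # 2, so both count as outdated. The aborted
test # 5 is not counted as a failure here.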

In the end, it seems something has to write to that specific sector, and
then the drive will reallocate/move/whatever so that the bad part isn't
used anymore. It seems dd did that, but I bet there are other tools that
could do it without losing any data other than what is in the bad spot,
of course. That's my simple idea at least.
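
As a rough illustration of writing to just that one spot, the sketch
below works out where the LBA from the log sits on the disk and builds a
dd command line that would overwrite only that sector. It is untested on
real hardware and rests on assumptions: that the LBA is counted in
512-byte logical sectors, and that /dev/sdX stands in for the real
device (it is a placeholder, not a name from this thread). The script
only prints the command, it does not run it. Running such a command
destroys whatever is stored in that sector.

```python
import shlex


def sector_byte_offset(lba: int, sector_size: int = 512) -> int:
    """Byte offset of a sector, assuming `sector_size` bytes per LBA."""
    return lba * sector_size


def single_sector_write_cmd(device: str, lba: int, sector_size: int = 512) -> str:
    """Build (but do not run) a dd command that zeroes exactly one sector.

    seek= is counted in units of bs=, so bs must equal the sector size.
    oflag=direct asks GNU dd to bypass the page cache for the write.
    """
    args = [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={sector_size}",
        "count=1",
        f"seek={lba}",
        "oflag=direct",
    ]
    return " ".join(shlex.quote(a) for a in args)


BAD_LBA = 2905482560  # LBA_of_first_error from the self-test log above

print(sector_byte_offset(BAD_LBA))                   # 1487607070720
print(single_sector_write_cmd("/dev/sdX", BAD_LBA))  # /dev/sdX is a placeholder
```

Under the 512-byte assumption, the bad spot is about 1.49 TB into the
drive. Whether the firmware then reallocates the sector is up to the
drive, so it is worth running another extended test afterwards.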

Hope that helps. I wish I could have done the other stuff, kept notes on
the commands and such, and then posted the results. That MAY have helped
someone in the future. My brain ain't what it used to be. ;-)

Dale

:-) :-)