Re: [gentoo-user] smartctrl drive error @60% - gentoo-user

From:	Dale <rdalek1967@×××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] smartctrl drive error @60%
Date:	Tue, 01 Jul 2014 07:21:41
Message-Id:	`53B2617C.9060004@gmail.com`
In Reply to:	Re: [gentoo-user] smartctrl drive error @60% by "J. Roeleveld"

J. Roeleveld wrote:
> On Tuesday, July 01, 2014 06:52:10 AM Mick wrote:
>> On Sunday 29 Jun 2014 13:05:04 Rich Freeman wrote:
>>> On Sun, Jun 29, 2014 at 12:44 AM, Dale <rdalek1967@×××××.com> wrote:
>>>> What if I copied data to the drive until it was just about full.  I'm
>>>> thinking like maybe 90 or 95% or so.  If I do that and run the test
>>>> every few days, would it then catch a error after a few weeks or so of
>>>> testing?  I realize no one knows with 100% certainty...
>>> As you already said, nobody knows with 100% certainty.
>>>
>>> In the failures I've experienced I'd expect it to start catching
>>> errors within a few days.  However, on those drives the relocated
>>> sector count never increases, which suggests that the firmware never
>>> relocated those sectors when overwritten, which seems brain-dead to
>>> me.
>>>
>>> If the drive relocates the sectors, then conceivably it could go quite
>>> a long time until having errors, probably in an entirely different set
>>> of sectors.
>>>
>>> Even if it doesn't relocate, the reliability of the bad sectors could
>>> be high or low.
>>>
>>> Rich
>> What triggers a relocation?  I also have a drive which shows a sector
>> relocation pending, but for a few days now and after some tests that showed
>> no errors, it won't relocate it.
> I think a write to that sector should force a relocation.
>
> --
> Joost
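
Mick's "relocation pending" and Rich's "relocated sector count" correspond to SMART attributes that `smartctl -A` prints in its attribute table. As a rough way to tell whether a write actually cleared a pending sector, here is a minimal Python sketch that pulls those raw counts out of saved `smartctl -A` text and compares a before/after pair. It assumes the usual ten-column layout (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value) and the commonly used attribute IDs 5, 197 and 198; vendors can differ, and the function names are made up for illustration.

```python
from typing import Dict

# Commonly used IDs for the attributes discussed in this thread. This is an
# assumption: vendors can renumber or rename attributes.
WATCHED_IDS = {5, 197, 198}  # Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable


def sector_counts(attr_text: str) -> Dict[str, int]:
    """Map attribute name -> raw value for the watched IDs in `smartctl -A` text."""
    counts: Dict[str, int] = {}
    for line in attr_text.splitlines():
        fields = line.split()
        # Data rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED_IDS:
            try:
                counts[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value is not a plain integer on this drive; skip it
    return counts


def describe_change(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Rough reading of what happened to pending sectors between two snapshots."""
    pending_before = before.get("Current_Pending_Sector", 0)
    pending_after = after.get("Current_Pending_Sector", 0)
    realloc_delta = (after.get("Reallocated_Sector_Ct", 0)
                     - before.get("Reallocated_Sector_Ct", 0))
    if pending_after >= pending_before:
        return "still pending"
    if realloc_delta > 0:
        return "remapped to a spare sector"
    # Pending count dropped but nothing was reallocated: the drive
    # apparently decided the sector was usable again after the write.
    return "cleared in place"
```

Feeding it output captured before and after a write (for example from `smartctl -A /dev/sdX`, where `sdX` is a placeholder) gives one of the three readings. A "cleared in place" result would be consistent with what Rich describes, where the relocated count never goes up.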

I think you are right, Joost.  I should have tried some fixes that COULD
be destructive to see if a) it fixes it and b) the data lives, other
than the bad part at least.  I forgot to do that and really wasn't sure
how to do it either.  One person posted a lot of info about it, but it
was a bit deep for me.  It would have required some reading, and because
of health issues, I can't tackle that much at one time right now.

Here's what I did, tho.  I got the new drive and rsynced the data from the
old drive to the new drive.  I removed the LVM stuff from the old drive,
then used dd to erase the whole old drive, which took a while for 3TB.
o_O  After that, I ran the test, and it came back fine.  Check out this
snippet:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16499         -
# 2  Extended offline    Completed without error       00%     16498         -
# 3  Short offline       Completed without error       00%     16475         -
# 4  Extended offline    Completed without error       00%     16466         -
# 5  Extended offline    Aborted by host               90%     16461         -
# 6  Extended offline    Completed: read failure       60%     16451         2905482560
# 7  Extended offline    Completed: read failure       60%     16432         2905482560
# 8  Extended offline    Completed: read failure       60%     16427         2905482560
# 9  Extended offline    Completed: read failure       60%     16394         2905482560
#10  Extended offline    Completed: read failure       60%     16389         2905482560
#11  Short offline       Completed without error       00%     16380         -
#12  Extended offline    Completed: read failure       60%     16365         2905482560
#13  Extended offline    Completed: read failure       60%     16352         2905482560
#14  Extended offline    Completed without error       00%      8044         -
#15  Extended offline    Completed without error       00%      3121         -
#16  Extended offline    Completed without error       00%      1548         -
#17  Short offline       Completed without error       00%      1141         -
#18  Extended offline    Completed without error       00%       719         -
#19  Extended offline    Completed without error       00%       525         -
#20  Short offline       Completed without error       00%       516         -
#21  Extended offline    Completed without error       00%        18         -
7 of 7 failed self-tests are outdated by newer successful extended offline self-test # 2

Note the very last line.  You can see all the failures, but the last line
says they are outdated because the drive passed a newer extended test
after the bad ones.  So, while I'm not holding my breath, that is what
SMART says.  It may blow smoke and make horrible noises next week, but
right now it says it is OK.
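
The summary line at the bottom of the log can be reproduced roughly by hand: count the tests whose status reports a failure, find the most recent extended test that completed without error (the log is numbered newest first, so that is the lowest `Num`), and call every failure with a higher `Num` outdated. Below is a minimal Python sketch of that reading, assuming each row sits on one line as in the log above. The regular expression and function names are illustrative, and this is only an approximation of how smartctl words its summary, not its actual code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative pattern for one self-test log row, based on the rows quoted above.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<kind>\w+) offline\s+(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)


@dataclass
class SelfTest:
    num: int              # 1 is the most recent test
    kind: str             # "Short", "Extended", ...
    status: str
    hours: int            # LifeTime(hours) when the test ran
    lba: Optional[int]    # LBA_of_first_error, None when the log shows "-"


def parse_log(text: str) -> List[SelfTest]:
    """Parse the rows of a self-test log; header and summary lines are skipped."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            lba = None if m["lba"] == "-" else int(m["lba"])
            tests.append(SelfTest(int(m["num"]), m["kind"], m["status"], int(m["hours"]), lba))
    return tests


def outdated_failures(tests: List[SelfTest]) -> Tuple[int, int, Optional[int]]:
    """Return (outdated failures, total failures, Num of newest clean extended test)."""
    failed = [t for t in tests if "failure" in t.status]
    clean_extended = [t.num for t in tests
                      if t.kind == "Extended" and t.status == "Completed without error"]
    newest = min(clean_extended) if clean_extended else None
    outdated = [t for t in failed if newest is not None and t.num > newest]
    return len(outdated), len(failed), newest
```

Run over the full log above, this gives `(7, 7, 2)`: the seven read failures (#6 to #10, #12 and #13) are all older than the clean extended test #2. The aborted test #5 is not counted as a failure, which matches the "7 of 7" wording.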

In the end, it seems something has to write to that specific sector, and
then the drive will reallocate/move/whatever so that the bad part isn't
used anymore.  It seems dd did that, but I bet there are other tools that
could do it without losing any data other than what is in the bad spot, of
course.  That's my simple idea at least.
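
Dale's guess that other tools could rewrite just the bad spot is the idea behind a targeted sector rewrite. Below is a minimal Python sketch, assuming a Linux block device whose LBAs count 512-byte logical sectors and whose physical sectors are 4096 bytes. That layout is common on drives of this size but is an assumption here and should be checked (for example with `blockdev --getss` and `blockdev --getpbsz`). The sketch rewrites the whole physical sector containing the LBA, since a drive remaps whole physical sectors. It writes back the old contents when they can still be read, or zeros when they cannot. The function names are hypothetical, it defaults to a dry run, it goes through the page cache rather than O_DIRECT, and on a real disk it needs root and the right device: treat it as an illustration, not a tested recovery tool.

```python
import os
from typing import Tuple


def physical_span(lba: int, logical: int = 512, physical: int = 4096) -> Tuple[int, int]:
    """Byte offset and length of the physical sector that contains `lba`."""
    per_physical = physical // logical          # 8 logical sectors per physical one here
    first_lba = lba - (lba % per_physical)      # round down to the physical boundary
    return first_lba * logical, physical


def rewrite_sector(path: str, lba: int, logical: int = 512, physical: int = 4096,
                   dry_run: bool = True) -> Tuple[int, int]:
    """Rewrite the physical sector holding `lba` in place (hypothetical helper).

    Keeps the old bytes if they can still be read; otherwise writes zeros,
    which loses only the contents of that one physical sector.
    """
    offset, length = physical_span(lba, logical, physical)
    if dry_run:
        return offset, length                   # just report what would be touched
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            data = os.pread(fd, length, offset)
        except OSError:                         # e.g. EIO on an unreadable sector
            data = b""
        data = data.ljust(length, b"\0")        # pad with zeros if short or unreadable
        os.pwrite(fd, data, offset)             # the write gives the firmware its chance to remap
        os.fsync(fd)
    finally:
        os.close(fd)
    return offset, length
```

For the LBA in the log above, a dry run such as `rewrite_sector("/dev/sdX", 2905482560)` (with `sdX` a placeholder) reports offset 1487607070720 and length 4096. That LBA already sits on a physical-sector boundary, so only one 4096-byte block would be rewritten, rather than the whole 3TB that dd overwrote.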

Hope that helps.  I wish I could have done the other stuff, kept notes on
commands and such, and then posted the results.  That MAY have helped
someone in the future.  My brain ain't what it used to be.  ;-)

Dale

:-)  :-)


Gentoo Archives: gentoo-user