Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Hard drive error from SMART
Date: Wed, 13 Apr 2022 00:44:11
Message-Id: b0be4752-5f73-d40b-109e-74a4a9e810fd@gmail.com
In Reply to: Re: [gentoo-user] Hard drive error from SMART by Frank Steinmetzger
1 Frank Steinmetzger wrote:
2 > Am Tue, Apr 12, 2022 at 06:01:11PM -0500 schrieb Dale:
3 >
4 >>> The advantage of an integrity scheme (like ZFS or comparing with a checksum
5 >>> file) over your rsync approach is that you only need to read all the datas™
6 >>> from one drive instead of two. Plus: if rsync actually detects a change, it
7 >>> doesn’t know which of the two drives introduced the error. You need to find
8 >>> out yourself after the fact (which probably won’t be hard, but still, it’s
9 >>> one more manual step).
10 >> In this case, if something had changed, I'd have no problem manually
11 >> checking the file to be sure which was good and which was bad.
12 > Consider a big video file, which I know you like to accumulate from youtube
13 > and the likes. How do you find out the broken one? By watching it and trying
14 > to find the one image or audio frame that is garbled? The drive might return
15 > zeros or other garbage (bit flip) instead of actual content without SMART
16 > noticing it (uncorrectable error).
17 >
18
19 In this case, I'd likely rename one file and keep them both until I can
20 figure out which is good.  That said, I'd certainly keep the backup copy
21 because odds are, it is good since the error came well after my last
22 backup.  At this point tho, I don't know what file was on that bad spot. 
23
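A checksum file recorded while both copies were known to be good would settle later which copy went bad: whichever one no longer matches. A minimal sketch of that idea in Python, assuming SHA-256 as the hash. The function names are hypothetical, picked only for illustration, and this is not a substitute for something like ZFS:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def find_mismatches(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad
```

The manifest would be built once, right after a backup that is trusted, and stored somewhere other than the drive it describes. Running find_mismatches against the main drive and then against the backup shows which side changed, and only one drive has to be read per check.
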
24
25 >> Given
26 >> the error is recent on my drive, I'd suspect the backups to still be a
27 >> good file.  For that reason, I'd suspect the backup file to be good
28 >> therefore not to be overwritten.  I was trying to avoid a bad file
29 >> replacing a good file on the backup which then destroys all good files
30 >> and leaves only bad ones.  This is why I like that SMART at least let me
31 >> know there is a problem. 
32 > I also tend to rely on smart, but it’s not all-knowing and probably not
33 > infallible.
34 >
35 >
36
37
38 This is very true.  I mentioned elsewhere that things like spindle motor
39 failure or the motor that moves the heads are usually not detectable. 
40 Some component failures can be detected but not all or even most from
41 what I've read.  Basically, the best you can hope for is SMART seeing a
42 bad spot on the media itself.  That, it seems, it can detect most of the
43 time. 
44
45 TL;DR next two paragraphs.  Just an interesting story along this line.  I
46 used to work in parts at a Fortune 500 office company.  We had millions
47 of dollars of inventory just for computers.  That
48 was in the early 90's.  They also had copiers and their parts, paper etc
49 etc.  We used an NCR computer as the computer system for the whole
50 company.  At the end of the building was a speed bump so people wouldn't
51 go flying down the one lane road between the building and fence on the
52 property line.  One day a large truck almost empty went a little faster
53 than normal over the last speed bump.  It shook the building to the
54 point I could feel it about 150 feet away.  The computer room was like
55 50 feet away from that side of the building.  It seems the hard drive
56 felt it very well.  One, maybe more, of the head(s) got under the media
57 and started peeling it off the platter and made a really ugly screeching
58 sound.  No routine shutdown, they just pulled the plug.  As you can
59 imagine tho, it did no good.  Even way back then, drives were spinning
60 fast enough that, I suspect, by the time a person could blink it was
61 way past fixing. 
62
63 That of course was way before SMART came along but SMART would never be
64 able to predict such a failure.  Even NCR said it was likely a 1 in a
65 million chance that the truck hit just when the head was moving over a
66 weak spot.  Several thousand dollars later, and a private plane bringing
67 in a new drive, the drive was replaced.  Of course, the idiot in charge
68 had no backups that were of any use.  All of them were several weeks
69 old, likely over a month.  Luckily he stayed far away from me for at
70 least a month.  Otherwise, I'd likely still be in jail, with my hands
71 around the neck of his corpse.  :-@
72
73 SMART isn't a sure thing but it can help in some cases which is better
74 than nothing at all. 
75
76 Dale
77
78 :-)  :-)