Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Hard drive error from SMART
Date: Tue, 12 Apr 2022 18:22:25
Message-Id: ff726229-3a76-ade4-1dca-9834e1417be1@gmail.com
In Reply to: RE: [gentoo-user] Hard drive error from SMART by Laurence Perkins
Laurence Perkins wrote:
>> -----Original Message-----
>> From: Dale <rdalek1967@×××××.com>
>> Sent: Tuesday, April 12, 2022 10:08 AM
>> To: gentoo-user@l.g.o
>> Subject: Re: [gentoo-user] Hard drive error from SMART
>>
>> Rich Freeman wrote:
>>> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@×××××.com> wrote:
>>>> Thoughts. Replace as soon as drive arrives or wait and see?
>>>>
>>> So, first of all just about all my hard drives are in a RAID at this
>>> point, so I have a higher tolerance for issues.
>>>
>>> If a drive is under warranty I'll usually try to see if they will RMA
>>> it. More often than not they will, and in that case there is really
>>> no reason not to. I'll do advance shipping and replace the drive
>>> before sending the old one back so that I mostly have redundancy the
>>> whole time.
>>>
>>> If it isn't under warranty then I'll scrub it and see what happens.
>>> I'll of course do SMART self-tests, but usually an error like this
>>> won't actually clear until you overwrite the offline sector so that
>>> the drive can reallocate it. A RAID scrub/resilver/etc will overwrite
>>> the sector with the correct contents which will allow this to happen.
>>> (Otherwise there is no way for the drive to recover - if it knew what
>>> was stored there it wouldn't have an error in the first place.)
>>>
>>> If an error comes back then I'll replace the drive. My drives are
>>> pretty large at this point so I don't like keeping unreliable drives
>>> around. It just increases the risk of double failures, given that a
>>> large hard drive can take more than a day to replace. Write speeds
>>> just don't keep pace with capacities. I do have offline backups but I
>>> shudder at the thought of how long one of those would take to restore.
>>>
>>
>> Sadly, I don't have RAID here but to be honest, I really need to have it given the data and my recent luck with hard drives. Drives used to get dumped because they were just to small to use anymore. Nowadays, they seem to break in some fashion long before their usefulness ends their lives.
>>
>> I remounted the drives and did a backup. For anyone running up on this, just in case one of the files got corrupted, I used a little trick to see if I can figure out which one may be bad if any. I took my rsync commands from my little script and ran them one at a time with --dry-run added. If a file was to be updated on the backup that I hadn't changed or added, I was going to check into it before updating my backups. It could be that the backup file was still good and the file on my drive reporting problems was bad. In that case, I would determine which was good and either restore it from backups or allow it to be updated if needed. Either way, I should have a good file since the drive claims to have fixed the problem. Now let us pray. :-D
>>
>> Drive isn't under warranty. I may have to start buying new drives from dealers. Sometimes I find drives that are pulled from systems and have very few hours on them. Still, warranty may not last long. Saves a lot of money tho.
>>
>> USPS claims drive is on the way. Left a distribution point and should update again when it gets close. First said Saturday, then said Friday. I think Friday is about right but if the wind blows right, maybe Thursday.
>>
>> I hope I have another port and power cable plug for the swap out. At least now, I can unmount it and swap without a lot of rebooting. Since it's on LVM, that part is easy. Regretfully I have experience on that process. :/
>>
>> Thanks to all.
>>
>> Dale
>>
>> :-) :-)
>>
>>
> You can get up to 16X SATA PCI-e cards these days for pretty cheap. So as long as you have the power to run another drive or two there's not much reason not to do RAID on the important stuff. Also, the SATA protocol allows for port expanders, which are also pretty cheap.
>
> One of my favorite things about BTRFS is the data checksums. If the drive returns garbage, it turns into a read error. Also, if you can't do real RAID, but have excess space you can tell it to keep two copies of everything. Doesn't help with total drive failure, but does protect against the occasional failed sector. If you don't mind writes taking twice as long anyway.
>
> LMP

I looked into a card a good while back and they were pretty pricey at
the time.  Do you happen to have some search terms I can use on eBay,
Amazon, etc.?  I know some chipsets work better on Linux out of the
box.  I don't need to buy one that doesn't work or only works with the
threat of a sledgehammer.  lol  I've also looked into that other thing,
SAS? or something.  It's been a while tho.

I'm pretty good at doing backups.  I do Gentoo updates on Saturday, and
sometimes Sunday.  While the updates are downloading, I update my
backups.  It's almost like a religion for me.  I was just more cautious
earlier.  I suspected a file could be corrupted somewhere and wanted to
be sure it wasn't something important.  I have some files that, if lost,
I may not be able to download again.  They don't exist anymore.  A few I
got from some Govt archive that are really old but have since been
removed, or at least I can't find them anymore.
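
For anyone wanting to automate that kind of caution, here is a minimal sketch in Python. It is not the rsync command from my script. It assumes a source tree and a backup tree that are both mounted, and the names find_suspect_files and sha256_of are made up for illustration. rsync by default skips files whose size and modification time match, so a file damaged by a bad sector could go unnoticed. This walks both trees and lists files that look unchanged by size and mtime but whose contents differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_files(source_root, backup_root):
    """Return relative paths whose content differs although size and mtime match.

    Such files were not knowingly edited, so a mismatch hints that one of
    the two copies is corrupt. Which copy is bad still has to be decided
    by hand before the backup is updated.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    suspects = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            continue  # new file; a normal backup run will simply copy it
        s, d = src.stat(), dst.stat()
        if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
            continue  # ordinary edit; expected to be updated on the backup
        if sha256_of(src) != sha256_of(dst):
            suspects.append(rel.as_posix())
    return suspects
```

Calling find_suspect_files("/home", "/mnt/backup/home") (both paths are placeholders) reads every matching file on both sides, so expect it to take a while on large drives.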

I've given serious thought to switching to BTRFS.  Thing is, I'm still
trying to get LVM figured out.  Plus, LVM is well maintained and should
be for a good long while, and it works for me.  Still, if I could
afford to have several new drives all at once, I'd certainly play with
it.  It could very well be better.  The one thing I wish is that LVM had
a GUI where you could do everything from it.  During my recent
rearrangement of drives, I learned that you can't do a lot of things
within Webmin.  It does some things but not everything.  Plus, you have
to have a running GUI to use it.  In that case, I had to unmount /home
which meant no KDE, so no Webmin either.  Still, that could cause
trouble too.  I dunno.

Thanks.

Dale

:-)  :-)
