Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: bad drive?
Date: Sat, 13 Jun 2009 17:56:11
Message-Id: pan.2009.06.13.17.55.58@cox.net
In Reply to: [gentoo-amd64] bad drive? by Wil Reichert
1 Wil Reichert <wil.reichert@×××××.com> posted
2 7a329d910906130944o7e5fa20eta701105f0e98624b@××××××××××.com, excerpted
3 below, on Sat, 13 Jun 2009 09:44:32 -0700:
4
5 > I think one of my drives is on its way out, tho I've never seen a drive
6 > fail like this before. Drive is a year old WD 640G & I use it as my
7 > system drive. Via SMART, I've been doing daily short & weekly long tests
8 > since I installed it. Starting last week I woke up to my keyboard
9 > lights blinking and the sound of the heads thrashing & the drive
10 > repeatedly attempting to spin up. On my desktop the mouse was still
11 > moving but any command (dmesg, less /var/log/messages) resulted in an IO
12 > error. I restarted the computer and everything came up fine.
13 > I dug through the logs but there were no IO errors of any sort to be
14 > found. All I could see was that the extended SMART test successfully
15 > started (from smartd.log):
16
17 You list the drive make, but I don't know if that's the model or not.
18 Googling turns up a number of 640 gig Western Digital models...
19
20 FWIW, while I'm having better luck again with my current Seagates (3
21 years old this summer, 4 300 gig SATA drives with most of the system in
22 RAID-6 so I could lose one... and still be able to have a second go down
23 while I was rebuilding on a replacement, without losing the system), I
24 had a bad run of two drives in a row that lasted almost exactly a year,
25 before that. Before /that/, I'd always kept running my drives well past
26 demoting them from primary to secondary on an upgrade, then to third drive,
27 and eventually out of rotation once they were too small to be practical,
28 I had no room on the bus, or they failed as a third drive, so two
29 drives in a row going out in a year was BAD for me. OTOH, at least one
30 of them SEVERELY overheated (AC went dead and I came home to the 'puter
31 still trying to run in a room of ~50C, no telling how hot the drive was),
32 and I'm reasonably sure it'd have run much longer otherwise.
33
34 BTW, both of those drives (including the way overheated one, which simply
35 head-crashed, thus grooved up where the head was floating at the time,
36 but was OK on other partitions including my backup partitions on the same
37 disk) still actually ran when I pulled them, but they had bad partitions
38 and I no longer felt safe running them. It's possible something like
39 that is happening to your disk too, particularly if SMART says it has
40 overheated.
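
As a rough way to see what SMART has recorded about temperature, here is a minimal sketch that pulls one attribute out of the table `smartctl -A` prints. It assumes the usual ten-column attribute layout; the SAMPLE text is made up for illustration and is not from the drive in question, and `read_attribute` is just a helper name picked for this example. On many drives the raw value of Temperature_Celsius is the current temperature in degrees C, but raw values are vendor-specific, so treat the result as a hint only.

```python
import re

# Illustrative sample only; NOT output from the drive discussed in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       8760
194 Temperature_Celsius     0x0022   110   085   000    Old_age   Always       -       37
"""


def read_attribute(text, name):
    """Return (value, worst, thresh, raw) for the named attribute, or None.

    Hypothetical helper: assumes the ten-column table printed by `smartctl -A`.
    """
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            value, worst, thresh = (int(f) for f in fields[3:6])
            # The raw column may carry extra text after the number; keep the
            # leading digits only.
            m = re.match(r"\d+", fields[9])
            raw = int(m.group()) if m else None
            return value, worst, thresh, raw
    return None


if __name__ == "__main__":
    print(read_attribute(SAMPLE, "Temperature_Celsius"))
```

To try it on a real drive, feed it the text from `smartctl -A` on your device instead of SAMPLE.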
41
42 Meanwhile, as I said, I don't know what your drive is, but PARTICULARLY
43 IF IT IS IDE, take a look at this recent LWN article, in particular, the
44 HPA aka host protected area bit, and the comment of "alankila" (near the
45 bottom), which your story brought to mind. It might be worth checking
46 with hdparm just to be sure, tho I really don't understand how SMART's
47 own test could be screwed up by that as the drive should certainly
48 understand its own parameters even if various Linux utilities don't
49 necessarily agree.
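
For the hdparm check, `hdparm -N` on the device reports the visible and native sector counts. The sketch below parses that report and works out how many sectors are hidden behind an HPA. It assumes the "max sectors = visible/native" line format; the SAMPLE text, its numbers, and the `parse_hpa` name are invented for illustration.

```python
import re

# Illustrative sample of `hdparm -N` output; the device name and numbers are made up.
SAMPLE = """\
/dev/sdX:
 max sectors   = 1250261615/1250263728, HPA is enabled
"""


def parse_hpa(text):
    """Return (visible, native, hidden) sector counts, or None if not found.

    Hypothetical helper: assumes a "max sectors = visible/native" line.
    """
    m = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", text)
    if m is None:
        return None
    visible, native = int(m.group(1)), int(m.group(2))
    # A nonzero difference means part of the disk is hidden from the OS.
    return visible, native, native - visible


if __name__ == "__main__":
    print(parse_hpa(SAMPLE))
```

If the hidden count is zero, no HPA is in play and that particular theory can be dropped.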
50
51 (This is the "In Brief" feature from the June 3 LWN kernel page. As
52 such, it covers a number of topics "in brief", so it doesn't give much
53 info, but that comment's useful and it's a good place to start further
54 research if it looks useful.) http://lwn.net/Articles/335913/
55
56 But regardless, getting another drive and RAID-1-ing the pair (or four
57 drives and RAID-6-ing or RAID-10-ing them), at least for your vital
58 partitions, is I believe a pretty good idea at this point. It seems
59 drives don't last like they used to, and they are cheap enough that RAIDing
60 them is actually a reasonable solution now, especially with SATA. I know
61 I rest a LOT easier knowing I have 2-drive redundancy, here.
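
To put numbers on the tradeoff, here is a small sketch of usable space and guaranteed failure tolerance for the RAID levels mentioned above. It assumes equal-size drives and the classic layouts (RAID-1 mirroring every drive, RAID-10 as striped two-way mirrors); `raid_summary` is a name invented for this example, and real arrays lose a little extra space to metadata.

```python
def raid_summary(level, drives, size_gb):
    """Return (usable_gb, failures_survived) for equal-size drives.

    failures_survived is the number of drive failures the array is
    guaranteed to survive, whichever drives happen to fail.
    """
    if level == 1:
        if drives < 2:
            raise ValueError("RAID-1 needs at least 2 drives")
        # Every drive holds a full copy.
        return size_gb, drives - 1
    if level == 6:
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        # Two drives' worth of space goes to parity.
        return (drives - 2) * size_gb, 2
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("this sketch assumes an even count of 4+ drives")
        # Striped two-way mirrors: losing both halves of one pair is fatal.
        return (drives // 2) * size_gb, 1
    raise ValueError("unsupported RAID level in this sketch")


if __name__ == "__main__":
    for level, drives, size in [(1, 2, 640), (6, 4, 300), (10, 4, 300)]:
        print(level, drives, size, raid_summary(level, drives, size))
```

With four 300 gig drives, RAID-6 and RAID-10 both give 600 gig usable, but only RAID-6 is guaranteed to survive any two drives failing.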
62
63 Or you can do what I did before, which appears to be what you had done,
64 rotate your primary drive into backup usage, and hope both the older
65 backup and the newer main drive don't go out at the same time. Of
66 course, you can then be left without good backups if that's all you use,
67 since the one's likely much smaller than the other, which used to mean
68 probably too small for all your data on both, tho with today's capacities
69 and cost for new drives, that's not quite the problem it used to be.
70
71 --
72 Duncan - List replies preferred. No HTML msgs.
73 "Every nonfree program has a lord, a master --
74 and if you use the program, he is your master." Richard Stallman