Wil Reichert <wil.reichert@×××××.com> posted
7a329d910906130944o7e5fa20eta701105f0e98624b@××××××××××.com, excerpted
below, on Sat, 13 Jun 2009 09:44:32 -0700:

> I think one of my drives is on its way out, tho I've never seen a drive
> fail like this before. Drive is a year old WD 640G & I use it as my
> system drive. Via SMART, I've been doing daily short & weekly long tests
> since I installed it. Starting last week I woke up to my keyboard
> lights blinking and the sound of the heads thrashing & the drive
> repeatedly attempting to spin up. On my desktop the mouse was still
> moving but any command (dmesg, less /var/log/messages) resulted in an IO
> error. I restarted the computer and everything came up fine.
> I dug through the logs but there were no IO errors of any sort to be
> found. All I could see was that the extended SMART test successfully
> started (from smartd.log):

You list the drive make, but I don't know if that's the model or not.
Googling turns up a number of 640 gig Western Digital models...

FWIW, while I'm having better luck again with my current Seagates (3
years old this summer, 4 300 gig SATA drives with most of the system in
RAID-6 so I could lose one... and still be able to have a second go down
while I was rebuilding on a replacement, without losing the system), I
had a bad run of two drives in a row that lasted almost exactly a year,
before that. Before /that/, I'd always run my drives past switching them
from primary to secondary due to upgrade, then secondary to third drive,
then eventually out of rotation as too small to be practical any more or
when I had no room on the bus or when they failed as a third drive, so
two drives in a row going out in a year was BAD for me. OTOH, at least
one of them SEVERELY overheated (AC went dead and I came home to the
'puter still trying to run in a room of ~50C, no telling what the drive
temperature was), and I'm reasonably sure it'd have run much longer
otherwise.

BTW, both of those drives (including the way overheated one, which simply
head-crashed, thus grooved up where the head was floating at the time,
but was OK on other partitions including my backup partitions on the same
disk) still actually ran when I pulled them, but they had bad partitions
and I no longer felt safe running them. It's possible something like
that is happening to your disk too, particularly if SMART says it has
overheated.

Meanwhile, as I said, I don't know what your drive is, but PARTICULARLY
IF IT IS IDE, take a look at this recent LWN article, in particular, the
HPA aka host protected area bit, and the comment of "alankila" (near the
bottom), which your story brought to mind. It might be worth checking
with hdparm just to be sure, tho I really don't understand how SMART's
own test could be screwed up by that as the drive should certainly
understand its own parameters even if various Linux utilities don't
necessarily agree.

(This is the "In Brief" feature from the June 3 LWN kernel page. As
such, it covers a number of topics "in brief", so it doesn't give much
info, but that comment's useful and it's a good place to start further
research if it looks useful.) http://lwn.net/Articles/335913/

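As a rough sketch of that hdparm check: "hdparm -N /dev/sdX" (run as
root) should, as far as I know, report the visible and native sector
counts plus whether HPA is enabled. The Python below only parses text of
that shape. The sample output and its numbers are made up for
illustration, and parse_hpa is a hypothetical helper name, so compare it
against what your own hdparm actually prints.

```python
import re

# Assumed shape of `hdparm -N /dev/sdX` output; these values are invented.
SAMPLE = """
/dev/sda:
 max sectors   = 1250261615/1250263728, HPA is enabled
"""


def parse_hpa(text):
    """Return (visible, native, hidden) sector counts, or None if not found."""
    match = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", text)
    if match is None:
        return None
    visible = int(match.group(1))
    native = int(match.group(2))
    # A nonzero difference means part of the disk is hidden behind an HPA.
    return visible, native, native - visible


print(parse_hpa(SAMPLE))  # (1250261615, 1250263728, 2113)
```

If the hidden count comes out nonzero, the kernel and the drive may
disagree about where the disk ends, which is the situation that LWN
comment describes.
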
But regardless, getting another drive and RAID-1-ing the pair (or four
drives and RAID-6-ing or RAID-10-ing them), at least for your vital
partitions, is, I believe, a pretty good idea at this point. It seems
drives don't last like they used to, and they are cheap enough that
RAIDing them is actually a reasonable solution now, especially with SATA.
I know I rest a LOT easier knowing I have 2-drive redundancy, here.

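For a feel of the trade-off between those layouts, here's a
back-of-the-envelope sketch in Python. It assumes identical drives and
the textbook definitions (RAID-1 mirrors everything, RAID-6 spends two
drives' worth of space on parity, RAID-10 is a stripe of two-way
mirrors). raid_usable is just an illustrative helper, not anything from
mdadm, and real arrays lose a bit more to metadata.

```python
def raid_usable(level, drives, size_gb):
    """Return (usable_gb, failures_always_survived) for identical drives."""
    if level == 1:
        if drives < 2:
            raise ValueError("RAID-1 needs at least 2 drives")
        # Every drive holds a full copy, so capacity is one drive's worth.
        return size_gb, drives - 1
    if level == 6:
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        # Two drives' worth of space goes to parity.
        return (drives - 2) * size_gb, 2
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("this sketch assumes an even count of 4+ drives")
        # Two-way mirrors: only one failure is guaranteed survivable,
        # since a second failure could hit the same mirror pair.
        return (drives // 2) * size_gb, 1
    raise ValueError("unsupported RAID level in this sketch")


print(raid_usable(6, 4, 300))   # (600, 2)
print(raid_usable(10, 4, 300))  # (600, 1)
print(raid_usable(1, 2, 640))   # (640, 1)
```

With four 300 gig drives that's 600 gig usable either way, but RAID-6
survives any two failures while RAID-10 only guarantees one.
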
Or you can do what I did before, which appears to be what you had done,
rotate your primary drive into backup usage, and hope both the older
backup and the newer main drive don't go out at the same time. Of
course, you can then be left without good backups if that's all you use,
since the one's likely much smaller than the other, which used to mean
probably too small for all your data on both, tho with today's capacities
and cost for new drives, that's not quite the problem it used to be.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman