Gentoo Archives: gentoo-user

From:	Dan Farrell <dan@×××××××××.cx>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] Is my hard drive sick?
Date:	Wed, 21 Nov 2007 00:34:28
Message-Id:	`20071120182841.2ad87842@pascal.spore.ath.cx`
In Reply to:	Re: [gentoo-user] Is my hard drive sick? by Florian Philipp

On Tue, 20 Nov 2007 14:28:16 +0100
Florian Philipp <lists@f_philipp.fastmail.net> wrote:

> > Well, the best thing to do is to post any error messages you have
> > here. I googled mine too but still was not sure what to make of the
> > info I was getting. It sounded bad so I posted them here. There
> > are some serious hardware gurus here and I was sure someone that
> > had ran into this before would clarify what was going on for me.
> >
> > [...]
> >
> > Dale
> >
> I recommend reading Google's analysis of SMART and HDD failures:
> http://labs.google.com/papers/disk_failures.pdf


the interesting part from the conclusions section:

One of our key findings has been the lack of a consistent
pattern of higher failure rates for higher temperature
drives or for those drives at higher utilization levels.
Such correlations have been repeatedly highlighted
by previous studies, but we are unable to confirm them
by observing our population. Although our data do not
allow us to conclude that there is no such correlation,
it provides strong evidence to suggest that other effects
may be more prominent in affecting disk drive reliability
in the context of a professionally managed data center
deployment.

Our results confirm the findings of previous smaller
population studies that suggest that some of the SMART
parameters are well-correlated with higher failure probabilities.
We find, for example, that after their first scan
error, drives are 39 times more likely to fail within 60
days than drives with no such errors. First errors in reallocations,
offline reallocations, and probational counts
are also strongly correlated to higher failure probabilities.
Despite those strong correlations, we find that
failure prediction models based on SMART parameters
alone are likely to be severely limited in their prediction
accuracy, given that a large fraction of our failed drives
have shown no SMART error signals whatsoever. This
result suggests that SMART models are more useful in
predicting trends for large aggregate populations than for
individual components. It also suggests that powerful
predictive models need to make use of signals beyond
those provided by SMART.
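
For anyone who wants to keep an eye on the counters mentioned above, here is a rough Python sketch that pulls the raw values out of `smartctl -A` output (from smartmontools) and reports the ones that are non-zero. Several things here are my assumptions, not anything from the paper: the attribute names in WATCHED are the ones smartctl commonly prints and are only my guess at what corresponds to the paper's reallocation, probational and offline counts, drives name these attributes differently, and the helper names are made up for illustration. Per the excerpt, an empty result does not mean the drive is healthy, since many failed drives showed no SMART errors at all.

```python
import subprocess

# Assumption: these commonly reported smartctl attribute names roughly match
# the reallocation / probational / offline counters the paper talks about.
# Vendors differ, so adjust this list for your drive.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the start of RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def nonzero_warnings(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def check_device(device):
    """Run `smartctl -A` on a device (usually needs root) and report warnings."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return nonzero_warnings(parse_smart_attributes(result.stdout))
```

Calling `check_device("/dev/sda")` as root (the device path is only an example) would return a dict of the watched counters that have moved off zero. Treat a non-empty result as a reason to check your backups, not as a precise failure prediction.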
--
gentoo-user@g.o mailing list
