On Tue, 20 Nov 2007 14:28:16 +0100
Florian Philipp <lists@f_philipp.fastmail.net> wrote:

> > Well, the best thing to do is to post any error messages you have
> > here. I googled mine too but still was not sure what to make of the
> > info I was getting. It sounded bad so I posted them here. There
> > are some serious hardware gurus here and I was sure someone that
> > had run into this before would clarify what was going on for me.
> >
> > [...]
> >
> > Dale
> >
> I recommend reading Google's analysis of SMART and HDD failures:
> http://labs.google.com/papers/disk_failures.pdf


The interesting part from the conclusions section:

One of our key findings has been the lack of a consistent
pattern of higher failure rates for higher temperature
drives or for those drives at higher utilization levels.
Such correlations have been repeatedly highlighted
by previous studies, but we are unable to confirm them
by observing our population. Although our data do not
allow us to conclude that there is no such correlation,
it provides strong evidence to suggest that other effects
may be more prominent in affecting disk drive reliability
in the context of a professionally managed data center
deployment.

Our results confirm the findings of previous smaller
population studies that suggest that some of the SMART
parameters are well-correlated with higher failure probabilities.
We find, for example, that after their first scan
error, drives are 39 times more likely to fail within 60
days than drives with no such errors. First errors in reallocations,
offline reallocations, and probational counts
are also strongly correlated to higher failure probabilities.
Despite those strong correlations, we find that
failure prediction models based on SMART parameters
alone are likely to be severely limited in their prediction
accuracy, given that a large fraction of our failed drives
have shown no SMART error signals whatsoever. This
result suggests that SMART models are more useful in
predicting trends for large aggregate populations than for
individual components. It also suggests that powerful
predictive models need to make use of signals beyond
those provided by SMART.
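
As a rough illustration of watching the counters the paper singles out, here is a minimal Python sketch that scans an attribute table in the layout printed by `smartctl -A` (from smartmontools) and reports any watched counter with a nonzero raw value. Several things here are assumptions and not from the paper: the mapping of its terms (reallocations, offline reallocations, probational counts) to the attribute names below, the function name `flag_attributes`, and the sample table, which is invented. Attribute names also vary between vendors. And as the excerpt stresses, a clean result says little about an individual drive.

```python
# Attribute names as commonly printed by `smartctl -A`. Treating them as the
# paper's reallocation / offline reallocation / probational counts is an
# assumption made for this sketch.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def flag_attributes(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID, name, flag, value,
        # worst, thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name not in watched:
            continue
        try:
            count = int(raw)
        except ValueError:
            continue  # skip raw values that are not plain integers
        if count > 0:
            flagged[name] = count
    return flagged


# Invented sample in the layout of `smartctl -A` output, not real drive data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

if __name__ == "__main__":
    for name, count in flag_attributes(SAMPLE).items():
        print(f"{name}: raw value {count}")
```

On a real system the input would be the output of something like `smartctl -A /dev/sda` (the device name is only an example), usually run as root.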
--
gentoo-user@g.o mailing list