Gentoo Archives: gentoo-user

From: Alan McKinnon <alan.mckinnon@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] hp H222 SAS controller
Date: Tue, 09 Jul 2013 22:33:17
Message-Id: 51DB323A.10702@gmail.com
In Reply to: Re: [gentoo-user] hp H222 SAS controller by "Stefan G. Weichinger"
1 On 08/07/2013 20:27, Stefan G. Weichinger wrote:
2 > Am 08.07.2013 17:58, schrieb Alan McKinnon:
3 >> On 08/07/2013 17:39, Paul Hartman wrote:
4 >>> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
5 >>> <paul.hartman+gentoo@×××××.com> wrote:
6 >>>> ST4000DM000
7 >>>
8 >>> As a side-note these two Seagate 4TB "Desktop" edition drives I bought
9 >>> already, after about 100 hours of power-on usage, both drives
10 >>> have each encountered dozens of unreadable sectors so far. I was able
11 >>> to correct them (force reallocation) using hdparm... So it should be
12 >>> "fixed", and I'm reading that this is "normal" with newer drives and
13 >>> "don't worry about it", but I'm still coming from the time when 1 bad
14 >>> sector = red alert, replace the drive ASAP. I guess I will need to
15 >>> monitor and see if it gets worse.
16 >>>
17 >>
18 >>
19 >> Way back when in the bad old days of drives measured in 100s of megs,
20 >> you'd get a few bad sectors now and then, and would have to mark them as
21 >> faulty. This didn't bother us much then.
22 >>
23 >> Nowadays we have drives that are 8,000 times bigger than that so all other
24 >> things being equal we'd expect sectors to fail 8,000 times more (more
25 >> being a very fuzzy concept, and I know full well I'm using it loosely :-) )
26 >>
27 >> Our drives nowadays also have smart firmware, something we had to
28 >> introduce when CHS no longer cut it. This led to sector failures being
29 >> somewhat "invisible" leaving us with the happy delusion that drives were
30 >> vastly reliable etc etc etc. But you know all this.
31 >>
32 >> A mere few dozen failures in the first 100 hours is a failure rate of
33 >> (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
34 >> spectacular if you ask me and WELL within probabilities.
35 >>
36 >> There is likely nothing wrong with your drives. If they are faulty, it's
37 >> highly likely a systemic manufacturing fault of the mechanicals (servo
38 >> systems, motor bearings etc.)
39 >>
40 >> You do realize that modern hard drives have for the longest time been up
41 >> there in the Top X list of Most Reliable Devices Made By Mankind Ever?
42 >
43 > Does it make sense to apply some sort of burn-in-procedure before
44 > actually formatting and using the disks? Running badblocks or something?
45 >
46 > I ask because I'm waiting for that shiny new server and doing so might not
47 > hurt before installing gentoo. Or is that too paranoid and a waste of time?
48
49 If it makes you feel better, then by all means go through the
50 motions.
51
52 For my money, I reckon that's exactly what it is - motions and ritual. I
53 have only anecdotal evidence to back it up, but it's fairly strong
54 anecdotal evidence:
55
56 Over the last 5 years, the team I'm in, the teams we work closely with
57 and the Storage guys have commissioned >1000 pieces of hardware and
58 probably more than 4000 drives, the vast majority from Dell. I have no
59 idea what burn-in Dell applies, if any. We've had our fair share of
60 infant mortality failures, probably fewer than 20 in 5 years. And here's
61 the kicker - every single one failed in production.
62
63 Most of that hardware, and ALL of the SANs, went through heavy
64 pre-deployment testing. Usually, this means cloning the -dev system onto
65 it and running the crap out of it for a decent length of time. Once the
66 techies were happy, install the production version and switch it on.
67
68 I conclude that the likely reason we only found failure in prod is that
69 only prod gives a decent viable test that approximates real life and dev
70 is always a mere simulation. It's not usage that kills a few drives
71 early, it's the almost random pattern of disk access that you get in
72 real life. That tends to shake out the weak links better than any test.
73
74 However, this is all anecdotal so use or discard as you see fit :-). I
75 no longer worry about data loss as we have 4 hour warranty turnaround
76 SLAs in place and company policy is to only deploy storage that is
77 guaranteed to survive loss of any one drive in an array.
78
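For anyone who wants to redo the arithmetic behind the two figures in
this thread (the per-sector failure rate quoted above and the infant
mortality count), here is a rough Python sketch. The 4 KiB sector size,
the decimal 4 TB capacity and the count of 48 bad sectors are
assumptions chosen for illustration, not numbers stated in the thread;
with them the result lands in the same ballpark as the quoted 4.8E-6%.
The drive counts are the ">4000 drives" and "fewer than 20" from the
anecdote, so the second figure is only a rough upper bound.

```python
# Back-of-envelope failure rates; every constant here is an assumption
# except the drive counts, which come from the anecdote in this mail.

DRIVE_BYTES = 4 * 10**12   # assumed: "4TB" read as decimal terabytes
SECTOR_BYTES = 4096        # assumed: 4 KiB physical sectors
BAD_SECTORS = 48           # assumed: "a few dozen" taken as 48


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of sectors that went bad on one drive.
total_sectors = DRIVE_BYTES // SECTOR_BYTES
sector_rate_pct = percent(BAD_SECTORS, total_sectors)

# Share of drives that died early across the fleet (upper bound).
DRIVES_COMMISSIONED = 4000
EARLY_FAILURES = 20
drive_rate_pct = percent(EARLY_FAILURES, DRIVES_COMMISSIONED)

if __name__ == "__main__":
    print(f"sectors per drive:      {total_sectors}")
    print(f"bad sector rate:        {sector_rate_pct:.2e} %")
    print(f"early drive failures:   {drive_rate_pct:.2f} %")
```

Changing SECTOR_BYTES to 512 makes the per-sector rate eight times
smaller, so the conclusion does not hinge on that assumption.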
79
80 --
81 Alan McKinnon
82 alan.mckinnon@×××××.com
