On Saturday, May 18, 2019 11:01:30 P.M. AEST Wols Lists wrote:
> On 17/05/19 06:19, Andrew Udvare wrote:
> >> On May 17, 2019, at 01:14, Adam Carter <adamcarter3@×××××.com> wrote:
> >>
> >> The classic one is where OPS haven't noticed that disks in a RAID array
> >> have died years ago...
> > This really happened?
>
> It's probably more common than you think.
>
> Can't tell (don't really know) the details, but I was told a story first
> hand about someone who went into the computer room and asked "what are
> those flashing red lights?"
>
> Cue massive panic as ops suddenly realised that (a) it was the main
> billing server with terabytes of critical information and (b) the two
> flashing lights meant their terribly expensive raid-6 disk array was now
> running in raid-0!
|
And the even bigger worry would be that a drive replacement and rebuild, which
is the whole point of using RAID, may fail. The degraded RAID is working (so
far), but a rebuild (unless it is *very* file-system aware) needs to read EVERY
BLOCK on the surviving disks to reconstruct the failed drive(s), and if it
encounters any unreadable blocks, even in unused areas of the array, it may be
unable to complete the rebuild.
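
Here is a rough, back-of-the-envelope illustration as a minimal Python sketch; it does not model any particular drive. It assumes the often-quoted consumer spec-sheet figure of one unrecoverable read error (URE) per 10^14 bits read. It also assumes errors are independent. Both are simplifications: many drives do better than their spec, and real errors tend to cluster. Under those assumptions the chance of reading N bits cleanly is (1 - p)^N. The function name and the example array are hypothetical.

```python
import math


def clean_read_probability(bytes_to_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of reading `bytes_to_read` bytes without a single URE.

    Assumes every bit fails independently with probability `ure_per_bit`,
    so P = (1 - p) ** bits, computed as exp(bits * log1p(-p)) because
    log1p stays accurate when p is tiny.
    """
    bits = bytes_to_read * 8
    return math.exp(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical array: six 4 TB drives in RAID-6 with two already dead,
    # so the rebuild must read the four survivors in full with no redundancy
    # left to cover a bad sector.
    survivors_bytes = 4 * 4e12
    print(f"Chance of a clean full read: {clean_read_probability(survivors_bytes):.1%}")
```

Under those assumptions it prints 27.8%. The rebuild is then more likely to hit an unreadable sector than not. Treat that as an illustration of why regular full reads matter, not as a prediction for any real array.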
|
I have seen this happen in previous positions. Not an easy thing to report to
management, and the unexpected downtime to rebuild everything from backups
onto new drives can be extensive (and expensive).
|
This is why good RAID systems have a background task (often called scrubbing)
that regularly reads and checks every block of every disk, so that latent
errors are found and repaired while there is still redundancy to repair them
from.
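
On Linux software RAID (md), that background read is what writing "check" to the array's `sync_action` sysfs file does. Per md(4), any read error hit during the pass is fixed from the redundant data and written back to the faulty device. Hardware controllers usually call the same idea a patrol read and configure it through their own tools. Some distributions already ship a periodic job for md, so look before adding another. The following is a minimal Python sketch assuming an md array such as `md0` and root privileges. The helper names are hypothetical; `sync_action` and `mismatch_cnt` are the kernel's md sysfs attributes.

```python
from pathlib import Path

# Linux md exposes per-array controls under /sys/block/<array>/md/.
# `sysfs_root` is a parameter only so the functions can be tested
# against a fake directory tree.


def md_attr(array: str, name: str, sysfs_root: str = "/sys/block") -> Path:
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / array / "md" / name


def start_scrub(array: str, sysfs_root: str = "/sys/block") -> bool:
    """Start a read-and-verify pass ("check") over every block of `array`.

    Needs root on a real system. Returns False without doing anything if
    the array is already busy (resync, recovery, another check, ...).
    """
    action = md_attr(array, "sync_action", sysfs_root)
    if action.read_text().strip() != "idle":
        return False
    action.write_text("check\n")
    return True


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Sectors found inconsistent by the last check or repair (ideally 0)."""
    return int(md_attr(array, "mismatch_cnt", sysfs_root).read_text())
```

Progress shows up in /proc/mdstat like a resync. Once `sync_action` reads idle again, a non-zero `mismatch_count()` or read errors in the kernel log are worth investigating.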
|
Hot spares are also a good safety measure, along with monitoring software
that alerts you when a spare has gone live.
|
--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro