Gentoo Archives: gentoo-user

From: Paul Colquhoun <paulcol@×××××××××××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: New Intel CPU flaws discovered
Date: Sun, 19 May 2019 00:21:08
Message-Id: 3076240.H3fUk4Temu@bluering
In Reply to: Re: [gentoo-user] Re: New Intel CPU flaws discovered by Wols Lists
On Saturday, May 18, 2019 11:01:30 P.M. AEST Wols Lists wrote:
> On 17/05/19 06:19, Andrew Udvare wrote:
> >> On May 17, 2019, at 01:14, Adam Carter <adamcarter3@×××××.com> wrote:
> >>
> >> The classic one is where OPS haven't noticed that disks in a RAID array
> >> died years ago...
> > This really happened?
>
> It's probably more common than you think.
>
> Can't tell (don't really know) the details, but I was told a story first
> hand about someone who went into the computer room and asked "what are
> those flashing red lights?"
>
> Cue massive panic as ops suddenly realised that (a) it was the main
> billing server with terabytes of critical information and (b) the two
> flashing lights meant their terribly expensive raid-6 disk array was now
> running in raid-0!


And the even bigger worry is that a drive replacement and rebuild, which
is the whole point of using RAID, may fail. The degraded RAID is working (so
far), but a rebuild (unless it is *very* filesystem-aware) needs to read EVERY
BLOCK on the surviving disks to reconstruct the failed drive(s). If it hits an
unreadable block anywhere, even in unused areas that normal use never reads,
it may be unable to complete the rebuild.
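To illustrate why every block matters, here is a toy model in Python. It assumes a simplified single-parity (RAID-5 style) array; RAID-6 adds a second syndrome that is left out here. Each disk is a list of blocks and `None` stands for an unreadable sector. `xor_blocks` and `rebuild` are hypothetical helpers, not any real RAID tool's API.

```python
from functools import reduce

# Toy model (hypothetical, not a real RAID API): each disk is a list of
# equal-sized blocks (bytes); None marks an unreadable (latent bad) block.
# Single XOR parity only, i.e. RAID-5 style; RAID-6's second syndrome is omitted.


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_disks):
    """Reconstruct the missing disk stripe by stripe from ALL surviving disks.

    Returns (rebuilt_blocks, lost_stripes). A stripe is lost if any surviving
    block in it is unreadable, whether or not a file actually uses that area.
    """
    rebuilt, lost = [], []
    for stripe, blocks in enumerate(zip(*surviving_disks)):
        if any(block is None for block in blocks):
            # A real controller may give up on the whole rebuild at this point.
            rebuilt.append(None)
            lost.append(stripe)
        else:
            rebuilt.append(xor_blocks(blocks))
    return rebuilt, lost


# Two data disks plus one parity disk; then pretend disk d1 has died.
d0 = [b"\x01\x02", b"\x10\x20"]
d1 = [b"\x04\x08", b"\x40\x80"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]

print(rebuild([d0, parity]))                   # clean rebuild
print(rebuild([[b"\x01\x02", None], parity]))  # one bad block on a survivor
```

The second call recovers stripe 0 but has no way to recover stripe 1. The missing data cannot be recomputed once a second block in the same stripe is gone.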

I have seen this happen in previous positions. It is not an easy thing to
report to management, and the unexpected downtime while everything is restored
from backups onto new drives can be extensive (and expensive).

This is why good RAID systems run a background task (often called "scrubbing"
or a "patrol read") that regularly reads and checks every block of every disk.
That way, latent errors are found and repaired while the array still has
redundancy, rather than discovered in the middle of a rebuild.
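A toy version of such a scrub pass is sketched below. It assumes a simplified single-parity array: each disk is a list of byte blocks, the last disk holds XOR parity, and `None` marks a block that could not be read. `scrub` is a hypothetical helper. On Linux software RAID, the real equivalent is started by writing `check` (or `repair`) to `/sys/block/mdX/md/sync_action`, where `mdX` stands for your array.

```python
from functools import reduce

# Toy scrub (hypothetical helper, not a real RAID API). Each disk is a list of
# equal-sized byte blocks, the last disk holds XOR parity, and None marks a
# block that could not be read.


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(disks):
    """Read every block of every disk, stripe by stripe.

    Returns (repaired, mismatched, lost):
      repaired   - (disk, stripe) pairs recomputed from the others and rewritten
      mismatched - stripes where data and parity disagree
      lost       - stripes with more than one unreadable block
    """
    repaired, mismatched, lost = [], [], []
    for stripe in range(len(disks[0])):
        blocks = [disk[stripe] for disk in disks]
        missing = [i for i, block in enumerate(blocks) if block is None]
        if len(missing) == 1:
            # Redundancy still exists: recompute the bad block and rewrite it.
            bad = missing[0]
            others = [block for i, block in enumerate(blocks) if i != bad]
            disks[bad][stripe] = xor_blocks(others)
            repaired.append((bad, stripe))
        elif len(missing) > 1:
            lost.append(stripe)
        elif any(xor_blocks(blocks)):
            # With correct parity, the XOR of the whole stripe is all zeros.
            mismatched.append(stripe)
    return repaired, mismatched, lost
```

In this model, a repair is only possible while each stripe has at most one bad block. That is why scrubbing has to happen before a drive fails, not during the rebuild.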

Hot spares are also a good safety measure, along with monitoring software that
alerts you when a spare has gone live (which means a drive has already failed).

--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro