Yo Mark!

On Fri, 21 Jun 2013 11:38:00 -0700
Mark Knecht <markknecht@×××××.com> wrote:

> On the read side I'm not sure if I'm understanding your point. I agree
> that a so-designed RAID1 system could/might read smaller portions of a
> larger read from RAID1 drives in parallel, taking some data from one
> drive and some from another drive, and then only take corrective
> action if one of the drives had troubles. However I don't know
> that mdadm-based RAID1 does anything like that. Does it?

It surely does. I have confirmed that at least monthly since md has
existed in the kernel.

> It seems to me that unless I at least _request_ all data from all
> drives and minimally compare at least some error flag from the
> controller telling me one drive had trouble reading a sector, then how
> do I know if anything bad is happening?

Correct. You can't tell if you can read something without trying to
read it. Which is why you should do a full RAID rebuild every week.
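
On Linux md you can force those reads with the "check" sync action;
a minimal sketch, assuming the array is /dev/md0 (use your own device
name):

    # Start a scrub: md reads every sector of every member and
    # compares the copies (needs root).
    echo check > /sys/block/md0/md/sync_action

    # Watch progress; the check shows up in mdstat like a resync.
    cat /proc/mdstat

Write "repair" instead of "check" if you want md to fix any
mismatches it finds.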

>
> Or maybe you're saying it's RAID1 and I don't know if anything bad is
> happening _unless_ I do a scrub and specifically check all the drives
> for consistency?

No. A simple read will find the problem. But given it is RAID1, the
only way to be sure to read from both drives is a RAID rebuild.
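
Once a check finishes, the kernel reports whether the two halves
actually matched; again assuming /dev/md0:

    # Sectors that disagreed during the last check/repair;
    # 0 is what you want to see.
    cat /sys/block/md0/md/mismatch_cnt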

> I do mdadm scrubs at least once a week. I still do them by hand. They
> have never appeared terribly expensive watching top or iotop, but
> sometimes when I'm watching NetFlix or Hulu in a VM I get more pauses
> when the scrub is taking place, but it's not huge.

Which is why you should cron job them at oh-dark-thirty.
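
Something like this in root's crontab would do it; the time and
device name are just examples:

    # Start the scrub at 03:30 every Sunday.
    30 3 * * 0 echo check > /sys/block/md0/md/sync_action

(Debian's mdadm package already ships a monthly cron job that does
this via checkarray, if you would rather lean on that.)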

>
> I agree that RAID5 gives you an opportunity to get things fixed, but
> there are folks who lose a disk in a RAID5, start the rebuild, and
> then lose a second disk during the rebuild.

Because they failed to do weekly rebuilds.

> Not that I would ever run the array degraded but that I
> could still tolerate a second loss while the rebuild was happening and
> hopefully get by.

Sadly, most people make their RAID5 or RAID6 out of brand new,
consecutively serial-numbered drives. They then see exactly the same
temperature, voltage, humidity, and seek stress until they all fail
within days of each other. I have personally seen 4 of 5 drives in a
RAID5 fail within 3 days, many times. Usually on a Friday, when the
tech decides the drive replacement can wait until Monday.

Your only protection against a full RAIDx failure is an offsite backup.
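
Even a dumb nightly rsync over ssh to a box somewhere else counts;
a sketch, with the host and paths invented for illustration:

    # Push /home to an offsite machine; --delete mirrors removals.
    rsync -a --delete /home/ backup@offsite.example.com:/backups/home/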

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
gem@××××××.com  Tel:+1(541)382-8588