Gentoo Archives: gentoo-amd64

From: "Gary E. Miller" <gem@××××××.com>
To: gentoo-amd64@l.g.o
Cc: markknecht@×××××.com
Subject: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Fri, 21 Jun 2013 18:50:59
Message-Id: 20130621115043.32b99d94.gem@rellim.com
In Reply to: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value? by Mark Knecht
1 Yo Mark!
2
3 On Fri, 21 Jun 2013 11:38:00 -0700
4 Mark Knecht <markknecht@×××××.com> wrote:
5
6 > On the read side I'm not sure if I'm understanding your point. I agree
7 > that a so-designed RAID1 system could/might read smaller portions of a
8 > larger read from RAID1 drives in parallel, taking some data from one
9 > drive and some from another drive, and then only take corrective
10 > action if one of the drives had troubles. However I don't know
11 > that mdadm-based RAID1 does anything like that. Does it?
12
13 It surely does. I have confirmed that at least monthly since md has
14 existed in the kernel.
15
16 > It seems to me that unless I at least _request_ all data from all
17 > drives and minimally compare at least some error flag from the
18 > controller telling me one drive had trouble reading a sector then how
19 > do I know if anything bad is happening?
20
21 Correct. You can't tell if you can read something without trying to
22 read it. Which is why you should do a full raid rebuild every week.
23 >
24 > Or maybe you're saying it's RAID1 and I don't know if anything bad is
25 > happening _unless_ I do a scrub and specifically check all the drives
26 > for consistency?
27
28 No. A simple read will find the problem. But given it is RAID1 the only
29 way to be sure to read from both drives is a raid rebuild.
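
To make the point above concrete, here is a toy sketch of a two-way mirror. It is not md's code: the class name ToyMirror, the round-robin read balancing, and the use of None for a damaged block are all assumptions made for illustration. It only shows why a single ordinary read can look healthy while one member holds a bad copy, and why a pass that touches every member finds it.

```python
import itertools


class ToyMirror:
    """Toy two-way RAID1: each read is served by one member only."""

    def __init__(self, blocks):
        # Two independent copies of the same data.
        self.members = [list(blocks), list(blocks)]
        # Naive round-robin balancing (an assumption, not md's policy).
        self._next = itertools.cycle(range(2))

    def read(self, lba):
        """Return the block from whichever member is next in rotation."""
        return self.members[next(self._next)][lba]

    def check(self):
        """Read every block from every member; return mismatching LBAs."""
        a, b = self.members
        return [lba for lba, (x, y) in enumerate(zip(a, b)) if x != y]


mirror = ToyMirror(["a", "b", "c"])
mirror.members[1][2] = None   # silently damage block 2 on the second member
first = mirror.read(2)        # served by the first member, so it looks fine
bad = mirror.check()          # the full pass compares both copies
```

In this model the damaged copy is only noticed by an ordinary read if the rotation happens to land on the second member, whereas check() always sees it.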
30
31 > I do mdadm scrubs at least once a week. I still do them by hand. They
32 > have never appeared terribly expensive watching top or iotop but
33 > sometimes when I'm watching NetFlix or Hulu in a VM I get more pauses
34 > when the scrub is taking place, but it's not huge.
35
36 Which is why you should cron job them at oh-dark-thirty.
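
A minimal sketch of what such a cron job could call, assuming the array is named md0 and the job runs as root. The md driver exposes a sync_action file under /sys/block/<array>/md/, and writing "check" to it starts a pass that reads every member; mismatch_cnt reports what that pass found. The function names and the sysfs_root parameter are hypothetical and exist only so the helper can be tried against a scratch directory.

```python
from pathlib import Path


def start_check(array="md0", sysfs_root="/sys/block"):
    """Request a 'check' pass on the array; return False if it is busy."""
    action = Path(sysfs_root) / array / "md" / "sync_action"
    if action.read_text().strip() != "idle":
        return False  # a resync, recovery, or check is already running
    action.write_text("check\n")
    return True


def mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch counter reported after the last check."""
    counter = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter.read_text())
```

A script that ends with a call such as start_check("md0") can then be scheduled from root's crontab for a quiet hour, with mismatch_count("md0") read once the pass has finished.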
37 >
38 > I agree that RAID5 gives you an opportunity to get things fixed, but
39 > there are folks who lose a disk in a RAID5, start the rebuild, and
40 > then lose a second disk during the rebuild.
41
42 Because they failed to do weekly rebuilds.
43
44 > Not that I would ever run the array degraded but that I
45 > could still tolerate a second loss while the rebuild was happening and
46 > hopefully get by.
47
48 Sadly most people make their RAID5 or RAID6 out of brand new,
49 consecutively serial numbered drives. They then get exactly the
50 same temp, voltage, humidity, seek stress until they all fail within
51 days of each other. I have personally seen 4 of 5 drives in a RAID5
52 fail within 3 days many times. Usually on a Friday where the tech
53 decides the drive replacement can wait until Monday.
54
55 Your only protection against a full RAIDx failure is an offsite backup.
56
57 RGDS
58 GARY
59 ---------------------------------------------------------------------------
60 Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
61 gem@××××××.com Tel:+1(541)382-8588
