
From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Sat, 22 Jun 2013 10:30:31
Message-Id: pan$2d013$83032870$fb7caf22$c0cf28d0@cox.net
In Reply to: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value? by Rich Freeman
Rich Freeman posted on Fri, 21 Jun 2013 11:13:51 -0400 as excerpted:

> On Fri, Jun 21, 2013 at 10:27 AM, Duncan <1i5t5.duncan@×××.net> wrote:

>> Question: Would you use [btrfs] for raid1 yet, as I'm doing?
>> What about as a single-device filesystem?

> If I wanted to use raid1 I might consider using btrfs now. I think it
> is still a bit risky, but the established use cases have gotten a fair
> bit of testing now. I'd be more confident in using it with a single
> device.

OK, so we agree on the basic confidence level of various btrfs features.
I trust my own judgement a bit more now. =:^)

> To migrate today would require finding someplace to dump all
> the data offline and migrate the drives, as there is no in-place way to
> migrate multiple ext3/4 logical volumes on top of mdadm to a single
> btrfs on bare metal.

... Unless you have enough unpartitioned space still available.

What I did a few years ago was buy a 1 TB USB drive I found at a good
price. (It was very near the price of half-TB drives at the time; I
figured out later they must have gotten shipped a pallet of the wrong
ones for a sale on the half-TB version of the same thing, so it was a
single-store, get-it-while-they're-there-to-get deal.)

That's how I was able to migrate from the raid6 I had back to raid1. I
had to squeeze the data/partitions a bit to get everything to fit, but
it did, and that's how I ended up with 4-way raid1, since it /had/ been
a 4-way raid6. All 300-gig drives at the time, so the TB USB had
/plenty/ of room. =:^)

> Without replying to anything in particular both you and Bob have
> mentioned the importance of multiple redundancy.
>
> Obviously risk goes down as redundancy goes up. If you protect 25
> drives of data with 1 drive of parity then you need 2/26 drives to fail
> to hose 25 drives of data.

Ouch!

> If you protect 1 drive of data with 25 drives of parity (call them
> mirrors or parity or whatever - they're functionally equivalent) then
> you need 25/26 drives to fail to lose 1 drive of data.

Almost correct.

Except that with 25/26 failed, you'd still have 1 working, which with
raid1/mirroring would be enough. (AFAIK that's the difference with
parity. Parity is generally done on a minimum of two devices with a
third as parity, and going down to just one isn't enough; you can lose
only one, or two if you have two-way parity as with raid6. With
mirroring/raid1, they're all essentially identical, so one is enough to
keep going; you'd have to lose 26/26 to be dead in the water. But 25/26
dead or 26/26 dead, you'd better HOPE it never comes down to where that
matters!)

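As a quick sanity check of the above (my own sketch, nothing from the
thread), here's the tolerated-failure count for a 26-device array under
each scheme, in python:

    # How many of n=26 drives can fail before data is lost, per scheme?
    def max_failures(scheme, n=26):
        if scheme == "raid1":   # n-way mirror: one survivor suffices
            return n - 1
        if scheme == "raid5":   # single parity: tolerates only 1 loss
            return 1
        if scheme == "raid6":   # double parity: tolerates only 2 losses
            return 2

    for scheme in ("raid1", "raid5", "raid6"):
        print(scheme, max_failures(scheme))  # raid1 25, raid5 1, raid6 2
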
> RAID 1 is actually less effective - if you protect 13
> drives of data with 13 mirrors you need 2/26 drives to fail to lose 1
> drive of data (they just have to be the wrong 2). However, you do need
> to consider that RAID is not the only way to protect data, and I'm not
> sure that multiple-redundancy raid-1 is the most cost-effective
> strategy.

The first time I read that thru I read it wrong, and was about to
disagree. Then I realized what you meant... and that it was an equally
valid read of what you wrote, except...

AFAIK 13 drives of data with 13 mirrors wouldn't (normally) be called
raid1 (unless it's 13 individual raid1s). An arrangement of that nature,
if configured together, would normally be raid10: 2-way-mirrored,
13-way-striped (or possibly raid0+1, but that's not recommended for
technical reasons having to do with rebuild thruput). Tho it could also
be configured as what mdraid calls linear mode across the 13, plus raid1
(linear mode isn't really raid, but it happens to be handled by the same
md/raid driver in Linux), or, if they're configured as separate volumes,
as 13 individual two-disk raid1s. Any of these might be what you meant,
and the wording appears to favor 13 individual raid1s.

What I interpreted it as initially was a 13-way raid1, mirrored again at
a second level to 13 additional drives. That would be called raid11,
except that there's no benefit to it over a simple single-layer 26-way
raid1, so the raid11 term is seldom seen; and that's clearly not what
you meant.

Anyway, you're correct if it's just two-way-mirrored. However, at that
level, if one were to do only two-way mirroring, one would usually do
either raid10 for the 13-way striping, or 13 separate raid1s, which
would give one the opportunity to make some of them 3-way-mirrored (or
more) raid1s for the really vital data, leaving the less vital data as
simple 2-way-mirror raid1s.

Or raid6 and get loss-of-two tolerance, but as this whole subthread is
discussing, that can be problematic for thruput. (I've occasionally seen
reference to raid7, which is said to be 3-way-parity,
loss-of-three-tolerance, but AFAIK there's no support for it in the
kernel, and I wouldn't be surprised if all implementations are
proprietary. AFAIK, in practice, once that many devices get involved,
people implement raid10 with N-way mirroring on the raid1 portion, or
other multi-level raid schemes.)

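To put a number on "they just have to be the wrong 2": with 13 two-way
mirror pairs (26 drives), if exactly two drives die at random, data is
lost only when both deaths land in the same pair. A quick python check
(my own arithmetic, not from Rich's mail):

    from math import comb

    pairs = 13
    drives = 2 * pairs                # 26 drives in 13 two-way mirrors
    p_loss = pairs / comb(drives, 2)  # both failures hit the same pair
    print(p_loss)                     # 13/325 = 0.04, a 4% chance

Compare that to the 25-data+1-parity layout above, where ANY two
failures lose data.
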
> If I had 2 drives of data to protect and had 4 spare drives to do it
> with, I doubt I'd set up a 3x raid-1/5/10 setup (or whatever you want to
> call it - imho raid "levels" are poorly named as there really is just
> striping and mirroring and adding RS parity and everything else is just
> combinations). Instead I'd probably set up a RAID1/5/10/whatever with
> single redundancy for faster storage and recovery, and an offline backup
> (compressed and with incrementals/etc). The backup gets you more
> security and you only need it in a very unlikely double-failure. I'd
> only invest in multiple redundancy in the event that the risk-weighted
> cost of having the node go down exceeds the cost of the extra drives.
> Frankly in that case RAID still isn't the right solution - you need a
> backup node someplace else entirely as hard drives aren't the only thing
> that can break in your server.

So we're talking six drives, two of data and four "spares" to play with.

Often that's set up as raid10, either two-way-striped and 3-way-mirrored
or 3-way-striped and 2-way-mirrored, depending on whether the
loss-of-two tolerance of 3-way mirroring or the thruput of 3-way
striping is considered of higher value.

You're right that at that level, you DO need a real backup, and it
should take priority over raid-whatever. HOWEVER, in addition to
creating a SINGLE raid across all those drives, it's possible to
partition them up and create multiple raids out of the partitions, with
one set being a backup of the other. And since you've already stated
that there's only two drives' worth of data, there's certainly room
enough amongst the six drives total to do just that.

This is in fact how I ran my raids, both my raid6 config and my raid1
config, for a number of years, and is in fact how I have my (raid1-mode)
btrfs filesystems set up now on the SSDs.

Effectively I had/have each drive partitioned up into two sets of
partitions, my "working" set and my "backup" set. Then I md-raided each
partition across all devices at my chosen level. So on each physical
device, partition 5 might be the working rootfs partition, partition 6
the working home partition... partition 9 the backup rootfs partition,
and partition 10 the backup home partition. They might end up being md3
(rootwork), md4 (homework), md7 (rootbak) and md8 (homebak).

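Spelled out as data (purely illustrative: the disk names and
md/partition numbers are hypothetical, just matching the example above):

    # Each md spans the same partition number on every physical disk.
    disks = ["sda", "sdb", "sdc", "sdd"]
    layout = {
        "md3 (rootwork)": [d + "5" for d in disks],
        "md4 (homework)": [d + "6" for d in disks],
        "md7 (rootbak)":  [d + "9" for d in disks],
        "md8 (homebak)":  [d + "10" for d in disks],
    }
    for md, parts in layout.items():
        print(md, "<-", " ".join(parts))
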
That way, you're protected against physical device death by the
redundancy of the raids, and from fat-fingering or an update gone wrong
by the redundancy of the backup partitions across the same physical
devices.

What's nice about an arrangement such as this is that it gives you quite
a bit more flexibility than you'd have with a single raid, since it's
now possible to decide "Hmm, I don't think I actually need a backup of
/var/log, so I think I'll only run with one log partition/raid, instead
of the usual working/backup arrangement."

Similarly, "You know, I ultimately don't need backups of the gentoo tree
and overlays, or of the kernel git tree, at all, since as Linus says,
'Real men upload it to the net and let others be their backup', and I
can always redownload that from the net, so I think I'll raid0 this
partition and not keep any copies at all, since re-downloading's less
trouble than dealing with the backups anyway."

Finally, and possibly critically, it's possible to say, "You know, what
happens if I've just wiped rootbak in order to make a new root backup,
and I have a crash and working-root refuses to boot? I think I need a
rootbak2, and with the space I saved by doing only one log partition and
by making the sources trees raid0, I have room for it now, without using
any more space than I would have used had I had everything on the same
raid."

Another nice thing about it, and this is what I would have ended up
doing if I hadn't conveniently found that 1 TB USB drive at such a good
price, is that while the whole thing is partitioned up and in use, it's
very possible to wipe out the backup partitions temporarily, recreate
them as a different raid level or a different filesystem, or otherwise
reorganize that area, then reboot into the new version and do the same
to what were the working copies. (For the area that was raid0, well, it
was raid0 because it's easy to recreate, so just blow it away and
recreate it on the new layout. And for the single-raid log without a
backup copy, it's easy enough to point the log elsewhere, or keep it on
rootfs long enough to redo that set of partitions across all physical
devices.)

Again, this isn't just theory, it really works, as I've done it to
various degrees at various times, even if I found copying to the
external 1 TB USB drive and booting from it more convenient when I
transferred from raid6 to raid1.

And since I do run ~arch, there have been a number of times I've needed
to boot to rootbak instead of rootworking, including once when a ~arch
portage was hosing symlinks just as a glibc update came along, thus
breaking glibc (!!), once when a bash update broke, and another time
when a glibc update mostly worked but I needed to downgrade and the
protection built into the glibc ebuild wasn't letting me do it from my
working root.

What's nice about this setup, in regard to booting to rootbak instead of
the usual working root, is that unlike booting to a liveCD/DVD rescue
disk, you have the full working system installed, configured and running
just as it was when the backup was made. That makes it much easier to
pick up and run from where you left off, with all the tools you're used
to having and modes of working you're used to using, instead of being
limited to some artificial rescue environment, often with limited tools,
and in any case set up and configured differently than your own system.
Because rootbak IS your own system, just from a few days/weeks/months
ago, whenever it was that you last did the backup.

Anyway, with the parameters you specified, two drives full of data and
four spare drives (presumably of a similar size), there's a LOT of
flexibility:

- raid10 across four drives (two-mirror, two-stripe) with the other two
  as backup. This would probably be my choice given the two-disks-of-
  data, six-disks-total constraints, but see below; and it appears this
  might be your choice as well.

- raid6 across four drives (two mirror, two parity) with two as backups.
  Not a choice I'd likely make, but a choice.

- A working pair of drives plus two sets of backups. Again, not a choice
  I'd likely make.

- raid10 across all six drives, in either 3-mirror/2-stripe or
  3-stripe/2-mirror mode. I'd probably elect for this, with
  3-stripe/2-mirror for the 3X speed and space, and prioritize a
  separate backup; see the discussion below.

- Two independent 3-disk raid5s. IMO there's better options for most
  cases, with the possible exception of primarily slow-media usage; just
  which options are better depends on usage and priorities, tho.

- Or some hybrid combination of these.

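For comparison, here's the back-of-envelope usable working space under
each of those layouts, assuming (hypothetically) six equal drives of
size S, in python:

    # Usable *working* capacity, in units of one drive's size S.
    options = {
        "raid10 2-stripe x 2-mirror on 4, 2 as backup": 4 / 2,  # 2 S
        "raid6 on 4, 2 as backup":                      4 - 2,  # 2 S
        "plain working pair, two backup sets":          2,      # 2 S
        "raid10 3-stripe x 2-mirror on all 6":          6 / 2,  # 3 S
        "raid10 2-stripe x 3-mirror on all 6":          6 / 3,  # 2 S
        "3-disk raid5, second 3-disk raid5 as backup":  3 - 1,  # 2 S
    }
    for name, cap in options.items():
        print(f"{cap:g} x S  {name}")

The 3-stripe/2-mirror raid10 is the only option that yields three drives
of working space (plus the striping speed), which is why I'd lean that
way and prioritize a separate backup.
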
> This sort of rationale is why I don't like arguments like "RAM is cheap"
> or "HDs are cheap" or whatever. The fact is that wasting money on any
> component means investing less in some other component that could give
> you more space/performance/whatever-makes-you-happy. If you have $1000
> that you can afford to blow on extra drives then you have $1000 you
> could blow on RAM, CPU, an extra server, or a trip to Disney. Why not
> blow it on something useful?

[ This gets philosophical. OK to quit here if uninterested. ]

You're right. "RAM and HDs are cheap"... relative to WHAT? The
big-screen TV/monitor I WOULD have been replacing my much smaller
monitor with, if I hadn't been spending the money on the "cheap" RAM
and HDs?

Of course, "time is cheap" comes with the same caveats, and can actually
end up being far more dear. Stress and hassle of administration
similarly. And sometimes just a bit of investment in another "expensive"
HD saves you quite a bit of "cheap" time and stress that's actually more
expensive.

239 "It's all relative"... to one's individual priorities. Because one
240 thing's for sure, both money and time are fungible, and if they aren't
241 spent on one thing, they WILL be on another (even if that "spent" is
242 savings, for money), and ultimately, it's one's individual priorities
243 that should rank where that spending goes. And I can't set your
244 priorities and you can't set mine, so... But from my observation, a LOT
245 of folks don't realize that and/or don't take the time necessary to
246 reevaluate their own priorities from time to time, so end up spending out
247 of line with their real priorities, and end up rather unhappy people as a
248 result! That's one reason why I have a personal policy to deliberately
249 reevaluate personal priorities from time to time (as well as being aware
250 of them constantly), and rearrange spending, money time and otherwise, in
251 accordance with those reranked priorities. I'm absolutely positive I'm a
252 happier man for doing so! =:^)
253
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
