Rich Freeman posted on Fri, 21 Jun 2013 11:13:51 -0400 as excerpted:

> On Fri, Jun 21, 2013 at 10:27 AM, Duncan <1i5t5.duncan@×××.net> wrote:

>> Question: Would you use [btrfs] for raid1 yet, as I'm doing?
>> What about as a single-device filesystem?

> If I wanted to use raid1 I might consider using btrfs now. I think it
> is still a bit risky, but the established use cases have gotten a fair
> bit of testing now. I'd be more confident in using it with a single
> device.

OK, so we agree on the basic confidence level of various btrfs features.
I trust my own judgement a bit more now. =:^)

> To migrate today would require finding someplace to dump all
> the data offline and migrate the drives, as there is no in-place way to
> migrate multiple ext3/4 logical volumes on top of mdadm to a single
> btrfs on bare metal.

... Unless you still have enough unpartitioned space available.

What I did a few years ago was buy a 1 TB USB drive I found at a good
deal. (It was very near the price of half-TB drives at the time; I
figured out later they must have gotten shipped a pallet of the wrong
ones for a sale on the half-TB version of the same thing, so it was a
single-store, get-it-while-they're-there-to-get deal.)

That's how I was able to migrate from the raid6 I had back to raid1. I
had to squeeze the data/partitions a bit to get everything to fit, but it
did, and that's how I ended up with 4-way raid1, since it /had/ been a 4-
way raid6. All 300-gig drives at the time, so the TB USB drive had
/plenty/ of room. =:^)

> Without replying to anything in particular both you and Bob have
> mentioned the importance of multiple redundancy.
>
> Obviously risk goes down as redundancy goes up. If you protect 25
> drives of data with 1 drive of parity then you need 2/26 drives to fail
> to hose 25 drives of data.

Ouch!

> If you protect 1 drive of data with 25 drives of parity (call them
> mirrors or parity or whatever - they're functionally equivalent) then
> you need 25/26 drives to fail to lose 1 drive of data.

Almost correct.

Except that with 25/26 failed, you'd still have 1 working, which with
raid1/mirroring would be enough. (AFAIK that's the difference with
parity. Parity is generally done on a minimum of two data devices with a
third as parity, and going down to just one isn't enough; you can lose
only one, or two if you have two-way parity as with raid6. With
mirroring/raid1, the copies are all essentially identical, so one is
enough to keep going; you'd have to lose 26/26 to be dead in the water.
But 25/26 dead or 26/26 dead, you'd better HOPE it never comes down to
where that matters!)
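
The mirror-vs-parity arithmetic above can be sketched as a toy model
(the helper names here are mine, for illustration only, not from any
real RAID tool):

```python
# Toy model of how many drive failures each 26-drive layout survives.

def mirror_survives(total_copies: int, failed: int) -> bool:
    """N-way raid1: data survives as long as at least one copy remains."""
    return failed < total_copies

def parity_survives(data_drives: int, parity_drives: int, failed: int) -> bool:
    """raid5/6-style parity: survives while failures <= parity drives.
    Note the data-drive count doesn't change how many failures are
    tolerated - only how much data is lost once tolerance is exceeded."""
    return failed <= parity_drives

# 1 drive of data, 25-way mirrored: survives 25 failures, dies at 26.
print(mirror_survives(26, 25), mirror_survives(26, 26))

# 25 drives of data, 1 parity: the second failure loses all 25 drives.
print(parity_survives(25, 1, 1), parity_survives(25, 1, 2))
```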

> RAID 1 is actually less effective - if you protect 13
> drives of data with 13 mirrors you need 2/26 drives to fail to lose 1
> drive of data (they just have to be the wrong 2). However, you do need
> to consider that RAID is not the only way to protect data, and I'm not
> sure that multiple-redundancy raid-1 is the most cost-effective
> strategy.

The first time I read that thru I read it wrong, and was about to
disagree. Then I realized what you meant... and that it was an equally
valid reading of what you wrote, except...

AFAIK 13 drives of data with 13 mirrors wouldn't (normally) be called
raid1 (unless it's 13 individual raid1s). Normally, an arrangement of
that nature, if configured together, would be configured as raid10, 2-way-
mirrored, 13-way-striped (or possibly raid0+1, but that's not recommended
for technical reasons having to do with rebuild thruput), tho it could
also be configured as what mdraid calls linear mode (which isn't really
raid, but it happens to be handled by the same md/raid driver in Linux)
across the 13, plus raid1, or if they're configured as separate volumes,
13 individual two-disk raid1s, any of which might be what you meant (and
the wording appears to favor 13 individual raid1s).

What I interpreted it as initially was a 13-way raid1, mirrored again at
a second level to 13 additional drives, which would be called raid11,
except that there's no benefit to that over a simple single-layer 26-way
raid1, so the raid11 term is seldom seen, and that's clearly not what you
meant.

Anyway, you're correct if it's just two-way-mirrored. However, at that
level, if one were to do only two-way mirroring, one would usually do
either raid10 for the 13-way striping, or 13 separate raid1s, which would
give one the opportunity to make some of them 3-way-mirrored (or more)
raid1s for the really vital data, leaving the less vital data as simple
2-way-mirror raid1s.

Or raid6 and get loss-of-two tolerance, but as this whole subthread is
discussing, that can be problematic for thruput. (I've occasionally seen
reference to raid7, which is said to be 3-way parity, loss-of-three
tolerance, but AFAIK there's no support for it in the kernel, and I
wouldn't be surprised if all implementations are proprietary. AFAIK, in
practice, once that many devices get involved, raid10 with N-way
mirroring on the raid1 portion is used instead, or other multi-level raid
schemes.)

> If I had 2 drives of data to protect and had 4 spare drives to do it
> with, I doubt I'd set up a 3x raid-1/5/10 setup (or whatever you want to
> call it - imho raid "levels" are poorly named as there really is just
> striping and mirroring and adding RS parity and everything else is just
> combinations). Instead I'd probably set up a RAID1/5/10/whatever with
> single redundancy for faster storage and recovery, and an offline backup
> (compressed and with incrementals/etc). The backup gets you more
> security and you only need it in a very unlikely double-failure. I'd
> only invest in multiple redundancy in the event that the risk-weighted
> cost of having the node go down exceeds the cost of the extra drives.
> Frankly in that case RAID still isn't the right solution - you need a
> backup node someplace else entirely as hard drives aren't the only thing
> that can break in your server.

So we're talking six drives, two of data and four "spares" to play with.

Often that's set up as raid10, either two-way-striped and 3-way-mirrored,
or 3-way-striped and 2-way-mirrored, depending on whether the loss-of-two
tolerance of 3-way mirroring or the thruput of 3-way striping is
considered of higher value.
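
The tradeoff between those two 6-drive raid10 layouts can be sketched
numerically (function and variable names are mine, purely illustrative;
the 300-gig drive size is just the example size from earlier in this
post):

```python
# Toy comparison of the two 6-drive raid10 layouts discussed above.

def raid10_summary(drives: int, mirrors: int, drive_size_gb: int = 300):
    """Return (usable_gb, guaranteed_tolerated_failures) for a raid10
    built as (drives // mirrors) stripes of mirrors-way mirror sets."""
    stripes = drives // mirrors
    usable = stripes * drive_size_gb
    # Worst case, all failures land in the same mirror set, so the
    # array is only guaranteed to survive (mirrors - 1) failures.
    tolerated = mirrors - 1
    return usable, tolerated

# 2-way-striped, 3-way-mirrored: less space, survives any 2 failures.
print(raid10_summary(6, 3))   # (600, 2)

# 3-way-striped, 2-way-mirrored: more space, guaranteed only 1 failure.
print(raid10_summary(6, 2))   # (900, 1)
```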

You're right that at that level, you DO need a real backup, and it should
take priority over raid-whatever. HOWEVER, in addition to creating a
SINGLE raid across all those drives, it's possible to partition them up
and create multiple raids out of the partitions, with one set being a
backup of the other. And since you've already stated that there's only
two drives' worth of data, there's certainly room enough amongst the six
drives total to do just that.

This is in fact how I ran my raids, both my raid6 config and my raid1
config, for a number of years, and is in fact how I have my (raid1-mode)
btrfs filesystems set up now on the SSDs.

Effectively I had/have each drive partitioned up into two sets of
partitions, my "working" set and my "backup" set. Then I md-raided each
partition, at my chosen level, across all devices. So on each physical
device, partition 5 might be the working rootfs partition, partition 6
the working home partition... partition 9 the backup rootfs partition,
and partition 10 the backup home partition. They might end up being md3
(rootwork), md4 (homework), md7 (rootbak) and md8 (homebak).
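
The layout above boils down to a handful of mdadm invocations, one per
partition number. Here's a sketch that generates them; the drive names
(sda-sdd), partition numbers, and md numbers follow the example in the
text and are hypothetical, so adapt them to your own hardware:

```python
# Generate the mdadm --create commands implied by the working/backup
# scheme above: the same partition number on each of four drives
# becomes one 4-way raid1 array.

drives = ["sda", "sdb", "sdc", "sdd"]
layout = {
    "md3": 5,   # rootwork: working rootfs
    "md4": 6,   # homework: working home
    "md7": 9,   # rootbak:  backup rootfs
    "md8": 10,  # homebak:  backup home
}

def mdadm_create(md: str, part: int) -> str:
    members = " ".join(f"/dev/{d}{part}" for d in drives)
    return (f"mdadm --create /dev/{md} --level=1 "
            f"--raid-devices={len(drives)} {members}")

for md, part in layout.items():
    print(mdadm_create(md, part))
```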

That way, you're protected against physical device death by the
redundancy of the raids, and from fat-fingering or an update gone wrong
by the redundancy of the backup partitions across the same physical
devices.

What's nice about an arrangement such as this is that it gives you quite
a bit more flexibility than you'd have with a single raid, since it's now
possible to decide "Hmm, I don't think I actually need a backup of /var/
log, so I think I'll only run with one log partition/raid, instead of the
usual working/backup arrangement." Similarly, "You know, I ultimately
don't need backups of the gentoo tree and overlays, or of the kernel git
tree, at all, since as Linus says, 'Real men upload it to the net and let
others be their backup', and I can always redownload that from the net,
so I think I'll raid0 this partition and not keep any copies at all,
since re-downloading's less trouble than dealing with the backups
anyway." Finally, and possibly critically, it's possible to say, "You
know, what happens if I've just wiped rootbak in order to make a new
root backup, and I have a crash and working-root refuses to boot? I
think I need a rootbak2, and with the space I saved by doing only one log
partition and by making the sources trees raid0, I have room for it now,
without using any more space than I would had I had everything on the
same raid."

Another nice thing about it, and this is what I would have ended up doing
if I hadn't conveniently found that 1 TB USB drive at such a good price,
is that while the whole thing is partitioned up and in use, it's very
possible to wipe out the backup partitions temporarily, recreate them
as a different raid level or a different filesystem, or otherwise
reorganize that area, then reboot into the new version, and do the same
to what were the working copies. (For the area that was raid0, well, it
was raid0 because it's easy to recreate, so just blow it away and
recreate it on the new layout. And for the single-raid log without a
backup copy, it's simple enough to point the log elsewhere or keep
it on rootfs long enough to redo that set of partitions across all
physical devices.)

Again, this isn't just theory; it really works, as I've done it to
various degrees at various times, even if I found copying to the external
1 TB USB drive and booting from it more convenient when I transferred
from raid6 to raid1.

And since I do run ~arch, there have been a number of times I've needed
to boot to rootbak instead of rootworking, including once when a ~arch
portage was hosing symlinks just as a glibc update came along, thus
breaking glibc (!!), once when a bash update broke, and another time when
a glibc update mostly worked but I needed to downgrade and the protection
built into the glibc ebuild wasn't letting me do it from my working root.

What's nice about this setup, in regard to booting to rootbak instead of
the usual working root, is that unlike booting a liveCD/DVD rescue disk,
you have the full working system installed, configured and running just
as it was when the backup was made. That makes it much easier to pick up
and run from where you left off, with all the tools you're used to
having and the modes of working you're used to, instead of being limited
to some artificial rescue environment, often with limited tools, and in
any case set up and configured differently than your own system, because
rootbak IS your own system, just from a few days/weeks/months ago,
whenever it was that you last did the backup.

Anyway, with the parameters you specified, two drives full of data and
four spare drives (presumably of a similar size), there's a LOT of
flexibility. There's raid10 across four drives (two-mirror, two-stripe)
with the other two as backup (this would probably be my choice given the
2-disks-of-data, 6-disks-total constraints, but see below, and it
appears this might be your choice as well), or raid6 across four drives
(two data, two parity) with two as backups (not a choice I'd likely
make, but a choice), or a working pair of drives plus two sets of
backups (not a choice I'd likely make), or raid10 across all six drives
in either 3-mirror/2-stripe or 3-stripe/2-mirror mode (I'd probably
elect for this with 3-stripe/2-mirror for the 3X speed and space, and
prioritize a separate backup, see the discussion below), or two
independent 3-disk raid5s (IMO there are better options for most cases,
with the possible exception of primarily slow-media usage, tho just
which options are better depends on usage and priorities), or some
hybrid combination of these.

> This sort of rationale is why I don't like arguments like "RAM is cheap"
> or "HDs are cheap" or whatever. The fact is that wasting money on any
> component means investing less in some other component that could give
> you more space/performance/whatever-makes-you-happy. If you have $1000
> that you can afford to blow on extra drives then you have $1000 you
> could blow on RAM, CPU, an extra server, or a trip to Disney. Why not
> blow it on something useful?

[ This gets philosophical. OK to quit here if uninterested. ]

You're right. "RAM and HDs are cheap"... relative to WHAT, the big-
screen TV/monitor I WOULD have been replacing my much smaller monitor
with, if I hadn't been spending the money on the "cheap" RAM and HDs?

Of course, "time is cheap" comes with the same caveats, and can actually
end up being far more dear. Stress and hassle of administration,
similarly. And sometimes, just a bit of investment in another
"expensive" HD saves you quite a bit of "cheap" time and stress that's
actually more expensive.

"It's all relative"... to one's individual priorities. Because one
thing's for sure: both money and time are fungible, and if they aren't
spent on one thing, they WILL be on another (even if that "spent" is
savings, for money), and ultimately, it's one's individual priorities
that should rank where that spending goes. And I can't set your
priorities and you can't set mine, so... But from my observation, a LOT
of folks don't realize that and/or don't take the time necessary to
reevaluate their own priorities from time to time, so they end up
spending out of line with their real priorities, and end up rather
unhappy people as a result! That's one reason why I have a personal
policy to deliberately reevaluate personal priorities from time to time
(as well as being aware of them constantly), and rearrange spending,
money, time and otherwise, in accordance with those reranked priorities.
I'm absolutely positive I'm a happier man for doing so! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman