Gentoo Archives: gentoo-user

From: Caveman Al Toraboran <toraboracaveman@××××××××××.com>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] which linux RAID setup to choose?
Date: Sun, 03 May 2020 21:46:23
Message-Id: r_Q9jvM58EU2pwZlP_Y-68RWGty_14cd-2tWbp0SkzuYCp_NjKveJ5N2u_C7-MDj_ECdnP7ITK-fEikxX-u-j9qZkc8K6zMSUerYoduMq5c=@protonmail.com
In Reply to: Re: [gentoo-user] which linux RAID setup to choose? by Jack
On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@×××××××××××××××××.net> wrote:

> Minor point - you have one duplicate line there ". f  f ." which is the
> second and last line of the second group.  No effect on anything else in
> the discussion.

thanks.

> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue
thinking. thanks.

curious. how do people look at --layout=n2 in the
storage industry? e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects for 1
disk failure?

i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).

e.g. a 4-disk RAID10 with --layout=n2 gives

    1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures. details are
below:

    F . . .   < recoverable
    . F . .   < cases with
    . . F .   < 1 disk
    . . . F   < failure

    F . . F   < recoverable
    . F F .   < cases with
    . F . F   < 2 disk
    F . F .   < failures

    F F . .   < not recoverable
    . . F F   < cases with 2 disk
              < failures

now, if we do a 5-disk --layout=n2, we get:

    1    (1)  2    (2)  3
    (3)  4    (4)  5    (5)
    6    (6)  7    (7)  8
    (8)  9    (9)  10   (10)
    11   (11) 12   (12) 13
    (13) ...

obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.

there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:

    xxx  (1)  xxx  (2)  3
    xxx  4    xxx  5    (5)

    xxx  (1)  2    xxx  3
    xxx  4    (4)  xxx  (5)

    1    xxx  2    xxx  3
    (3)  xxx  (4)  xxx  (5)

    1    xxx  2    (2)  xxx
    (3)  xxx  (4)  5    xxx

    1    (1)  xxx  (2)  xxx
    (3)  4    xxx  5    xxx

so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:

    1*5/15 + 2*5/15 = 1

so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by a 0.5
disk's worth of storage, while losing the ability
to recover 0.2 disks.

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:

              6                 nchoosek(6,2) - 3
    1 * ----------------- + 2 * -----------------
        6 + nchoosek(6,2)       6 + nchoosek(6,2)

    = 6/21 + 2 * 12/21

    = 1.4286 expected recoverable failing disks.

and of the nchoosek(6,2) = 15 possible 2 disk
failures, 12 are recoverable, i.e. there is 80%
chance of surviving a 2 disk failure.

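the counting above can also be double-checked by
brute force. below is a minimal python sketch, not
mdadm code: it assumes the n2 layout behaves as in
the diagrams above (the 2 copies of each chunk sit
on neighbouring disks, wrapping around the array),
and the names mirror_pairs, survives and summary
are made up for this illustration.

```python
from itertools import combinations


def mirror_pairs(n_disks, copies=2):
    """Sets of disks that together hold every copy of some chunk.

    Assumption: copies are written to consecutive slots, round-robin over
    the disks, as drawn in the diagrams of this mail.
    """
    pairs = set()
    # The placement pattern repeats after at most n_disks chunks.
    for chunk in range(n_disks):
        slots = [(chunk * copies + c) % n_disks for c in range(copies)]
        pairs.add(frozenset(slots))
    return pairs


def survives(failed, pairs):
    """Data survives if no chunk has all of its copies on failed disks."""
    failed = set(failed)
    return not any(pair <= failed for pair in pairs)


def summary(n_disks):
    """Count recoverable 1 and 2 disk failures and the weighted sum used above."""
    pairs = mirror_pairs(n_disks)
    disks = range(n_disks)
    singles = list(combinations(disks, 1))
    doubles = list(combinations(disks, 2))
    ok1 = sum(survives(f, pairs) for f in singles)
    ok2 = sum(survives(f, pairs) for f in doubles)
    total = len(singles) + len(doubles)
    expected = (1 * ok1 + 2 * ok2) / total
    return ok1, ok2, total, expected


for n in (4, 5, 6):
    ok1, ok2, total, expected = summary(n)
    print(f"{n} disks: {ok1} single and {ok2} double failures recoverable "
          f"out of {total} cases, expected {expected:.4f}")
```

for 4, 5 and 6 disks this counts 4 of 6, 5 of 10
and 12 of 15 recoverable 2 disk failures, matching
the cases listed above.
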
so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10? what is the
right way to think to find an answer to this
question?

i guess the ultimate answer needs knowledge of
these:

* F1: probability of having 1 disk fail within
      the repair window.
* F2: probability of having 2 disks fail within
      the repair window.
* F3: probability of having 3 disks fail within
      the repair window.
  .
  .
  .
* Fn: probability of having n disks fail within
      the repair window.

* R1: probability of surviving a 1 disk failure.
      equals 1 with all related cases.
* R2: probability of surviving a 2 disk failure.
      equals 0.5 (5 out of 10) with a 5-disk RAID10.
      equals 0.8 (12 out of 15) with a 6-disk RAID10.
* R3: probability of surviving a 3 disk failure.
      equals 0 with all related cases.
  .
  .
  .
* Rn: probability of surviving an n disk failure.
      equals 0 with all related cases.

* L : expected cost of losing data on an array.
* D : price of a disk.

this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

    = 6D + F1*(1-R1)*L + F2*(1-R2)*L  + F3*(1-R3)*L + ...
    = 6D + F1*(1-1)*L  + F2*(1-0.8)*L + F3*(1-0)*L  + ...
    = 6D + 0           + F2*(0.2)*L   + F3*(1-0)*L  + ...

and the absolute cost for a 5-disk RAID10 is:

    = 5D + F1*(1-1)*L + F2*(1-0.5)*L + F3*(1-0)*L + ...
    = 5D + 0          + F2*(0.5)*L   + F3*(1-0)*L + ...

canceling identical terms, the difference cost is:

    6-disk ===> 6D + 0.2*F2*L
    5-disk ===> 5D + 0.5*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

    6-disk ===> 6*35.85 + 0.2*F2*L
    5-disk ===> 5*35.85 + 0.5*F2*L

now, at which point is a 5-disk array a better
economical decision than a 6-disk one? for
simplicity, let LOL = F2*L:

    5*35.85 + 0.5 * LOL  <  6*35.85 + 0.2 * LOL
    0.5*LOL - 0.2 * LOL  <  6*35.85 - 5*35.85
    LOL * (0.5 - 0.2)    <  6*35.85 - 5*35.85

           6*35.85 - 5*35.85
    LOL <  -----------------
               0.5 - 0.2

    LOL  < 119.5
    F2*L < 119.5

so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:

    F2*L < 119.5 bucks.

this site [2] says that 76% of seagate disks fail
per year (:D). and since disks fail independently
of each other mostly, then, the probability of
having 2 disks fail in a year is:

    F2_year = 0.76*0.76
            = 0.5776

but what is F2_week? each year has 52.1429 weeks.
let's be generous and assume that disks fail at a
uniform distribution across the year (e.g. suppose
that we bought them randomly, and not in a
single batch).

in this case, the probability of 2 disks failing
in the same week (suppose that our repair window
is 1 week):

                           52
    F2 = 0.5776 * --------------------
                  52 + nchoosek(52, 2)

       = 0.5776 * 0.037736
       = 0.021796

let's substitute a bit:

    F2 * L       < 119.5 bucks.
    0.021796 * L < 119.5 bucks.
    L            < 119.5 / 0.021796 bucks.
    L            < 5482.7 bucks.

so, in summary:

/------------------------------------------------\
| a 5-disk RAID10 is better than a 6-disk RAID10 |
| ONLY IF your data is WORTH LESS than 5,482.7   |
| bucks.                                         |
\------------------------------------------------/

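the break-even above can be recomputed with a
short python sketch. it is only the same cost
model written as code, under the same assumptions:
the R2 values are taken from the counts earlier in
this mail (5 of 10 for 5 disks, 12 of 15 for 6
disks), the 0.76 and $35.85 figures are the ones
quoted from [2] and [1], and the function names
are made up for illustration.

```python
from math import comb


def f2_same_window(p_fail_per_year, windows_per_year=52):
    """Chance that 2 disks fail in the same repair window.

    Uses the counting model from this mail: two failures in a year land
    either in the same window (windows_per_year ways) or in two different
    windows (comb(windows_per_year, 2) ways).
    """
    p_both_in_year = p_fail_per_year ** 2  # failures assumed independent
    same = windows_per_year
    different = comb(windows_per_year, 2)
    return p_both_in_year * same / (same + different)


def breakeven_loss(disk_price, r2_small, r2_large, f2):
    """Data-loss cost L below which skipping the extra disk is cheaper.

    small array: n*D     + F2*(1 - r2_small)*L
    large array: (n+1)*D + F2*(1 - r2_large)*L
    All other terms are assumed identical and cancel.
    """
    extra_risk = r2_large - r2_small
    return disk_price / (f2 * extra_risk)


f2 = f2_same_window(0.76)
limit = breakeven_loss(35.85, r2_small=5 / 10, r2_large=12 / 15, f2=f2)
print(f"F2 = {f2:.6f}; 5 disks are the cheaper choice only if L < {limit:.1f}")
```

changing the disk price, the failure rate or the
repair window only means changing the arguments,
so it is easy to see how sensitive the threshold
is to each guess.
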
any thoughts? i'm a newbie. i wonder how
industry people think?

happy quarantine,
cm

------------
[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/
