On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@×××××××××××××××××.net> wrote:

> Minor point - you have one duplicate line there ". f f ." which is the
> second and last line of the second group. No effect on anything else in
> the discussion.

thanks.
|
> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue thinking. thanks.
|
curious. how do people look at --layout=n2 in the storage industry? e.g.
do they ignore the optimistic case where 2 disk failures can sometimes be
recovered, and only assume that it protects against 1 disk failure?
|
i see why gambling is not worth it here, but at the same time, i see no
reason to ignore reality (that some 2 disk failures can be survived).
|
e.g. a 4-disk RAID10 with --layout=n2 gives

1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures. details are below:
|
F . . .   < recoverable
. F . .   < cases with
. . F .   < 1 disk
. . . F   < failure

F . . F   < recoverable
. F F .   < cases with
. F . F   < 2 disk
F . F .   < failures

F F . .   < not recoverable
. . F F   < cases with 2 disk
          < failures
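
fwiw, these counts are easy to brute-force. a small python sketch
(assuming 0-indexed disks and mirror pairs (0,1) and (2,3)):

```python
from itertools import combinations

# 4-disk RAID10 --layout=n2: mirror pairs are disks (0,1) and (2,3);
# a failure set is survivable iff no mirror pair loses both members
pairs = [{0, 1}, {2, 3}]

def survivable(failed):
    return all(not p <= set(failed) for p in pairs)

one = [f for f in combinations(range(4), 1) if survivable(f)]
two = [f for f in combinations(range(4), 2) if survivable(f)]

# weight each of the 4 + 6 = 10 outcomes equally, as above
expected = (1 * len(one) + 2 * len(two)) / 10
print(len(one), len(two), expected)   # 4 4 1.2
```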
|
now, if we do a 5-disk --layout=n2, we get:

  1  (1)  2  (2)  3
 (3)  4  (4)  5  (5)
  6  (6)  7  (7)  8
 (8)  9  (9) 10 (10)
 11 (11) 12 (12) 13
(13) ...
|
obviously, there are 5 possible ways a single disk may fail, and all 5 of
them are recoverable.

there are nchoosek(5,2) = 10 possible ways a 2 disk failure could happen,
out of which 5 will be recovered:
|
xxx (1) xxx (2)  3
xxx  4  xxx  5  (5)

xxx (1)  2  xxx  3
xxx  4  (4) xxx (5)

 1  xxx  2  xxx  3
(3) xxx (4) xxx (5)

 1  xxx  2  (2) xxx
(3) xxx (4)  5  xxx

 1  (1) xxx (2) xxx
(3)  4  xxx  5  xxx
|
so, expected recoverable disk failures for a 5-disk RAID10 --layout=n2 is:

1*5/15 + 2*5/15 = 1
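
the same count can be brute-forced from the chunk map above. a python
sketch (the copy placement below is my reading of the 5-disk diagram:
the chunk stream is written row-major across 5 disks, so 0-indexed
chunk c lives on disks 2*c % 5 and (2*c + 1) % 5):

```python
from itertools import combinations

# only 5 distinct disk-pairs ever occur in the 5-disk n2 layout,
# so checking chunks 0..4 covers all of them
copies = [{(2 * c) % 5, (2 * c + 1) % 5} for c in range(5)]

def survivable(failed):
    # data survives iff every chunk keeps at least one live copy
    return all(not c <= set(failed) for c in copies)

two = [f for f in combinations(range(5), 2) if survivable(f)]
total = 5 + 10                        # 5 single + nchoosek(5,2) double
expected = (1 * 5 + 2 * len(two)) / total
print(len(two), expected)             # 5 1.0
```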
|
so, by transforming a 4-disk RAID10 into a 5-disk one, we increase total
storage capacity by 0.5 of a disk's worth of storage, while losing 0.2
expected recoverable disk failures.
|
but if we extended the 4-disk RAID10 into a 6-disk --layout=n2, we will
have:

              6               nchoosek(6,2) - 3
= 1 * ----------------- + 2 * -----------------
      6 + nchoosek(6,2)       6 + nchoosek(6,2)

= 6/21 + 2 * 12/21

= 1.4286 expected recoverable failing disks.
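
and the same brute-force check for 6 disks (assuming the standard n2
mirror pairs (0,1), (2,3), (4,5)):

```python
from itertools import combinations
from math import comb

# 3 of the 15 double failures kill a whole mirror pair; the other 12
# are survivable
pairs = [{0, 1}, {2, 3}, {4, 5}]
two = [f for f in combinations(range(6), 2)
       if all(not p <= set(f) for p in pairs)]
total = 6 + comb(6, 2)                # 21 equally weighted outcomes
expected = (1 * 6 + 2 * len(two)) / total
print(len(two), round(expected, 4))   # 12 1.4286
```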
|
i.e. there is a 12/15 = 80% chance of surviving a 2 disk failure.
|
so, i wonder: is it a bad decision to go with an odd number of disks in a
RAID10? what is the right way to think in order to answer this question?
|
i guess the ultimate answer needs knowledge of these:
|
* F1: probability of having 1 disk fail within
  the repair window.
* F2: probability of having 2 disks fail within
  the repair window.
* F3: probability of having 3 disks fail within
  the repair window.
  .
  .
  .
* Fn: probability of having n disks fail within
  the repair window.
|
* R1: probability of surviving a 1 disk failure.
  equals 1 in all cases considered here.
* R2: probability of surviving a 2 disk failure.
  equals 5/10 = 0.5 with a 5-disk RAID10.
  equals 12/15 = 0.8 with a 6-disk RAID10.
* R3: probability of surviving a 3 disk failure.
  equals 0 with a 5-disk RAID10. a 6-disk one
  actually survives the 8 out of 20 cases where
  no mirror pair loses both disks, but i'll
  approximate R3 as 0 to keep the comparison
  simple.
  .
  .
  .
* Rn: probability of surviving an n disk failure.
  equals 0 in all cases considered here.
|
* L : expected cost of losing data on an array.
* D : price of a disk.
|
this way, the absolute expected cost when adopting a 6-disk RAID10 is:
|
= 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
= 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
|
and the absolute cost for a 5-disk RAID10 is:
|
= 5D + F1*(1-1)*L + F2*(1-0.5)*L + F3*(1-0)*L + ...
= 5D + 0 + F2*(0.5)*L + F3*(1-0)*L + ...
|
canceling identical terms, the difference in cost is:
|
6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.5*F2*L
|
from here [1] we know that a 1TB disk costs $35.85, so:

6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.5*F2*L
|
now, at which point is a 5-disk array a better economical decision than a
6-disk one? for simplicity, let LOL = F2*L:

5*35.85 + 0.5*LOL < 6*35.85 + 0.2*LOL
0.5*LOL - 0.2*LOL < 6*35.85 - 5*35.85
LOL * (0.5 - 0.2) < 6*35.85 - 5*35.85
|
      6*35.85 - 5*35.85
LOL < -----------------
          0.5 - 0.2

LOL < 119.5
F2*L < 119.5
|
so, a 5-disk RAID10 is better than a 6-disk RAID10 only if:

F2*L < 119.5 bucks.
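
as a sanity check on that break-even point, a python sketch that
recomputes R2 for both arrays by brute force (R2 here is the conditional
probability of surviving given that exactly 2 disks failed: 5/10 for the
5-disk layout and 12/15 for the 6-disk one):

```python
from itertools import combinations
from math import comb

def r2(chunks, n):
    # fraction of 2-disk failure sets that leave every chunk a live copy
    ok = sum(all(not c <= set(f) for c in chunks)
             for f in combinations(range(n), 2))
    return ok / comb(n, 2)

# 6-disk n2: fixed mirror pairs; 5-disk n2: the shifted layout above
r2_6 = r2([{0, 1}, {2, 3}, {4, 5}], 6)
r2_5 = r2([{(2 * c) % 5, (2 * c + 1) % 5} for c in range(5)], 5)

D = 35.85
lol_max = (6 * D - 5 * D) / ((1 - r2_5) - (1 - r2_6))
print(r2_6, r2_5, round(lol_max, 2))   # 0.8 0.5 119.5
```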
|
this site [2] says that 76% of seagate disks fail per year (:D). and
since disks mostly fail independently of each other, the probability of
having 2 given disks fail in a year is:
|
F2_year = 0.76*0.76
        = 0.5776
|
but what is F2_week? each year has 52.1429 weeks. let's be generous and
assume that disk failures are uniformly distributed across the year (e.g.
suppose that we bought the disks randomly, and not in a single batch).
|
in this case, the probability of the 2 disks failing in the same week
(suppose that our repair window is 1 week) is:
|
                  52
F2 = 0.5776 * --------------------
              52 + nchoosek(52, 2)

   = 0.5776 * 0.037736
   = 0.021796
|
let's substitute a bit:

    F2 * L < 119.5 bucks.
0.021796*L < 119.5 bucks.
         L < 119.5 / 0.021796 bucks.
         L < 5482.6 bucks.
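
the same substitution in python (the 119.5 threshold is the break-even
bound derived above, with D = $35.85 and the tongue-in-cheek 76% yearly
failure rate):

```python
from math import comb

# weekly F2: chance of both failures landing in the same week, using
# the same equal-weighting of week-pairs as the disk counting above
f2_year = 0.76 * 0.76
f2_week = f2_year * 52 / (52 + comb(52, 2))

l_max = 119.5 / f2_week
print(round(f2_week, 6), round(l_max, 1))   # 0.021796 5482.6
```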
|
so, in summary:
|
/------------------------------------------------\
| a 5-disk RAID10 is better than a 6-disk RAID10 |
| ONLY IF your data is WORTH LESS than 5,482.6   |
| bucks.                                         |
\------------------------------------------------/
|
any thoughts? i'm a newbie, and i wonder how industry people think about
this.
|
happy quarantine,
cm
|
------------
[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/