On 03/05/2020 22:46, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@×××××××××××××××××.net> wrote:
>
> curious. how do people look at --layout=n2 in the
> storage industry? e.g. do they ignore the
> optimistic case where some 2-disk failures can be
> recovered, and only assume that it protects against
> 1 disk failure?

You CANNOT afford to be optimistic ... Murphy's law says you will lose
the wrong second disk.
>
> i see why gambling is not worth it here, but at
> the same time, i see no reason to ignore reality
> (that some 2-disk failures can be saved).
>
Don't ignore that some 2-disk failures CAN'T be saved ...

> e.g. a 4-disk RAID10 with --layout=n2 gives
>
>     1*4/10 + 2*4/10 = 1.2
>
> expected recoverable disk failures. details are
> below:
>
> now, if we do a 5-disk --layout=n2, we get:
>
>      1   (1)   2   (2)   3
>     (3)   4   (4)   5   (5)
>      6   (6)   7   (7)   8
>     (8)   9   (9)  10  (10)
>     11  (11)  12  (12)  13
>    (13)  ...
>
> obviously, there are 5 possible ways a single disk
> may fail, all 5 of which will be
> recovered.

Don't forget a 4+spare layout, which *should* survive a 2-disk failure.
>
> there are nchoosek(5,2) = 10 possible ways a 2-disk
> failure could happen, out of which 5
> will be recovered:
>
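
If you want to check that counting without drawing the layout out by
hand, here's a quick Python sketch. It assumes the usual n2 placement --
copy j of block b lands on disk (2*b + j) mod k -- which is exactly what
your diagram shows:

    from itertools import combinations

    def fatal_pairs(k, blocks=1000):
        # RAID10 --layout=n2 on k disks: copy j of block b lands on
        # disk (2*b + j) mod k, so each block's two copies sit on a
        # cyclically adjacent pair of disks.
        pairs = set()
        for b in range(blocks):
            pairs.add(frozenset({(2 * b) % k, (2 * b + 1) % k}))
        return pairs

    for k in (4, 5, 6):
        fatal = fatal_pairs(k)
        all_pairs = list(combinations(range(k), 2))
        ok = sum(1 for p in all_pairs if frozenset(p) not in fatal)
        print(k, "disks:", ok, "of", len(all_pairs),
              "2-disk failures survivable")

    # prints: 4 disks: 4 of 6, 5 disks: 5 of 10, 6 disks: 12 of 15
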
> so, by transforming a 4-disk RAID10 into a 5-disk
> one, we increase total storage capacity by a 0.5
> disk's worth of storage, while losing the ability
> to recover 0.2 disks (1.2 down to 1.0).
>
> but if we extend the 4-disk RAID10 into a
> 6-disk --layout=n2, we will have:
>
>                 6                      nchoosek(6,2) - 3
>     = 1 * -----------------   +  2 * -----------------
>           6 + nchoosek(6,2)          6 + nchoosek(6,2)
>
>     = 6/21 + 2 * 12/21
>
>     = 1.4286 expected recoverable failing disks.
>
> i.e. given a 2-disk failure, there is an 80%
> (12/15) chance of surviving it.
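
Whether pooling the k single-disk scenarios with the C(k,2) two-disk
scenarios as equally likely is meaningful is another question, but the
arithmetic itself is easy to sanity-check with a sketch like this:

    from math import comb

    def expected_recoverable(k, fatal):
        # fatal = number of 2-disk failures that lose data
        scenarios = k + comb(k, 2)
        survivable = comb(k, 2) - fatal
        return (1 * k + 2 * survivable) / scenarios

    print(expected_recoverable(4, 2))  # 1.2
    print(expected_recoverable(5, 5))  # 1.0
    print(expected_recoverable(6, 3))  # 1.4285714...
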
>
> so, i wonder, is it a bad decision to go with an
> even number of disks with a RAID10? what is the
> right way to think to find an answer to this
> question?
>
> i guess the ultimate answer needs knowledge of
> these:
>
> * F1: probability of having 1 disk fail within
>       the repair window.
> * F2: probability of having 2 disks fail within
>       the repair window.
> * F3: probability of having 3 disks fail within
>       the repair window.
>   .
>   .
>   .
> * Fn: probability of having n disks fail within
>       the repair window.
>
> * R1: probability of surviving a 1-disk failure.
>       equals 1 in all relevant cases.
> * R2: probability of surviving a 2-disk failure.
>       equals 0.5 with a 5-disk RAID10.
>       equals 0.8 with a 6-disk RAID10.
> * R3: probability of surviving a 3-disk failure.
>       equals 0 in all relevant cases.
>   .
>   .
>   .
> * Rn: probability of surviving an n-disk failure.
>       equals 0 in all relevant cases.
>
> * L : expected cost of losing data on an array.
> * D : price of a disk.
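
That model transcribes directly into code; something like this (the
numbers for F and L below are made-up placeholders, not measurements):

    def expected_cost(n, D, L, F, R):
        # n*D up front, plus, for each i, the chance F[i] of i disks
        # failing in the repair window times the chance (1 - R[i])
        # of not surviving it, times the expected loss L.
        return n * D + sum(Fi * (1 - Ri) * L for Fi, Ri in zip(F, R))

    # e.g. 6-disk vs 5-disk with hypothetical F1=0.05, F2=0.01, L=$1000:
    print(expected_cost(6, 35.85, 1000, [0.05, 0.01], [1.0, 0.8]))
    print(expected_cost(5, 35.85, 1000, [0.05, 0.01], [1.0, 0.5]))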

Don't forget, if you have a spare disk, the repair window is the length
of time it takes to fail-over ...
>
> this way, the absolute expected cost when adopting
> a 6-disk RAID10 is:
>
>     = 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
>     = 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
>     = 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
>
> and the absolute cost for a 5-disk RAID10 is:
>
>     = 5D + F1*(1-1)*L + F2*(1-0.5)*L + F3*(1-0)*L + ...
>     = 5D + 0 + F2*(0.5)*L + F3*(1-0)*L + ...
>
> canceling identical terms, the difference in cost is:
>
>     6-disk ===> 6D + 0.2*F2*L
>     5-disk ===> 5D + 0.5*F2*L
>
> from here [1] we know that a 1TB disk costs
> $35.85, so:
>
>     6-disk ===> 6*35.85 + 0.2*F2*L
>     5-disk ===> 5*35.85 + 0.5*F2*L
>
> now, at which point is a 5-disk array a better
> economical decision than a 6-disk one? for
> simplicity, let LOL = F2*L:
>
>     5*35.85 + 0.5 * LOL < 6*35.85 + 0.2 * LOL
>     0.5*LOL - 0.2 * LOL < 6*35.85 - 5*35.85
>     LOL * (0.5 - 0.2)   < 6*35.85 - 5*35.85
>
>            6*35.85 - 5*35.85
>     LOL <  -----------------
>                0.5 - 0.2
>
>     LOL  < 119.5
>     F2*L < 119.5
>
> so, a 5-disk RAID10 is better than a 6-disk RAID10
> only if:
>
>     F2*L < 119.5 bucks.
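
The break-even is one line of arithmetic if you strip it down -- the
only inputs are the disk price and the two survival probabilities:

    D = 35.85             # 1TB disk price, from [1]
    r5, r6 = 0.5, 0.8     # P(survive a 2-disk failure), 5 vs 6 disks

    # 5 disks win when 5*D + (1-r5)*F2*L < 6*D + (1-r6)*F2*L
    print(D / ((1 - r5) - (1 - r6)))   # 119.5 -> F2*L < $119.50
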
>
> this site [2] says that 76% of seagate disks fail
> per year (:D). and since disks mostly fail
> independently of each other, the probability of
> having 2 disks fail in a year is:
>
153 |
76% seems incredibly high. And no, disks do not fail independently of |
154 |
each other. If you buy a bunch of identical disks, at the same time, and |
155 |
stick them all in the same raid array, the chances of them all wearing |
156 |
out at the same time are rather higher than random chance would suggest. |
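
Even taking independence at face value, that 76% figure condemns
itself. A quick binomial sketch (the 2% AFR is just my ballpark for a
sane modern drive, not a measured number):

    def p_at_least_2(n, p):
        # P(at least 2 of n disks fail within a year), assuming
        # independent failures with annual failure rate p
        return 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)

    print(p_at_least_2(6, 0.76))  # ~0.996 -- two failures near-certain
    print(p_at_least_2(6, 0.02))  # ~0.006 with a ~2% AFR
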

That correlation is why, if a raid disk fails, the advice is always to
replace it asap. And, if possible, to recover the failed drive and copy
from that rather than hammer the rest of the raid.

Bear in mind that, no matter how many drives a raid-10 has, if
you're recovering onto a new drive, the data is stored on just two of
the other drives. So the chances of them failing as they get hammered
are a lot higher.

That's why it makes a lot of sense to monitor the SMART stats,
so you can replace any drive that looks like failing before it
actually does. And check the warranties. Expensive raid drives probably
have longer warranties, so when they're out of warranty consider
retiring them (they'll probably last a lot longer, but it's a judgement
call).
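
(smartmontools is the usual way to do that -- run smartd to poll the
drives and mail you when attributes start sliding, or check by hand
with something like "smartctl -H -A /dev/sdX".)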

All that said, I've been running a raid-1 mirror for a good few years,
and I've not had any trouble on my Barracudas.

Cheers,
Wol