On Tue, Jan 5, 2016 at 5:16 PM, lee <lee@××××××××.de> wrote:
> Rich Freeman <rich0@g.o> writes:
>
>>
>> I would run btrfs on bare partitions and use btrfs's raid1
>> capabilities. You're almost certainly going to get better
>> performance, and you get more data integrity features.
>
> That would require me to set up software raid with mdadm as well, for
> the swap partition.

Correct, if you don't want a panic if a single swap drive fails. |
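
For two disks partitioned identically, the whole setup would look
something like this (a sketch; the device names are placeholders):

  # small md raid1 across one partition pair, just for swap
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkswap /dev/md0 && swapon /dev/md0

  # btrfs raid1 (data and metadata) across the remaining partitions
  mkfs.btrfs -d raid1 -m raid1 /dev/sda2 /dev/sdb2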

>
>> If you have a silent corruption with mdadm doing the raid1 then btrfs
>> will happily warn you of your problem and you're going to have a
>> really hard time fixing it,
>
> BTW, what do you do when you have silent corruption on a swap partition?
> Is that possible, or does swapping use its own checksums?

If the kernel pages in data from the good mirror, nothing happens. If
the kernel pages in data from the bad mirror, then whatever data
happens to be there is what gets loaded and used and/or executed. If
you're lucky the corrupted data lands in unused heap or something. If
not, well, just about anything could happen.

Nothing in this scenario will check that the data is correct, short of
a forced scrub of the disks. A scrub would probably detect the error,
but I don't think mdadm has any way to tell which mirror holds the
good copy. Your best bet is probably to reboot immediately and save
what you can. A less risky option, assuming you don't have anything
critical in RAM, is an immediate hard reset, so that there is no risk
of bad data being swapped in and overwriting good data on
your normal filesystems. |
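
For the record, forcing an md scrub by hand is just this (a sketch;
md0 is a placeholder for your actual array):

  # kick off a check pass over the whole array
  echo check > /sys/block/md0/md/sync_action

  # once it finishes, a nonzero value here means the mirrors disagree
  cat /sys/block/md0/md/mismatch_cnt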

> It's still odd. I already have two different file systems and the
> overhead of one kind of software raid while I would rather stick to one
> file system. With btrfs, I'd still have two different file systems ---
> plus mdadm and the overhead of three different kinds of software raid.

I'm not sure why you'd need two different filesystems. Just btrfs for
your data. I'm not sure where you're counting three kinds of software
raid either - you'd have just the one md array for swap. And I don't
think any of this involves significant overhead, other than
configuration.

>
> How would it be so much better to triple the software raids and to still
> have the same number of file systems?

Well, the difference would be more data integrity insofar as hardware
failure goes, but certainly more risk of logical errors (IMO).

>
>>> When you use hardware raid, it
>>> can be disadvantageous compared to btrfs-raid --- and when you use it
>>> anyway, things are suddenly much more straightforward because everything
>>> is on raid to begin with.
>>
>> I'd stick with mdadm. You're never going to run mixed
>> btrfs/hardware-raid on a single drive,
>
> A single disk doesn't make for a raid.

You misunderstood my statement. If you have two drives, you can't run
both hardware raid and btrfs raid across them: hardware raid setups
generally don't support running across only part of a drive, and in
this setup you'd need hardware raid on part of each of the two drives.

>
>> and the only time I'd consider
>> hardware raid is with a high quality raid card. You'd still have to
>> convince me not to use mdadm even if I had one of those lying around.
>
> From my own experience, I can tell you that mdadm already does have
> significant overhead when you use a raid1 of two disks and a raid5 with
> three disks. This overhead may be somewhat due to the SATA controller
> not being as capable as one would expect --- yet that doesn't matter
> because one thing you're looking at, besides reliability, is the overall
> performance. And the overall performance very noticeably increased when
> I migrated from mdadm raids to hardware raids, with the same disks and
> the same hardware, except that the raid card was added.

Well, sure, the raid card probably had battery-backed cache if it was
decent, so linux could complete its commits to RAM and not have to
wait for the disks.

>
> And that was only 5 disks. I also know that the performance with a ZFS
> mirror with two disks was disappointingly poor. Those disks aren't
> exactly fast, but still. I haven't tested yet if it changed after
> adding 4 mirrored disks to the pool. And I know that the performance of
> another hardware raid5 with 6 disks was very good.

You're probably going to find the performance of a COW filesystem to
be inferior to that of an overwrite-in-place filesystem, simply
because the latter has to do less work.

>
> Thus I'm not convinced that software raid is the way to go. I wish they
> would make hardware ZFS (or btrfs, if it ever becomes reliable)
> controllers.

I doubt it would perform any better. What would that controller do
that your CPU wouldn't do? Well, other than have battery-backed
cache, which would help in any circumstance. If you stuck 5 raid
cards in your PC, put one drive on each card, and ran mdadm or ZFS
across all five, it would almost certainly perform better, because
you're adding battery-backed cache.

>
> The relevant advantage of btrfs is being able to make snapshots. Is
> that worth all the (potential) trouble? Snapshots are worthless when
> the file system destroys them with the rest of the data.

And that is why I wouldn't use btrfs on a production system unless the
use case mitigated this risk and there was benefit from the snapshots.
Of course you're taking on more risk using an experimental filesystem.

>>
>> btrfs does not support swap files at present.
>
> What happens when you try it?

No idea. Should be easy to test in a VM. I suspect either an error
or a kernel bug/panic/etc. |
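
Something like the following in a throwaway VM would settle it (a
sketch; the path is arbitrary):

  truncate -s 1G /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile   # on btrfs I'd expect this step to error out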

>
>> When it does you'll need to disable COW for them (using chattr)
>> otherwise they'll be fragmented until your system grinds to a halt. A
>> swap file is about the worst case scenario for any COW filesystem -
>> I'm not sure how ZFS handles them.
>
> Well, then they need to make special provisions for swap files in btrfs
> so that we can finally get rid of the swap partitions.

I'm sure they'll happily accept patches. :) |
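
In the meantime, disabling COW with chattr already works for ordinary
files, and would presumably look the same for a swap file (a sketch;
/swapfile is a placeholder):

  touch /swapfile
  chattr +C /swapfile   # the C flag must be set while the file is empty
  lsattr /swapfile      # verify that the C attribute is now set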

>
>> If I had done that in the past I think I would have completely avoided
>> that issue that required me to restore from backups. That happened in
>> the 3.15/3.16 timeframe and I'd have never even run those kernels.
>> They were stable kernels at the time, and a few versions in when I
>> switched to them (I was probably just following gentoo-sources stable
>> keywords back then), but they still had regressions (fixes were
>> eventually backported).
>
> How do you know if an old kernel you pick because you think the btrfs
> part works well enough is the right pick? You can either encounter a
> bug that has been fixed or a regression that hasn't been
> discovered/fixed yet. That way, you can't win.

You read the lists closely. If you want to be bleeding-edge it will
take more work than if you just go with the flow. That's why I'm not
on 4.1 yet - I read the lists and am not quite sure it's ready yet.

>
>> I think btrfs is certainly usable today, though I'd be hesitant to run
>> it on production servers depending on the use case (I'd be looking for
>> a use case that actually has a significant benefit from using btrfs,
>> and which somehow mitigates the risks).
>
> There you go, it's usable, and the risk of using it is too high.

That is a judgement that everybody has to make based on their
requirements. The important thing is to make an informed decision. I
don't get paid if you pick btrfs.

>
>> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
>> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
>> debate whether I still need it, but I sleep better knowing I have it.
>> This is in addition to my daily duplicity cloud backups of my most
>> important data (so, /etc and /home are in the cloud, and mythtv's
>> /var/video is just on a local rsync backup).
>
> I wouldn't give my data out of my hands.

Somehow I doubt the folks at Amazon are going to break RSA anytime soon. |
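
duplicity GPG-encrypts everything before it leaves the box; a run
looks roughly like this (a sketch; the key ID and target URL are
placeholders):

  duplicity --encrypt-key 0xDEADBEEF /home s3://bucket/home-backup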

>
> Snapper? I've never heard of that ...
>

http://snapper.io/

Basically snapshots+crontab and some wrappers to set retention
policies and such. That and some things like package-manager plugins
so that you get snapshots before you install stuff. |
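
Day-to-day use is along these lines (a sketch, assuming a btrfs root
and the snapper package installed):

  snapper create-config /                     # one-time setup for the subvolume
  snapper create --description "pre-emerge"   # manual snapshot
  snapper list                                # show snapshots and their numbers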

>
> Queuing up the data when there's more data than the system can deal with
> only works when the system has sufficient time to catch up with the
> queue. Otherwise, you have to block something at some point, or you
> must drop the data. At that point, it doesn't matter how you arrange
> the contents of the queue within it.

Absolutely true. You need to throttle the data before it gets into
the queue, so that the busyness of the queue is exposed to the
applications and they can behave appropriately (falling back to
lower-bandwidth alternatives, etc). In my case, if mythtv's write
buffers are filling up and I'm also running an emerge install phase,
the correct answer (per ionice) is for emerge to block so that my
realtime video capture buffers are safely flushed. What you don't
want is for the kernel to let emerge dump a few GB of low-priority
data into the write cache alongside my 5Mbps HD recording stream.
Granted, it isn't as big a problem as it used to be now that RAM sizes
have increased. |
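
The ionice side of that is just starting the bulk job in the idle
class (the emerge command here is only an example):

  ionice -c 3 emerge --update @world   # idle class: gets disk time only
                                       # when nobody else wants it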

>
> Gentoo /is/ fire-and-forget in that it works fine. Btrfs is not in that
> it may work or not.
>

Well, we certainly must have come a long way, then. :) I still
remember the last time the glibc ABI changed and I was basically
rebuilding everything from single-user mode, holding my breath.

--
Rich |