Rich Freeman <rich0@g.o> writes:

> On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
>> "Stefan G. Weichinger" <lists@×××××.at> writes:
>>
>>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>>
>>> The general recommendation now is to stay at level-1 for now. That fits
>>> your 2-disk-situation.
>>
>> Well, what shows better performance? No btrfs-raid on hardware raid or
>> btrfs raid on JBOD?
>
> I would run btrfs on bare partitions and use btrfs's raid1
> capabilities. You're almost certainly going to get better
> performance, and you get more data integrity features.

That would require me to set up software raid with mdadm as well, for
the swap partition.

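Just to make the comparison concrete, that mixed setup would look
roughly like this (device names and sizes are only examples, not
tested):

    # mdadm raid1 for swap, since btrfs can't hold a swap file
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md0 && swapon /dev/md0

    # btrfs raid1 across bare partitions for everything else
    mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3

So it's mdadm for swap (and possibly /boot) plus btrfs doing its own
raid1 for the data.
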
> If you have a silent corruption with mdadm doing the raid1 then btrfs
> will happily warn you of your problem and you're going to have a
> really hard time fixing it,

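Whereas with btrfs doing the raid1 itself, a scrub can supposedly
repair the bad copy from the good one --- something like this, if I
understand it correctly (mount point is an example):

    # verify all checksums; corrupted blocks get rewritten from the mirror
    btrfs scrub start -B /mnt
    btrfs scrub status /mnt
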
BTW, what do you do when you have silent corruption on a swap partition?
Is that possible, or does swapping use its own checksums?

> [...]
>
>>>
>>> I would avoid converting and stuff.
>>>
>>> Why not try a fresh install on the new disks with btrfs?
>>
>> Why would I want to spend another year to get back to where I am now?
>
> I wouldn't do a fresh install. I'd just set up btrfs on the new disks
> and copy your data over (preserving attributes/etc).

That was the idea.

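Something along these lines, presumably (paths are only examples):

    # copy everything, preserving ACLs, xattrs and hard links
    rsync -aAXH /mnt/old-root/ /mnt/new-btrfs-root/
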
> I wouldn't do an in-place ext4->btrfs conversion. I know that there
> were some regressions in that feature recently and I'm not sure where
> it stands right now.

That adds to the uncertainty of btrfs.

> [...]
>>
>> There you go, you end up with an odd setup. I don't like /boot
>> partitions. As well as swap partitions, they need to be on raid. So
>> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
>> perhaps ext4, /and/ multiple partitions.
>
> [...]
> There isn't really anything painful about that setup though.

It's still odd. I already have two different file systems and the
overhead of one kind of software raid, while I would rather stick to
one file system. With btrfs, I'd still have two different file
systems --- plus mdadm and the overhead of three different kinds of
software raid.

How would it be so much better to triple the software raids and still
have the same number of file systems?

>> When you use hardware raid, it
>> can be disadvantageous compared to btrfs-raid --- and when you use it
>> anyway, things are suddenly much more straightforward because everything
>> is on raid to begin with.
>
> I'd stick with mdadm. You're never going to run mixed
> btrfs/hardware-raid on a single drive,

A single disk doesn't make for a raid.

> and the only time I'd consider
> hardware raid is with a high quality raid card. You'd still have to
> convince me not to use mdadm even if I had one of those lying around.

From my own experience, I can tell you that mdadm already has
significant overhead with a raid1 of two disks and a raid5 of three
disks. Some of this overhead may be due to the SATA controller not
being as capable as one would expect --- yet that doesn't matter,
because what you're looking at, besides reliability, is the overall
performance. And the overall performance increased very noticeably
when I migrated from mdadm raids to hardware raids, with the same
disks and the same hardware, except that the raid card was added.

And that was only 5 disks. I also know that the performance of a ZFS
mirror with two disks was disappointingly poor. Those disks aren't
exactly fast, but still. I haven't tested yet whether it changed
after adding 4 mirrored disks to the pool. And I know that the
performance of another hardware raid5 with 6 disks was very good.

Thus I'm not convinced that software raid is the way to go. I wish
they would make hardware ZFS (or btrfs, if it ever becomes reliable)
controllers.

Now consider:

+ candidates for hardware raid are two small disks (72GB each)
+ data on those is either mostly read, or temporary/cache-like
+ this setup has been working without any issues for over a year now
+ using btrfs would triple the software raids used
+ btrfs is uncertain, its reliability questionable
+ mdadm would have to be added as another layer of complexity
+ the disks are SAS disks, genuinely made to be run in a hardware raid
+ the setup with hardware raid is straightforward and simple, the setup
  with btrfs is anything but

The relevant advantage of btrfs is being able to make snapshots. Is
that worth all the (potential) trouble? Snapshots are worthless when
the file system destroys them along with the rest of the data.

> [...]
>> How's btrfs's performance when you use swap files instead of swap
>> partitions to avoid the need for mdadm?
>
> btrfs does not support swap files at present.

What happens when you try it?

> When it does you'll need to disable COW for them (using chattr)
> otherwise they'll be fragmented until your system grinds to a halt. A
> swap file is about the worst case scenario for any COW filesystem -
> I'm not sure how ZFS handles them.

Well, then they need to make special provisions for swap files in btrfs
so that we can finally get rid of the swap partitions.

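For reference, the chattr-based recipe you mention would presumably
look like this, once swap files work on btrfs at all (untested
sketch, paths are examples):

    touch /swapfile
    chattr +C /swapfile    # disable COW while the file is still empty
    dd if=/dev/zero of=/swapfile bs=1M count=4096    # swap can't use sparse files
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
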
> [...]
>>> As mentioned here several times I am using btrfs on >6 of my systems for
>>> years now. And I don't look back so far.
>>
>> And has it always been reliable?
>>
>
> I've never had an episode that resulted in actual data loss. I HAVE
> had an episode or two which resulted in downtime.
>
> When I've had btrfs issues I can generally mount the filesystem
> read-only just fine. The problem was that cleanup threads were
> causing kernel BUGs which caused the filesystem to stop syncing (not a
> full panic, but when all your filesystems are effectively read-only
> there isn't much difference in many cases). If I rebooted, the system
> would BUG within a few minutes. In one case I was able to boot from a
> more recent kernel on a rescue disk and fix things by just mounting
> the drive and letting it sit for 20min to finish cleaning things up
> while the disk was otherwise idle (some kind of locking issue most
> likely) - maybe I had to run btrfsck on it. In the other case it was
> being really fussy and I ended up just restoring from a backup since
> that was the path of least resistance. I could probably have
> eventually fixed the problem, and the drive was mountable read-only
> the entire time, so given sufficient space I could have copied all the
> data over to a new filesystem with no loss at all.

That's exactly what I don't want to have to deal with. It would defeat
the most important purpose of using raid.

> Things have been pretty quiet for the last six months though, and I
> think it is largely due to a change in strategy around kernel
> versions. Right now I'm running 3.18. I'm starting to consider a
> move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
> waiting for Greg to catch up on, and maybe I'll wait for a version
> after that to see if things settle down. Around the time of
> 3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
> I think newer kernels are more likely to introduce regressions than
> fix problems. The pace of btrfs patching seems to have increased as
> well in the last year (which is good in the long term - most are
> bugfixes - but in the short term even bugfixes can introduce bugs).
> Unless I have a reason not to, at this point I plan to run only
> longterm kernels, and to move to them when they're about six months
> mature.

That's another thing making it difficult to use btrfs.

> If I had done that in the past I think I would have completely avoided
> that issue that required me to restore from backups. That happened in
> the 3.15/3.16 timeframe and I'd have never even run those kernels.
> They were stable kernels at the time, and a few versions in when I
> switched to them (I was probably just following gentoo-sources stable
> keywords back then), but they still had regressions (fixes were
> eventually backported).

How do you know whether an old kernel you pick --- because you think
its btrfs part works well enough --- is the right pick? You can
either encounter a bug that has since been fixed or a regression that
hasn't been discovered/fixed yet. Either way, you can't win.

> I think btrfs is certainly usable today, though I'd be hesitant to run
> it on production servers depending on the use case (I'd be looking for
> a use case that actually has a significant benefit from using btrfs,
> and which somehow mitigates the risks).

There you go: it's usable, yet the risk of using it is too high.

> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
> debate whether I still need it, but I sleep better knowing I have it.
> This is in addition to my daily duplicity cloud backups of my most
> important data (so, /etc and /home are in the cloud, and mythtv's
> /var/video is just on a local rsync backup).

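(I had to look rsnapshot up: apparently it's driven by a plain config
file of TAB-separated fields plus a cron job like "rsnapshot daily" ---
paths and retention counts below are made-up examples:

    # /etc/rsnapshot.conf
    snapshot_root   /mnt/backup/rsnapshot/
    retain          daily   7
    retain          weekly  4
    backup          /etc/   localhost/
    backup          /home/  localhost/

)
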
I wouldn't let my data out of my hands.

> Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
> raid5/6 is fine, but you lose the data integrity features). I
> wouldn't go anywhere near that for at least a year, and probably
> longer.

It might take another 5 or 10 years before btrfs isn't questionable
anymore, if it ever gets there.

> Overall I'm very happy with btrfs though. Snapshots and reflinks are
> very handy - I can update containers and nfs roots after snapshotting
> them and it gives me a trivial rollback solution, and while I don't
> use snapper I do manually rotate through snapshots weekly. If you do
> run snapper I'd probably avoid generating large numbers of snapshots -
> one of my BUG problems happened as a result of snapper deleting a few
> hundred snapshots at once.

Snapper? I've never heard of that ...

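The snapshot-before-update workflow you describe would look something
like this, if I understand it correctly (paths are made up):

    # writable snapshot of a container's subvolume before updating it
    btrfs subvolume snapshot /srv/containers/web /srv/containers/web-pre
    # if the update goes wrong, swap the snapshot back in
    mv /srv/containers/web /srv/containers/web-broken
    mv /srv/containers/web-pre /srv/containers/web
    btrfs subvolume delete /srv/containers/web-broken
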
> Btrfs's deferred processing of the log/btrees can cause the kinds of
> performance issues associated with garbage collection (or BUGs due to
> thundering herd problems). I use ionice to try to prioritize my IO so
> that stuff like mythtv recordings will block less realtime activities,
> and in the past that hasn't always worked with btrfs. The problem is
> that btrfs would accept too much data into its log, and then it would
> block all writes while it tried to catch up. I haven't seen that as
> much recently, so maybe they're getting better about that. As with
> any other scheduling problem it only works if you correctly block
> writes into the start of the pipeline (I've heard of similar problems
> with TCP QoS and such if you don't ensure that the bottleneck is the
> first router along the route - you can let in too much low-priority
> traffic and then at that point you're stuck dealing with it).

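So the prioritisation you mean is something like this (commands and
paths are only examples; as far as I know the priorities only take
effect with the CFQ I/O scheduler):

    # run a bulk copy at idle I/O priority
    ionice -c 3 cp /var/video/recording.mpg /mnt/backup/
    # or merely lower its priority within the best-effort class
    ionice -c 2 -n 7 rsync -a /var/video/ /mnt/backup/video/
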
Queuing up data when there's more of it than the system can deal with
only works when the system has sufficient time to catch up with the
queue. Otherwise, you have to block something at some point, or you
must drop the data. At that point, it doesn't matter how you arrange
the contents within the queue.

> I'd suggest looking at the btrfs mailing list to get a survey of what
> people are dealing with. Just ignore all the threads marked as
> patches and look at the discussion threads.
>
> If you're getting the impression that btrfs isn't quite
> fire-and-forget, you're getting the right impression. Neither is
> Gentoo, so I wouldn't let that alone scare you off. But, I see no
> reason to not give you fair warning.

Gentoo /is/ fire-and-forget in that it just works. Btrfs is not, in
that it may or may not work.