
From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] snapshots?
Date: Fri, 01 Jan 2016 13:26:47
Message-Id: CAGfcS_nDON83L83vKoyMCdBD0V3zeJVk80ZXV0-=2gjh-MOcgQ@mail.gmail.com
In Reply to: Re: [gentoo-user] snapshots? by lee
On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
> "Stefan G. Weichinger" <lists@×××××.at> writes:
>
>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>
>> The general recommendation now is to stay at level-1 for now. That fits
>> your 2-disk-situation.
>
> Well, what shows better performance? No btrfs-raid on hardware raid or
> btrfs raid on JBOD?

I would run btrfs on bare partitions and use btrfs's raid1
capabilities. You're almost certainly going to get better
performance, and you get more data integrity features. If you get
silent corruption with mdadm doing the raid1, btrfs will happily
warn you of the problem, but you're going to have a really hard time
fixing it: btrfs sees only the single copy of the data mdadm hands
it, which may be the bad one, and all mdadm can tell you is that the
two copies are inconsistent, with no idea which one is right. You'd
end up having to manipulate the underlying data to figure out which
one is right and fix it (the data is all there, but you'd probably
end up hex-editing your disks). If you were using btrfs raid1 you'd
just run a scrub and it would detect and fix the problem, since btrfs
sees both copies and knows which one is right. And if you ever move
to raid5 once that matures, btrfs eliminates the write hole.
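(For the record, a scrub is just:

# btrfs scrub start /
# btrfs scrub status /

assuming the filesystem is mounted at /.)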

>>
>> I would avoid converting and stuff.
>>
>> Why not try a fresh install on the new disks with btrfs?
>
> Why would I want to spend another year to get back to where I'm now?

I wouldn't do a fresh install. I'd just set up btrfs on the new disks
and copy your data over (preserving attributes/etc). Before doing
that I'd create any subvolumes you want to have on the new disks and
copy the data into them. The only way to convert a directory into a
subvolume after the fact is to create a subvolume under a new name,
copy the directory's contents into it, rename the directory and
subvolume to swap their names, and then delete the old directory.
That is time-consuming, and depending on which directory you're
talking about you might want to be in single-user mode or boot from a
rescue disk to do it.
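Roughly, the dance looks like this, with /data standing in for
whatever directory you're converting:

# btrfs subvolume create /data.new
# cp -a /data/. /data.new/
# mv /data /data.old && mv /data.new /data
# rm -rf /data.old

cp -a preserves attributes, and adding --reflink=always should make
the copy nearly free, since reflinks work across subvolumes of the
same filesystem.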

I wouldn't do an in-place ext4->btrfs conversion. I know there were
some regressions in that feature recently and I'm not sure where it
stands right now.

>> I never had /boot on btrfs so far, maybe others can guide you with this.
>>
>> My /boot is plain extX on maybe RAID1 (differs on
>> laptops/desktop/servers), I size it 500 MB to have space for multiple
>> kernels (especially on dualboot-systems).
>>
>> Then some swap-partitions, and the rest for btrfs.
>
> There you go, you end up with an odd setup. I don't like /boot
> partitions. As well as swap partitions, they need to be on raid. So
> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
> perhaps ext4, /and/ multiple partitions.

With grub2 you can boot from btrfs. I used to use a separate boot
partition on ext4 with btrfs for the rest, but now my /boot is on my
root partition. I'd still set aside space for a boot partition in
case you move to EFI in the future, but I wouldn't bother formatting
it or setting it up right now. As long as you're using grub2 you
really don't need to do anything special.

You DO need to partition your disks though, even if you only have one
big partition for the whole thing. The reason is that partitioning
leaves space on the disk for grub to embed its loaders/etc.
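On MBR disks that's just the standard routine, something like this
(installing to both disks so either one can boot):

# grub-install /dev/sda
# grub-install /dev/sdb
# grub-mkconfig -o /boot/grub/grub.cfg

If you go GPT instead, add a small BIOS boot partition for grub to
embed into.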

I don't use swap. If I did I'd probably set up an mdadm array for it.
According to the FAQ, btrfs still doesn't support swapping to a file.

There isn't really anything painful about that setup though. Swap
isn't needed to boot, so openrc/systemd will start up mdadm and
activate your swap. I'm not sure whether dracut will do that during
early boot or not, but it doesn't really matter if it does.

If you have two drives I'd just set them up as:
sd[ab]1 - 1GB boot partition, unformatted, for future EFI
sd[ab]2 - mdadm raid1 for swap
sd[ab]3 - btrfs
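The swap half of that is roughly (the md device name is arbitrary):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mkswap /dev/md0
# swapon /dev/md0

plus the usual entries in /etc/mdadm.conf and /etc/fstab.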

> When you use hardware raid, it
> can be disadvantageous compared to btrfs-raid --- and when you use it
> anyway, things are suddenly much more straightforward because everything
> is on raid to begin with.

I'd stick with mdadm. You're never going to run mixed
btrfs/hardware-raid on a single drive, and the only time I'd consider
hardware raid is with a high-quality raid card. Even if I had one of
those lying around, you'd still have to convince me not to use mdadm.

>> Create your btrfs-"pool" with:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
>>
>> Then check for your btrfs-fs with:
>>
>> # btrfs fi show
>>
>> Oh: I realize that I start writing a howto here ;-)
>
> That doesn't work without an extra /boot partition?

It works fine without a boot partition if you're using grub2. If you
want to use grub legacy you'll need a boot partition.

>
> How's btrfs's performance when you use swap files instead of swap
> partitions to avoid the need for mdadm?

btrfs does not support swap files at present. When it does, you'll
need to disable COW for them (using chattr), otherwise they'll
fragment until your system grinds to a halt. A swap file is about
the worst-case scenario for any COW filesystem - I'm not sure how ZFS
handles them.
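If/when support arrives, I'd expect the recipe on a COW filesystem to
look something like this (hypothetical for btrfs today - swapon will
refuse it):

# touch /swapfile
# chattr +C /swapfile    (+C only sticks while the file is still empty)
# dd if=/dev/zero of=/swapfile bs=1M count=4096
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile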

>
> Now I understand that it's apparently not possible to simply make a
> btrfs-raid1 from the two raw disks, copy the system over, install grub
> and boot from that. (I could live with swap files instead of swap
> partitions.)

Even if you used no swap and no separate /boot, like I have right now,
you'd still want to create a single large partition for better grub2
support. Without space between the partition table and the first
partition (which you'll want to start at sector 2048, or whatever the
default is these days), grub2 has to resort to blocklists. That means
that if the files in /boot/grub ever move on disk, for any reason,
your system won't boot. That isn't a btrfs thing - it holds just as
true if you're using ext4 - and blocklists are generally frowned upon.
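parted, for example, aligns the first partition at 1MiB (sector 2048)
by default, which leaves grub2 plenty of room to embed into:

# parted /dev/sda mklabel msdos
# parted -a optimal /dev/sda mkpart primary 1MiB 100%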

>
>> As mentioned here several times I am using btrfs on >6 of my systems for
>> years now. And I don't look back so far.
>
> And has it always been reliable?
>

I've never had an episode that resulted in actual data loss. I HAVE
had an episode or two that resulted in downtime.

When I've had btrfs issues, I could generally mount the filesystem
read-only just fine. The problem was that cleanup threads were
causing kernel BUGs, which made the filesystem stop syncing (not a
full panic, but when all your filesystems are effectively read-only
there isn't much difference in many cases). If I rebooted, the
system would BUG again within a few minutes. In one case I was able
to boot a more recent kernel from a rescue disk and fix things by
just mounting the drive and letting it sit for 20 minutes, otherwise
idle, to finish cleaning things up (some kind of locking issue, most
likely) - maybe I had to run btrfsck on it. In the other case it was
being really fussy and I ended up just restoring from a backup, since
that was the path of least resistance. I could probably have fixed
the problem eventually, and the drive was mountable read-only the
entire time, so given sufficient space I could have copied all the
data over to a new filesystem with no loss at all.
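The rescue mounts themselves were nothing special, something like:

# mount -o ro,recovery /dev/sda3 /mnt

(the recovery mount option asks btrfs to fall back to an older tree
root; the device name is just an example).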

Things have been pretty quiet for the last six months though, and I
think that is largely due to a change in strategy around kernel
versions. Right now I'm running 3.18. I'm starting to consider a
move to 4.1, but there is a backlog of btrfs fixes for stable that
I'm waiting for Greg to catch up on, and maybe I'll wait for a
version after that to see whether things settle down. Around the
3.14->3.18 timeframe btrfs maturity seemed to settle in a bit, and at
this point I think newer kernels are more likely to introduce
regressions than to fix problems. The pace of btrfs patching has
also increased in the last year (which is good in the long term -
most are bugfixes - but in the short term even bugfixes can introduce
bugs). Unless I have a reason not to, at this point I plan to run
only longterm kernels, and to move to them when they're about six
months mature.

If I had done that in the past I think I would have completely avoided
the issue that required me to restore from backups. That happened in
the 3.15/3.16 timeframe, and I'd never even have run those kernels.
They were stable kernels at the time, and a few versions in when I
switched to them (I was probably just following gentoo-sources stable
keywords back then), but they still had regressions (the fixes were
eventually backported).

I think btrfs is certainly usable today, though I'd be hesitant to
run it on production servers, depending on the use case (I'd be
looking for a use case that actually gets a significant benefit from
btrfs, and which somehow mitigates the risks).

Right now I keep a daily rsnapshot backup (rsync on steroids - it's
in the Gentoo repo) of my btrfs filesystems on ext4. I occasionally
debate whether I still need it, but I sleep better knowing I have it.
This is in addition to my daily duplicity cloud backups of my most
important data (so /etc and /home are in the cloud, and mythtv's
/var/video is just on a local rsync backup).
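For flavor, a minimal rsnapshot setup along those lines looks
something like this (paths illustrative; note that rsnapshot.conf is
tab-delimited):

snapshot_root   /mnt/backup/rsnapshot/
retain          daily   7
backup          /etc/   localhost/
backup          /home/  localhost/

with a daily 'rsnapshot daily' run from cron.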

Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
raid5/6 is fine, but you lose the data integrity features). I
wouldn't touch it for at least a year, and probably longer.

Overall I'm very happy with btrfs though. Snapshots and reflinks are
very handy - I can update containers and nfs roots after snapshotting
them, which gives me a trivial rollback solution, and while I don't
use snapper I do manually rotate through snapshots weekly. If you do
run snapper, I'd avoid generating large numbers of snapshots - one of
my BUG problems happened as a result of snapper deleting a few
hundred snapshots at once.
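The rollback itself is nothing fancy - snapshot before the update,
swap names if it goes wrong (paths hypothetical):

# btrfs subvolume snapshot /containers/web /containers/web.pre
  (update and test; if it's broken:)
# mv /containers/web /containers/web.bad
# mv /containers/web.pre /containers/web
# btrfs subvolume delete /containers/web.bad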

Btrfs's deferred processing of the log/btrees can cause the kinds of
performance issues associated with garbage collection (or BUGs due to
thundering-herd problems). I use ionice to try to prioritize my IO
so that realtime activity like mythtv recordings takes precedence
over less time-critical work, and in the past that hasn't always
worked with btrfs. The problem is that btrfs would accept too much
data into its log and then block all writes while it tried to catch
up. I haven't seen that as much recently, so maybe they're getting
better about it. As with any other scheduling problem, it only works
if you correctly block low-priority writes at the start of the
pipeline (I've heard of similar problems with TCP QoS and such if you
don't ensure that the bottleneck is the first router along the route -
you can let in too much low-priority traffic, and at that point
you're stuck dealing with it).
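The ionice side is just demoting the bulk work, e.g.:

# ionice -c 3 rsync -a /var/video/ /mnt/backup/video/

(-c 3 is the idle class; as noted above, btrfs hasn't always honored
it well.)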

I'd suggest watching the btrfs mailing list to get a sense of what
people are dealing with. Just ignore all the threads marked as
patches and look at the discussion threads.

If you're getting the impression that btrfs isn't quite
fire-and-forget, you're getting the right impression. Neither is
Gentoo, so I wouldn't let that alone scare you off. But I see no
reason not to give you fair warning.

--
Rich
