Gentoo Archives: gentoo-user

From: lee <lee@××××××××.de>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] snapshots?
Date: Tue, 05 Jan 2016 22:36:01
Message-Id: 87ziwjr7sf.fsf@heimdali.yagibdah.de
In Reply to: Re: [gentoo-user] snapshots? by Rich Freeman
Rich Freeman <rich0@g.o> writes:

> On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
>> "Stefan G. Weichinger" <lists@×××××.at> writes:
>>
>>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>>
>>> The general recommendation now is to stay at level-1 for now. That fits
>>> your 2-disk-situation.
>>
>> Well, what shows better performance? No btrfs-raid on hardware raid or
>> btrfs raid on JBOD?
>
> I would run btrfs on bare partitions and use btrfs's raid1
> capabilities. You're almost certainly going to get better
> performance, and you get more data integrity features.

That would require me to set up software raid with mdadm as well, for
the swap partition.
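
If I understand the suggested layout correctly, the whole thing would
look roughly like this (device names are placeholders only, untested):

  # btrfs raid1 across the two bare data partitions
  mkfs.btrfs -m raid1 -d raid1 /dev/sda2 /dev/sdb2

  # plus an mdadm raid1 underneath the swap
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
  mkswap /dev/md0 && swapon /dev/md0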

> If you have a silent corruption with mdadm doing the raid1 then btrfs
> will happily warn you of your problem and you're going to have a
> really hard time fixing it,

BTW, what do you do when you have silent corruption on a swap partition?
Is that possible, or does swapping use its own checksums?

> [...]
>
>>>
>>> I would avoid converting and stuff.
>>>
>>> Why not try a fresh install on the new disks with btrfs?
>>
>> Why would I want to spend another year to get back to where I'm now?
>
> I wouldn't do a fresh install. I'd just set up btrfs on the new disks
> and copy your data over (preserving attributes/etc).

That was the idea.

> I wouldn't do an in-place ext4->btrfs conversion. I know that there
> were some regressions in that feature recently and I'm not sure where
> it stands right now.

That adds to the uncertainty of btrfs.
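
For completeness, the in-place conversion mentioned above would be
something like this (device name made up); whether the rollback still
works is part of that uncertainty:

  btrfs-convert /dev/sdc1      # convert the existing ext4 in place
  btrfs-convert -r /dev/sdc1   # roll back to ext4, only while the saved
                               # ext2_saved image is still untouched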


> [...]
>>
>> There you go, you end up with an odd setup. I don't like /boot
>> partitions. As well as swap partitions, they need to be on raid. So
>> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
>> perhaps ext4, /and/ multiple partitions.
>
> [...]
> There isn't really anything painful about that setup though.

It's still odd. I already have two different file systems and the
overhead of one kind of software raid while I would rather stick to one
file system. With btrfs, I'd still have two different file systems ---
plus mdadm and the overhead of three different kinds of software raid.

How would it be so much better to triple the software raids and to still
have the same number of file systems?

>> When you use hardware raid, it
>> can be disadvantageous compared to btrfs-raid --- and when you use it
>> anyway, things are suddenly much more straightforward because everything
>> is on raid to begin with.
>
> I'd stick with mdadm. You're never going to run mixed
> btrfs/hardware-raid on a single drive,

A single disk doesn't make for a raid.

> and the only time I'd consider
> hardware raid is with a high quality raid card. You'd still have to
> convince me not to use mdadm even if I had one of those lying around.

From my own experience, I can tell you that mdadm already has
significant overhead even with just a two-disk raid1 and a three-disk
raid5. Some of that overhead may be due to the SATA controller not
being as capable as one would expect --- yet that doesn't matter,
because what you're looking at, besides reliability, is the overall
performance. And the overall performance increased very noticeably
when I migrated from mdadm raids to hardware raids, with the same
disks and the same hardware, except that the raid card was added.

And that was only 5 disks. I also know that the performance of a
two-disk ZFS mirror was disappointingly poor. Those disks aren't
exactly fast, but still. I haven't tested yet whether it changed after
adding 4 mirrored disks to the pool. And I know that the performance
of another hardware raid5 with 6 disks was very good.

Thus I'm not convinced that software raid is the way to go. I wish they
would make hardware ZFS (or btrfs, if it ever becomes reliable)
controllers.

Now consider:


+ candidates for hardware raid are two small disks (72GB each)
+ data on those is either mostly read, or temporary/cache-like
+ this setup has been working without any issues for over a year now
+ using btrfs would triple the software raids used
+ btrfs is uncertain, reliability questionable
+ mdadm would have to be added as another layer of complexity
+ the disks are SAS disks, genuinely made to be run in a hardware raid
+ the setup with hardware raid is straightforward and simple, the setup
  with btrfs is anything but


The relevant advantage of btrfs is being able to make snapshots. Is
that worth all the (potential) trouble? Snapshots are worthless when
the file system destroys them with the rest of the data.

> [...]
>> How's btrfs's performance when you use swap files instead of swap
>> partitions to avoid the need for mdadm?
>
> btrfs does not support swap files at present.

What happens when you try it?

> When it does you'll need to disable COW for them (using chattr)
> otherwise they'll be fragmented until your system grinds to a halt. A
> swap file is about the worst case scenario for any COW filesystem -
> I'm not sure how ZFS handles them.

Well, then they need to make special provisions for swap files in btrfs
so that we can finally get rid of the swap partitions.
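
Whenever that support materialises, I'd expect the recipe to look
roughly like this (size and path made up; the +C has to be set while
the file is still empty):

  truncate -s 0 /swapfile
  chattr +C /swapfile            # disable COW before any data lands in the file
  fallocate -l 8G /swapfile
  chmod 600 /swapfile
  mkswap /swapfile && swapon /swapfile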


> [...]
>>> As mentioned here several times I am using btrfs on >6 of my systems for
>>> years now. And I don't look back so far.
>>
>> And has it always been reliable?
>>
>
> I've never had an episode that resulted in actual data loss. I HAVE
> had an episode or two which resulted in downtime.
>
> When I've had btrfs issues I can generally mount the filesystem
> read-only just fine. The problem was that cleanup threads were
> causing kernel BUGs which cause the filesystem to stop syncing (not a
> full panic, but when all your filesystems are effectively read-only
> there isn't much difference in many cases). If I rebooted the system
> would BUG within a few minutes. In one case I was able to boot from a
> more recent kernel on a rescue disk and fix things by just mounting
> the drive and letting it sit for 20min to finish cleaning things up
> while the disk was otherwise idle (some kind of locking issue most
> likely) - maybe I had to run btrfsck on it. In the other case it was
> being really fussy and I ended up just restoring from a backup since
> that was the path of least resistance. I could have probably
> eventually fixed the problem, and the drive was mountable read-only
> the entire time so given sufficient space I could have copied all the
> data over to a new filesystem with no loss at all.
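
For the record, that last-resort path would presumably be something
like this (device and mount point names made up):

  mount -o ro /dev/sdb1 /mnt/broken      # the read-only mount that still works
  rsync -aHAX /mnt/broken/ /mnt/newfs/   # copy everything onto a fresh filesystem
  # and/or: btrfs check /dev/sdb1        # the btrfsck mentioned above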

That's exactly what I don't want to have to deal with. It would defeat
the most important purpose of using raid.

> Things have been pretty quiet for the last six months though, and I
> think it is largely due to a change in strategy around kernel
> versions. Right now I'm running 3.18. I'm starting to consider a
> move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
> waiting for Greg to catch up on and maybe I'll wait for a version
> after that to see if things settle down. Around the time of
> 3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
> I think newer kernels are more likely to introduce regressions than
> fix problems. The pace of btrfs patching seems to have increased as
> well in the last year (which is good in the long-term - most are
> bugfixes - but in the short term even bugfixes can introduce bugs).
> Unless I have a reason not to at this point I plan to run only
> longterm kernels, and move to them when they're about six months
> mature.

That's another thing making it difficult to use btrfs.

> If I had done that in the past I think I would have completely avoided
> that issue that required me to restore from backups. That happened in
> the 3.15/3.16 timeframe and I'd have never even run those kernels.
> They were stable kernels at the time, and a few versions in when I
> switched to them (I was probably just following gentoo-sources stable
> keywords back then), but they still had regressions (fixes were
> eventually backported).

How do you know whether an old kernel you pick --- because you think its
btrfs part works well enough --- is the right pick? You can either run
into a bug that has already been fixed in a newer version or into a
regression that hasn't been discovered/fixed yet. Either way, you can't
win.

> I think btrfs is certainly usable today, though I'd be hesitant to run
> it on production servers depending on the use case (I'd be looking for
> a use case that actually has a significant benefit from using btrfs,
> and which somehow mitigates the risks).

There you go, it's usable, and the risk of using it is too high.

> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
> debate whether I still need it, but I sleep better knowing I have it.
> This is in addition to my daily duplicity cloud backups of my most
> important data (so, /etc and /home are in the cloud, and mythtv's
> /var/video is just on a local rsync backup).
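
For reference, an rsnapshot setup like that is little more than a
tab-separated rsnapshot.conf plus a cron entry, roughly along these
lines (paths are examples only; the fields must be separated by tabs):

  # /etc/rsnapshot.conf (excerpt)
  snapshot_root   /mnt/backup/snapshots/
  retain          daily   7
  backup          /etc/   localhost/
  backup          /home/  localhost/

  # /etc/cron.d entry
  0 3 * * *  root  rsnapshot daily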

I wouldn't let my data out of my hands.

> Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
> raid5/6 is fine, but you lose the data integrity features). I
> wouldn't go anywhere near that for at least a year, and probably
> longer.

It might take another 5 or 10 years before btrfs isn't questionable
anymore, if it ever gets there.

> Overall I'm very happy with btrfs though. Snapshots and reflinks are
> very handy - I can update containers and nfs roots after snapshotting
> them and it gives me a trivial rollback solution, and while I don't
> use snapper I do manually rotate through snapshots weekly. If you do
> run snapper I'd probably avoid generating large numbers of snapshots -
> one of my BUG problems happened as a result of snapper deleting a few
> hundred snapshots at once.
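
The snapshot-before-update workflow is presumably not much more than
this (subvolume paths made up):

  # snapshot a container's subvolume before updating it
  btrfs subvolume snapshot /srv/containers/foo /srv/containers/foo-pre-update

  # roll back by putting the snapshot in place of the broken subvolume
  mv /srv/containers/foo /srv/containers/foo-broken
  mv /srv/containers/foo-pre-update /srv/containers/foo

  # reflink copy: a cheap, COW-shared duplicate of a file
  cp --reflink=always bigfile bigfile.copy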

Snapper? I've never heard of that ...

> Btrfs's deferred processing of the log/btrees can cause the kinds of
> performance issues associated with garbage collection (or BUGs due to
> thundering herd problems). I use ionice to try to prioritize my IO so
> that stuff like mythtv recordings will block less realtime activities,
> and in the past that hasn't always worked with btrfs. The problem is
> that btrfs would accept too much data into its log, and then it would
> block all writes while it tried to catch up. I haven't seen that as
> much recently, so maybe they're getting better about that. As with
> any other scheduling problem it only works if you correctly block
> writes into the start of the pipeline (I've heard of similar problems
> with TCP QoS and such if you don't ensure that the bottleneck is the
> first router along the route - you can let in too much low-priority
> traffic and then at that point you're stuck dealing with it).
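
For what it's worth, the ionice part of that would be something like
this (process name is just an example):

  # start the recording daemon at the lowest best-effort IO priority
  ionice -c2 -n7 mythbackend

  # or push an already running process into the idle IO class
  ionice -c3 -p "$(pidof mythbackend)"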

Queuing up data when there's more of it than the system can deal with
only works when the system gets enough time to catch up with the queue.
Otherwise you have to block something at some point, or you have to
drop data. And at that point it doesn't matter how you arrange the
contents within the queue.

> I'd suggest looking at the btrfs mailing list to get a survey for what
> people are dealing with. Just ignore all the threads marked as
> patches and look at the discussion threads.
>
> If you're getting the impression that btrfs isn't quite
> fire-and-forget, you're getting the right impression. Neither is
> Gentoo, so I wouldn't let that alone scare you off. But, I see no
> reason to not give you fair warning.

Gentoo /is/ fire-and-forget in that it works fine. Btrfs is not, in
that it may or may not work.

Replies

Subject Author
Re: [gentoo-user] snapshots? Neil Bothwick <neil@××××××××××.uk>
Re: [gentoo-user] snapshots? Rich Freeman <rich0@g.o>