
From: lee <lee@××××××××.de>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] snapshots?
Date: Tue, 12 Jan 2016 23:01:16
Message-Id: 87k2neh09d.fsf@heimdali.yagibdah.de
In Reply to: Re: [gentoo-user] snapshots? by Rich Freeman
Rich Freeman <rich0@g.o> writes:

> On Tue, Jan 5, 2016 at 5:16 PM, lee <lee@××××××××.de> wrote:
>> Rich Freeman <rich0@g.o> writes:
>>
>>>
>>> I would run btrfs on bare partitions and use btrfs's raid1
>>> capabilities. You're almost certainly going to get better
>>> performance, and you get more data integrity features.
>>
>> That would require me to set up software raid with mdadm as well, for
>> the swap partition.
>
> Correct, if you don't want a panic if a single swap drive fails.
>
>>
>>> If you have a silent corruption with mdadm doing the raid1 then btrfs
>>> will happily warn you of your problem and you're going to have a
>>> really hard time fixing it,
>>
>> BTW, what do you do when you have silent corruption on a swap partition?
>> Is that possible, or does swapping use its own checksums?
>
> If the kernel pages in data from the good mirror, nothing happens. If
> the kernel pages in data from the bad mirror, then whatever data
> happens to be there is what will get loaded and used and/or executed.
> If you're lucky the modified data will be part of unused heap or
> something. If not, well, just about anything could happen.
>
> Nothing in this scenario will check that the data is correct, except
> for a forced scrub of the disks. A scrub would probably detect the
> error, but I don't think mdadm has any ability to recover it. Your
> best bet is probably to try to immediately reboot and save what you
> can, or a less-risky solution assuming you don't have anything
> critical in RAM is to just do an immediate hard reset so that there is
> no risk of bad data getting swapped in and overwriting good data on
> your normal filesystems.

Then you might be better off with no swap unless you put it on a file
system that uses checksums.
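
For reference, this is roughly what the mdadm raid1 for swap would look
like, together with the forced scrub you mention; device and md names
are just examples:

    # build a raid1 from the two swap partitions (names are examples)
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md1
    swapon /dev/md1
    # /etc/fstab entry:  /dev/md1  none  swap  sw  0 0

    # force a scrub and look at the mismatch counter afterwards;
    # mdadm can detect mismatches here but not tell which copy is right
    echo check > /sys/block/md1/md/sync_action
    cat /sys/block/md1/md/mismatch_cnt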

>> It's still odd. I already have two different file systems and the
>> overhead of one kind of software raid while I would rather stick to one
>> file system. With btrfs, I'd still have two different file systems ---
>> plus mdadm and the overhead of three different kinds of software raid.
>
> I'm not sure why you'd need two different filesystems.

btrfs and zfs

I won't put my data on btrfs for at least quite a while.

> Just btrfs for your data. I'm not sure where you're counting three
> types of software raid either - you just have your swap.

btrfs raid is software raid, zfs raid is software raid, mdadm is
software raid. That makes three different software raids.

> And I don't think any of this involves any significant overhead, other
> than configuration.

mdadm does have a very significant performance overhead. ZFS mirror
performance seems to be rather poor. I don't know how much overhead is
involved with zfs and btrfs software raid, but since they basically all
do the same thing, I doubt that their overhead is significantly lower
than that of mdadm.

>> How would it be so much better to triple the software raids and to still
>> have the same number of file systems?
>
> Well, the difference would be more data integrity insofar as hardware
> failure goes, but certainly more risk of logical errors (IMO).

There might be a gain in data integrity for the root file system,
assuming that btrfs is as reliable as ext4 on hardware raid. Is it?

That's only about 10GB, mostly read and rarely written to, so it would
be a very minor improvement, if any.
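
I suppose the integrity gain would come down to whether a periodic
scrub on a btrfs root ever reports something that ext4 on hardware raid
would have missed; roughly (assuming / is the btrfs mount):

    # start a scrub on the root filesystem and check the result
    btrfs scrub start /
    btrfs scrub status /
    # per-device error counters (read/write/corruption errors)
    btrfs device stats /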

>>>> When you use hardware raid, it
>>>> can be disadvantageous compared to btrfs-raid --- and when you use it
>>>> anyway, things are suddenly much more straightforward because everything
>>>> is on raid to begin with.
>>>
>>> I'd stick with mdadm. You're never going to run mixed
>>> btrfs/hardware-raid on a single drive,
>>
>> A single disk doesn't make for a raid.
>
> You misunderstood my statement. If you have two drives, you can't run
> both hardware raid and btrfs raid across them. Hardware raid setups
> don't generally support running across only part of a drive, and in
> this setup you'd have to run hardware raid on part of each of two
> single drives.

I have two drives to hold the root file system and the swap space. The
raid controller they'd be connected to does not support using disks
partially.

>>> and the only time I'd consider
>>> hardware raid is with a high quality raid card. You'd still have to
>>> convince me not to use mdadm even if I had one of those lying around.
>>
>> From my own experience, I can tell you that mdadm already does have
>> significant overhead when you use a raid1 of two disks and a raid5 with
>> three disks. This overhead may be somewhat due to the SATA controller
>> not being as capable as one would expect --- yet that doesn't matter
>> because one thing you're looking at, besides reliability, is the overall
>> performance. And the overall performance very noticeably increased when
>> I migrated from mdadm raids to hardware raids, with the same disks and
>> the same hardware, except that the raid card was added.
>
> Well, sure, the raid card probably had battery-backed cache if it was
> decent, so linux could complete its commits to RAM and not have to
> wait for the disks.

yes

>> And that was only 5 disks. I also know that the performance with a ZFS
>> mirror with two disks was disappointingly poor. Those disks aren't
>> exactly fast, but still. I haven't tested yet if it changed after
>> adding 4 mirrored disks to the pool. And I know that the performance of
>> another hardware raid5 with 6 disks was very good.
>
> You're probably going to find the performance of a COW filesystem to
> be inferior to that of an overwrite-in-place filesystem, simply
> because the latter has to do less work.

Reading isn't as fast as I would expect, either.
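
If I get around to investigating, I suppose I'd start by watching the
per-vdev statistics while reading; something along these lines (the
pool name is made up):

    # watch per-vdev bandwidth and IOPS every 5 seconds while reading
    zpool iostat -v tank 5
    # and make sure nothing is degraded or resilvering in the background
    zpool status -v tank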

>> Thus I'm not convinced that software raid is the way to go. I wish they
>> would make hardware ZFS (or btrfs, if it ever becomes reliable)
>> controllers.
>
> I doubt it would perform any better. What would that controller do
> that your CPU wouldn't do?

The CPU wouldn't need to do what the controller does and would have
time to do other things instead.

> Well, other than have battery-backed cache, which would help in any
> circumstance. If you stuck 5 raid cards in your PC and put one drive
> on each card and put mdadm or ZFS across all five it would almost
> certainly perform better because you're adding battery-backed cache.

It's probably not only that. A 512MB cache probably doesn't make that
much difference. I'm guessing that the SATA controller might be
overwhelmed when it has to handle 5 disks simultaneously, while the
hardware raid controller is designed to handle up to 256 disks and thus
does a much better job with a couple of disks, taking the load off the
rest of the system.

In the end, it doesn't really matter what exactly causes the difference
in performance. What matters is that the performance is so much better.
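
If I ever want hard numbers instead of impressions, I suppose I'd run
the same random-I/O test on both setups, something like the following
(file name, size and job counts are arbitrary):

    # mixed 4k random read/write, bypassing the page cache
    fio --name=randrw --filename=/mnt/test/fio.dat --size=4G \
        --rw=randrw --bs=4k --direct=1 --ioengine=libaio \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based \
        --group_reporting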

>> The relevant advantage of btrfs is being able to make snapshots. Is
>> that worth all the (potential) trouble? Snapshots are worthless when
>> the file system destroys them with the rest of the data.
>
> And that is why I wouldn't use btrfs on a production system unless the
> use case mitigated this risk and there was benefit from the snapshots.
> Of course you're taking on more risk using an experimental filesystem.

Yes, and I'd have other disadvantages. I've come to think that being
able to make snapshots isn't worth all the trouble.

>>> btrfs does not support swap files at present.
>>
>> What happens when you try it?
>
> No idea. Should be easy to test in a VM. I suspect either an error
> or a kernel bug/panic/etc.

If it's that bad, that doesn't sound like a file system ready to be used
yet.

>>> When it does you'll need to disable COW for them (using chattr)
>>> otherwise they'll be fragmented until your system grinds to a halt. A
>>> swap file is about the worst case scenario for any COW filesystem -
>>> I'm not sure how ZFS handles them.
>>
>> Well, then they need to make special provisions for swap files in btrfs
>> so that we can finally get rid of the swap partitions.
>
> I'm sure they'll happily accept patches. :)

I'm sure they won't. The thing is that, like everyone, they say they
appreciate contributions, bug reports and patches, while in practice
they make it more or less impossible to contribute or show no interest
in getting contributions, don't look at bug reports or close them
automatically and prematurely or treat them with a great deal of
disinterest, and decline any patches should you have ventured to
provide some.

You'd be misguided to think that anyone cares about or wants your
contribution. If you make one, you're making it only for yourself. I
don't even file bug reports anymore because it's useless.
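
Going back to the chattr point: as far as I understand it, the usual
NOCOW workaround looks roughly like this (paths are made up, and the
flag only takes effect on files that are still empty); and even with
it, trying to swapon a file on btrfs currently just fails with an
error, as you say.

    # NOCOW must be set while the file is still empty (or inherited
    # from a directory that has chattr +C set)
    touch /var/cache/vm.img
    chattr +C /var/cache/vm.img
    lsattr /var/cache/vm.img          # shows the 'C' flag
    fallocate -l 4G /var/cache/vm.img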

>>> If I had done that in the past I think I would have completely avoided
>>> that issue that required me to restore from backups. That happened in
>>> the 3.15/3.16 timeframe and I'd have never even run those kernels.
>>> They were stable kernels at the time, and a few versions in when I
>>> switched to them (I was probably just following gentoo-sources stable
>>> keywords back then), but they still had regressions (fixes were
>>> eventually backported).
>>
>> How do you know if an old kernel you pick because you think the btrfs
>> part works well enough is the right pick? You can either encounter a
>> bug that has been fixed or a regression that hasn't been
>> discovered/fixed yet. That way, you can't win.
>
> You read the lists closely. If you want to be bleeding-edge it will
> take more work than if you just go with the flow. That's why I'm not
> on 4.1 yet - I read the lists and am not quite sure they're ready yet.

That sounds like a lot of work. You really seem to be going to great
lengths to use btrfs.

>>> I think btrfs is certainly usable today, though I'd be hesitant to run
>>> it on production servers depending on the use case (I'd be looking for
>>> a use case that actually has a significant benefit from using btrfs,
>>> and which somehow mitigates the risks).
>>
>> There you go, it's usable, and the risk of using it is too high.
>
> That is a judgement that everybody has to make based on their
> requirements. The important thing is to make an informed decision. I
> don't get paid if you pick btrfs.

Being more informed doesn't magically result in better decisions.
Information, like knowledge, is volatile and fluid; software is power,
while making decisions is only a freedom.

>>> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
>>> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
>>> debate whether I still need it, but I sleep better knowing I have it.
>>> This is in addition to my daily duplicity cloud backups of my most
>>> important data (so, /etc and /home are in the cloud, and mythtv's
>>> /var/video is just on a local rsync backup).
>>
>> I wouldn't give my data out of my hands.
>
> Somehow I doubt the folks at Amazon are going to break RSA anytime soon.

Which means?
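
Local rsnapshot backups I can at least see the point of. If I read the
documentation correctly, the whole setup is not much more than this
(paths and retention counts are made up; the fields in rsnapshot.conf
must be separated by tabs, and older versions call "retain" "interval"):

    # /etc/rsnapshot.conf (excerpt)
    #   config_version  1.2
    #   snapshot_root   /mnt/backup/rsnapshot/
    #   retain          daily   7
    #   retain          weekly  4
    #   backup          /etc/   localhost/
    #   backup          /home/  localhost/

    # example crontab entries (weekly just before the daily run)
    #   0  3 * * 1  /usr/bin/rsnapshot weekly
    #   30 3 * * *  /usr/bin/rsnapshot daily

    rsnapshot configtest    # sanity-check the config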

>> Snapper? I've never heard of that ...
>>
>
> http://snapper.io/
>
> Basically snapshots+crontab and some wrappers to set retention
> policies and such. That and some things like package-manager plugins
> so that you get snapshots before you install stuff.

Does this make things easier or more complicated? For example, I fail
to understand what's supposed to be so great about zfs incremental
snapshots for backups. Apparently you'd have to pile up an indefinite
number of snapshots so you can keep incrementing them indefinitely.
And it gets extremely scary when you want to remove them to get back to
something sane.
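
As far as I can tell, the workflow looks roughly like this (pool and
dataset names are invented), and apparently only the most recent
snapshot that both sides have needs to be kept as the base for the next
increment:

    # initial full copy
    zfs snapshot tank/home@2016-01-11
    zfs send tank/home@2016-01-11 | zfs receive backup/home

    # later: send only the changes since the previous snapshot
    zfs snapshot tank/home@2016-01-12
    zfs send -i tank/home@2016-01-11 tank/home@2016-01-12 \
        | zfs receive backup/home

    # the older snapshot can then be destroyed on the source
    zfs destroy tank/home@2016-01-11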

>> Queuing up the data when there's more data than the system can deal with
>> only works when the system has sufficient time to catch up with the
>> queue. Otherwise, you have to block something at some point, or you
>> must drop the data. At that point, it doesn't matter how you arrange
>> the contents of the queue within it.
>
> Absolutely true. You need to throttle the data before it gets into
> the queue, so that the business of the queue is exposed to the
> applications so that they behave appropriately (falling back to
> lower-bandwidth alternatives, etc). In my case if mythtv's write
> buffers are filling up and I'm also running an emerge install phase
> the correct answer (per ionice) is for emerge to block so that my
> realtime video capture buffers are safely flushed. What you don't
> want is for the kernel to let emerge dump a few GB of low-priority
> data into the write cache alongside my 5Mbps HD recording stream.
> Granted, it isn't as big a problem as it used to be now that RAM sizes
> have increased.

You could re-arrange the queue, and when it's long enough, you don't
need to freeze anything. But what does, for example, a web browser do
when it cannot receive data as fast as it can display it, or what does a
VOIP application do when it cannot send the data as fast as it wants to?
I don't want my web browser to freeze, and a speaker whose voice is
supposed to be transmitted over a network cannot be frozen in their
speech to give sufficient time for the queue to become empty.
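
As for the ionice part: if I understand it correctly, it's just a
matter of starting the low-priority job in the idle (or lowest
best-effort) I/O class, roughly like this (the package name is a
placeholder):

    # run emerge in the idle I/O class so it only gets disk time
    # when nothing else wants it (honoured by the CFQ scheduler)
    ionice -c 3 emerge --oneshot some-package

    # or: lowest best-effort priority instead of idle
    ionice -c 2 -n 7 emerge --oneshot some-package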

>> Gentoo /is/ fire-and-forget in that it works fine. Btrfs is not in that
>> it may work or not.
>>
>
> Well, we certainly must have come a long way then. :) I still
> remember the last time the glibc ABI changed and I was basically
> rebuilding everything from single-user mode holding my breath.

Did it work?