
From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] snapshots?
Date: Fri, 01 Jan 2016 13:26:47
Message-Id: CAGfcS_nDON83L83vKoyMCdBD0V3zeJVk80ZXV0-=2gjh-MOcgQ@mail.gmail.com
In Reply to: Re: [gentoo-user] snapshots? by lee
On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
> "Stefan G. Weichinger" <lists@×××××.at> writes:
>
>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>
>> The general recommendation now is to stay at level-1 for now. That fits
>> your 2-disk-situation.
>
> Well, what shows better performance? No btrfs-raid on hardware raid or
> btrfs raid on JBOD?

I would run btrfs on bare partitions and use btrfs's raid1
capabilities. You're almost certainly going to get better
performance, and you get more data integrity features. If you get
silent corruption with mdadm doing the raid1, btrfs will happily
warn you of the problem, but you're going to have a really hard time
fixing it: btrfs sees only the single copy of the data mdadm hands
it, which may be the bad one, and all mdadm can tell you is that the
two copies are inconsistent, with no idea which one is right. You'd
end up having to manipulate the underlying data to figure out which
one is right and fix it (the data is all there, but you'd probably
end up hex-editing your disks). If you were using btrfs raid1 you'd
just run a scrub and it would detect and fix the problem, since btrfs
sees both copies and knows which one is right. And if you ever move
to raid5 once that matures, btrfs eliminates the write hole.
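(For the record, a scrub is just:

# btrfs scrub start /
# btrfs scrub status /

assuming the filesystem is mounted at /.)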

>>
>> I would avoid converting and stuff.
>>
>> Why not try a fresh install on the new disks with btrfs?
>
> Why would I want to spend another year to get back to where I'm now?

I wouldn't do a fresh install. I'd just set up btrfs on the new disks
and copy your data over (preserving attributes/etc). Before doing
that I'd create any subvolumes you want to have on the new disks and
copy the data into them. The only way to convert a directory into a
subvolume after the fact is to create a subvolume under a new name,
copy the directory's contents into it, rename the directory and
subvolume to swap their names, and then delete the old directory.
That is time-consuming, and depending on which directory you're
talking about you might want to be in single-user mode or boot from a
rescue disk to do it.
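Roughly, the dance looks like this, with /data standing in for
whatever directory you're converting:

# btrfs subvolume create /data.new
# cp -a /data/. /data.new/
# mv /data /data.old && mv /data.new /data
# rm -rf /data.old

cp -a preserves attributes, and adding --reflink=always should make
the copy nearly free, since reflinks work across subvolumes of the
same filesystem.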

I wouldn't do an in-place ext4->btrfs conversion. I know there were
some regressions in that feature recently and I'm not sure where it
stands right now.

>> I never had /boot on btrfs so far, maybe others can guide you with this.
>>
>> My /boot is plain extX on maybe RAID1 (differs on
>> laptops/desktop/servers), I size it 500 MB to have space for multiple
>> kernels (especially on dualboot-systems).
>>
>> Then some swap-partitions, and the rest for btrfs.
>
> There you go, you end up with an odd setup. I don't like /boot
> partitions. As well as swap partitions, they need to be on raid. So
> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
> perhaps ext4, /and/ multiple partitions.

With grub2 you can boot from btrfs. I used to use a separate boot
partition on ext4 with btrfs for the rest, but now my /boot is on my
root partition. I'd still set aside space for a boot partition in
case you move to EFI in the future, but I wouldn't bother formatting
it or setting it up right now. As long as you're using grub2 you
really don't need to do anything special.

You DO need to partition your disks though, even if you only have one
big partition for the whole thing. The reason is that partitioning
leaves space on the disk for grub to embed its loaders/etc.
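On MBR disks that's just the standard routine, something like this
(installing to both disks so either one can boot):

# grub-install /dev/sda
# grub-install /dev/sdb
# grub-mkconfig -o /boot/grub/grub.cfg

If you go GPT instead, add a small BIOS boot partition for grub to
embed into.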

I don't use swap. If I did I'd probably set up an mdadm array for it.
According to the FAQ, btrfs still doesn't support swapping to a file.

There isn't really anything painful about that setup though. Swap
isn't needed to boot, so openrc/systemd will start up mdadm and
activate your swap. I'm not sure whether dracut will do that during
early boot or not, but it doesn't really matter if it does.

If you have two drives I'd just set them up as:
sd[ab]1 - 1GB boot partition, unformatted, for future EFI
sd[ab]2 - mdadm raid1 for swap
sd[ab]3 - btrfs
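The swap half of that is roughly (the md device name is arbitrary):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mkswap /dev/md0
# swapon /dev/md0

plus the usual entries in /etc/mdadm.conf and /etc/fstab.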

> When you use hardware raid, it
> can be disadvantageous compared to btrfs-raid --- and when you use it
> anyway, things are suddenly much more straightforward because everything
> is on raid to begin with.

I'd stick with mdadm. You're never going to run mixed
btrfs/hardware-raid on a single drive, and the only time I'd consider
hardware raid is with a high-quality raid card. Even if I had one of
those lying around, you'd still have to convince me not to use mdadm.

>> Create your btrfs-"pool" with:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
>>
>> Then check for your btrfs-fs with:
>>
>> # btrfs fi show
>>
>> Oh: I realize that I start writing a howto here ;-)
>
> That doesn't work without an extra /boot partition?

It works fine without a boot partition if you're using grub2. If you
want to use grub legacy you'll need a boot partition.

>
> How's btrfs's performance when you use swap files instead of swap
> partitions to avoid the need for mdadm?

btrfs does not support swap files at present. When it does, you'll
need to disable COW for them (using chattr), otherwise they'll
fragment until your system grinds to a halt. A swap file is about
the worst-case scenario for any COW filesystem - I'm not sure how ZFS
handles them.
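If/when support arrives, I'd expect the recipe on a COW filesystem to
look something like this (hypothetical for btrfs today - swapon will
refuse it):

# touch /swapfile
# chattr +C /swapfile    (+C only sticks while the file is still empty)
# dd if=/dev/zero of=/swapfile bs=1M count=4096
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile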

>
> Now I understand that it's apparently not possible to simply make a
> btrfs-raid1 from the two raw disks, copy the system over, install grub
> and boot from that. (I could live with swap files instead of swap
> partitions.)

Even if you used no swap and no separate /boot, like I have right now,
you'd still want to create a single large partition for better grub2
support. Without space between the partition table and the first
partition (which you'll want to start at sector 2048, or whatever the
default is these days), grub2 has to resort to blocklists. That means
that if the files in /boot/grub ever move on disk, for any reason,
your system won't boot. That isn't a btrfs thing - it holds just as
true if you're using ext4 - and blocklists are generally frowned upon.
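parted, for example, aligns the first partition at 1MiB (sector 2048)
by default, which leaves grub2 plenty of room to embed into:

# parted /dev/sda mklabel msdos
# parted -a optimal /dev/sda mkpart primary 1MiB 100%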

>
>> As mentioned here several times I am using btrfs on >6 of my systems for
>> years now. And I don't look back so far.
>
> And has it always been reliable?
>

I've never had an episode that resulted in actual data loss. I HAVE
had an episode or two that resulted in downtime.

When I've had btrfs issues, I could generally mount the filesystem
read-only just fine. The problem was that cleanup threads were
causing kernel BUGs, which made the filesystem stop syncing (not a
full panic, but when all your filesystems are effectively read-only
there isn't much difference in many cases). If I rebooted, the
system would BUG again within a few minutes. In one case I was able
to boot a more recent kernel from a rescue disk and fix things by
just mounting the drive and letting it sit for 20 minutes, otherwise
idle, to finish cleaning things up (some kind of locking issue, most
likely) - maybe I had to run btrfsck on it. In the other case it was
being really fussy and I ended up just restoring from a backup, since
that was the path of least resistance. I could probably have fixed
the problem eventually, and the drive was mountable read-only the
entire time, so given sufficient space I could have copied all the
data over to a new filesystem with no loss at all.
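The rescue mounts themselves were nothing special, something like:

# mount -o ro,recovery /dev/sda3 /mnt

(the recovery mount option asks btrfs to fall back to an older tree
root; the device name is just an example).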

Things have been pretty quiet for the last six months though, and I
think that is largely due to a change in strategy around kernel
versions. Right now I'm running 3.18. I'm starting to consider a
move to 4.1, but there is a backlog of btrfs fixes for stable that
I'm waiting for Greg to catch up on, and maybe I'll wait for a
version after that to see whether things settle down. Around the
3.14->3.18 timeframe btrfs maturity seemed to settle in a bit, and at
this point I think newer kernels are more likely to introduce
regressions than to fix problems. The pace of btrfs patching has
also increased in the last year (which is good in the long term -
most are bugfixes - but in the short term even bugfixes can introduce
bugs). Unless I have a reason not to, at this point I plan to run
only longterm kernels, and to move to them when they're about six
months mature.

If I had done that in the past I think I would have completely avoided
the issue that required me to restore from backups. That happened in
the 3.15/3.16 timeframe, and I'd never even have run those kernels.
They were stable kernels at the time, and a few versions in when I
switched to them (I was probably just following gentoo-sources stable
keywords back then), but they still had regressions (the fixes were
eventually backported).

I think btrfs is certainly usable today, though I'd be hesitant to
run it on production servers, depending on the use case (I'd be
looking for a use case that actually gets a significant benefit from
btrfs, and which somehow mitigates the risks).

Right now I keep a daily rsnapshot backup (rsync on steroids - it's
in the Gentoo repo) of my btrfs filesystems on ext4. I occasionally
debate whether I still need it, but I sleep better knowing I have it.
This is in addition to my daily duplicity cloud backups of my most
important data (so /etc and /home are in the cloud, and mythtv's
/var/video is just on a local rsync backup).
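For flavor, a minimal rsnapshot setup along those lines looks
something like this (paths illustrative; note that rsnapshot.conf is
tab-delimited):

snapshot_root   /mnt/backup/rsnapshot/
retain          daily   7
backup          /etc/   localhost/
backup          /home/  localhost/

with a daily 'rsnapshot daily' run from cron.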

Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
raid5/6 is fine, but you lose the data integrity features). I
wouldn't touch it for at least a year, and probably longer.

Overall I'm very happy with btrfs though. Snapshots and reflinks are
very handy - I can update containers and nfs roots after snapshotting
them, which gives me a trivial rollback solution, and while I don't
use snapper I do manually rotate through snapshots weekly. If you do
run snapper, I'd avoid generating large numbers of snapshots - one of
my BUG problems happened as a result of snapper deleting a few
hundred snapshots at once.
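The rollback itself is nothing fancy - snapshot before the update,
swap names if it goes wrong (paths hypothetical):

# btrfs subvolume snapshot /containers/web /containers/web.pre
  (update and test; if it's broken:)
# mv /containers/web /containers/web.bad
# mv /containers/web.pre /containers/web
# btrfs subvolume delete /containers/web.bad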

Btrfs's deferred processing of the log/btrees can cause the kinds of
performance issues associated with garbage collection (or BUGs due to
thundering-herd problems). I use ionice to try to prioritize my IO
so that realtime activity like mythtv recordings takes precedence
over less time-critical work, and in the past that hasn't always
worked with btrfs. The problem is that btrfs would accept too much
data into its log and then block all writes while it tried to catch
up. I haven't seen that as much recently, so maybe they're getting
better about it. As with any other scheduling problem, it only works
if you correctly block low-priority writes at the start of the
pipeline (I've heard of similar problems with TCP QoS and such if you
don't ensure that the bottleneck is the first router along the route -
you can let in too much low-priority traffic, and at that point
you're stuck dealing with it).
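The ionice side is just demoting the bulk work, e.g.:

# ionice -c 3 rsync -a /var/video/ /mnt/backup/video/

(-c 3 is the idle class; as noted above, btrfs hasn't always honored
it well.)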

I'd suggest watching the btrfs mailing list to get a sense of what
people are dealing with. Just ignore all the threads marked as
patches and look at the discussion threads.

If you're getting the impression that btrfs isn't quite
fire-and-forget, you're getting the right impression. Neither is
Gentoo, so I wouldn't let that alone scare you off. But I see no
reason not to give you fair warning.

--
Rich
