On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
> "Stefan G. Weichinger" <lists@×××××.at> writes:
>
>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>
>> The general recommendation now is to stay at level-1 for now. That fits
>> your 2-disk-situation.
>
> Well, what shows better performance? No btrfs-raid on hardware raid or
> btrfs raid on JBOD?

I would run btrfs on bare partitions and use btrfs's raid1
capabilities. You're almost certainly going to get better
performance, and you get more data integrity features. If you have a
silent corruption with mdadm doing the raid1 then btrfs will happily
warn you of your problem and you're going to have a really hard time
fixing it, because btrfs only sees one copy of the data which is bad,
and all mdadm can tell you is that the data is inconsistent with no
idea which one is right. You'd end up having to try to manipulate the
underlying data to figure out which one is right and fix it (the data
is all there, but you'd probably end up hex-editing your disks). If
you were using btrfs raid1 you'd just run a scrub and it would
detect/fix the problem, since btrfs would see both copies and know
which one is right. Then if you ever move to raid5 when that matures
you eliminate the write hole with btrfs.
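 
A rough sketch of what "just run a scrub" looks like on a mounted btrfs
raid1 filesystem. The mount point /mnt/data is made up, and the run
helper is only there so the commands get printed instead of executed
unless you set DRY_RUN=0 yourself.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Scrub a mounted btrfs filesystem, then show the per-device error counters.
scrub_and_report() {
    mnt=$1
    # -B stays in the foreground until the scrub has finished.
    run btrfs scrub start -B "$mnt"
    run btrfs device stats "$mnt"
}

# /mnt/data is a hypothetical mount point for the raid1 filesystem.
scrub_and_report /mnt/data
```

The device stats counters are a quick way to see whether a drive has
been returning bad data, before or after the scrub.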
26 |
|
27 |
>> |
28 |
>> I would avoid converting and stuff. |
29 |
>> |
30 |
>> Why not try a fresh install on the new disks with btrfs? |
31 |
> |
32 |
> Why would I want to spend another year to get back to where I'm now? |
33 |
|
34 |
I wouldn't do a fresh install. I'd just set up btrfs on the new disks |
35 |
and copy your data over (preserving attributes/etc). Before I did |
36 |
that I'd create any subvolumes you want to have on the new disks and |
37 |
copy the data into them. The only way to convert a directory into a |
38 |
subvolume after the fact is to create a subvolume with the new name, |
39 |
copy the directory into it, and then rename the directory and |
40 |
subvolume to swap their names, then delete the old directory. That is |
41 |
time-consuming, and depending on what directory you're talking about |
42 |
you might want to be in single-user or boot from a rescue disk to do |
43 |
it. |
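 
The directory-to-subvolume shuffle described above could be sketched
like this. /srv/data is a hypothetical directory on an existing btrfs
filesystem, the .new/.old suffixes are arbitrary, and the run helper
only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Turn an existing directory into a subvolume of the same name.
# Assumes nothing is writing to the directory while this runs.
dir_to_subvol() {
    dir=$1
    run btrfs subvolume create "$dir.new"
    # -a preserves ownership, modes, timestamps and (where supported) xattrs.
    run cp -a "$dir/." "$dir.new/"
    # Swap the names, then drop the old copy once you trust the new one.
    run mv "$dir" "$dir.old"
    run mv "$dir.new" "$dir"
    run rm -rf "$dir.old"
}

# /srv/data is a hypothetical directory.
dir_to_subvol /srv/data
```

I'd check the copy before letting the final rm run for real.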
44 |
|
45 |
I wouldn't do an in-place ext4->btrfs conversion. I know that there |
46 |
were some regressions in that feature recently and I'm not sure where |
47 |
it stands right now. |
48 |
|
49 |
>> I never had /boot on btrfs so far, maybe others can guide you with this. |
50 |
>> |
51 |
>> My /boot is plain extX on maybe RAID1 (differs on |
52 |
>> laptops/desktop/servers), I size it 500 MB to have space for multiple |
53 |
>> kernels (especially on dualboot-systems). |
54 |
>> |
55 |
>> Then some swap-partitions, and the rest for btrfs. |
56 |
> |
57 |
> There you go, you end up with an odd setup. I don't like /boot |
58 |
> partitions. As well as swap partitions, they need to be on raid. So |
59 |
> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/ |
60 |
> perhaps ext4, /and/ multiple partitions. |
61 |
|
62 |
With grub2 you can boot from btrfs. I used to use a separate boot |
63 |
partition on ext4 with btrfs for the rest, but now my /boot is on my |
64 |
root partition. I'd still partition space for a boot partition in |
65 |
case you move to EFI in the future but I wouldn't bother formatting it |
66 |
or setting it up right now. As long as you're using grub2 you really |
67 |
don't need to do anything special. |
68 |
|
69 |
You DO need to partition your disks though, even if you only have one |
70 |
big partition for the whole thing. The reason is that this gives |
71 |
space for grub to stick its loaders/etc on the disk. |
72 |
|
73 |
I don't use swap. If I did I'd probably set up an mdadm array for it. |
74 |
According to the FAQ btrfs still doesn't support swap from a file. |
75 |
|
76 |
There isn't really anything painful about that setup though. Swap |
77 |
isn't needed to boot, so openrc/systemd will start up mdadm and |
78 |
activate your swap. I'm not sure if dracut will do that during early |
79 |
boot or not, but it doesn't really matter if it does. |
80 |
|
81 |
If you have two drives I'd just set them up as: |
82 |
sd[ab]1 - 1GB boot partition unformatted for future EFI |
83 |
sd[ab]2 - mdadm raid1 for swap |
84 |
sd[ab]3 - btrfs |
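 
One way that layout might be created, as a sketch under several
assumptions: an MBR (msdos) label, an 8GiB swap partition (the size is
a placeholder, not something from this thread), /dev/md0 as the array
name, and /dev/sda and /dev/sdb as the two disks. With a GPT label and
BIOS booting, grub2 wants a small BIOS boot partition instead of
relying on the gap after the partition table. The run helper only
prints the commands unless DRY_RUN=0 is set, which matters here since
every one of them is destructive.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Sizes are placeholders: 1GiB for the spare boot partition and an assumed
# 8GiB for swap. Starting at 1MiB leaves the usual gap after the partition
# table (1MiB is sector 2048 on 512-byte-sector disks).
partition_disk() {
    disk=$1
    run parted -s "$disk" mklabel msdos
    run parted -s "$disk" mkpart primary 1MiB 1025MiB
    run parted -s "$disk" mkpart primary 1025MiB 9217MiB
    run parted -s "$disk" mkpart primary 9217MiB 100%
}

setup_two_disks() {
    a=$1
    b=$2
    partition_disk "$a"
    partition_disk "$b"
    # Partition 1 is deliberately left unformatted.
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "${a}2" "${b}2"
    run mkswap /dev/md0
    run mkfs.btrfs -m raid1 -d raid1 "${a}3" "${b}3"
}

# Hypothetical device names; double-check them before running for real.
setup_two_disks /dev/sda /dev/sdb
```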
 
> When you use hardware raid, it
> can be disadvantageous compared to btrfs-raid --- and when you use it
> anyway, things are suddenly much more straightforward because everything
> is on raid to begin with.

I'd stick with mdadm. You're never going to run mixed
btrfs/hardware-raid on a single drive, and the only time I'd consider
hardware raid is with a high quality raid card. You'd still have to
convince me not to use mdadm even if I had one of those lying around.

>> Create your btrfs-"pool" with:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
>>
>> Then check for your btrfs-fs with:
>>
>> # btrfs fi show
>>
>> Oh: I realize that I start writing a howto here ;-)
>
> That doesn't work without an extra /boot partition?

It works fine without a boot partition if you're using grub2. If you
want to use grub legacy you'll need a boot partition.

>
> How's btrfs's performance when you use swap files instead of swap
> partitions to avoid the need for mdadm?

btrfs does not support swap files at present. When it does you'll
need to disable COW for them (using chattr) otherwise they'll be
fragmented until your system grinds to a halt. A swap file is about
the worst case scenario for any COW filesystem - I'm not sure how ZFS
handles them.
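 
For reference, disabling COW with chattr looks roughly like this. It
is shown on an ordinary file, not a swap file, and the path is made
up. On btrfs the flag should be set while the file is still empty,
hence the touch first. The run helper only prints the commands unless
DRY_RUN=0 is set.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create an empty file and mark it No_COW before any data is written to it.
make_nocow_file() {
    f=$1
    run touch "$f"
    run chattr +C "$f"
    # lsattr should list the 'C' attribute for the file.
    run lsattr "$f"
}

# /var/lib/example/big.img is a hypothetical path.
make_nocow_file /var/lib/example/big.img
```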
 
>
> Now I understand that it's apparently not possible to simply make a
> btrfs-raid1 from the two raw disks, copy the system over, install grub
> and boot from that. (I could live with swap files instead of swap
> partitions.)

Even if you used no swap and no boot like I have right now, you'd
still want to create a single large partition for better grub2
support. Without space between the partition table and the first
partition (which you'll want to start at sector 2048 or whatever the
default is these days) grub has to resort to blocklists. That means
that if for any reason the files in /boot/grub move on disk your
system won't boot. That isn't a btrfs thing - it holds just as true
if you're using ext4 and is generally frowned upon.
 
>
>> As mentioned here several times I am using btrfs on >6 of my systems for
>> years now. And I don't look back so far.
>
> And has it always been reliable?
>

I've never had an episode that resulted in actual data loss. I HAVE
had an episode or two which resulted in downtime.

When I've had btrfs issues I can generally mount the filesystem
read-only just fine. The problem was that cleanup threads were
causing kernel BUGs which cause the filesystem to stop syncing (not a
full panic, but when all your filesystems are effectively read-only
there isn't much difference in many cases). If I rebooted, the system
would BUG within a few minutes. In one case I was able to boot from a
more recent kernel on a rescue disk and fix things by just mounting
the drive and letting it sit for 20min to finish cleaning things up
while the disk was otherwise idle (some kind of locking issue most
likely) - maybe I had to run btrfsck on it. In the other case it was
being really fussy and I ended up just restoring from a backup since
that was the path of least resistance. I could have probably
eventually fixed the problem, and the drive was mountable read-only
the entire time so given sufficient space I could have copied all the
data over to a new filesystem with no loss at all.

Things have been pretty quiet for the last six months though, and I
think it is largely due to a change in strategy around kernel
versions. Right now I'm running 3.18. I'm starting to consider a
move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
waiting for Greg to catch up on and maybe I'll wait for a version
after that to see if things settle down. Around the time of
3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
I think newer kernels are more likely to introduce regressions than
fix problems. The pace of btrfs patching seems to have increased as
well in the last year (which is good in the long-term - most are
bugfixes - but in the short term even bugfixes can introduce bugs).
Unless I have a reason not to, at this point I plan to run only
longterm kernels, and move to them when they're about six months
mature.

If I had done that in the past I think I would have completely avoided
that issue that required me to restore from backups. That happened in
the 3.15/3.16 timeframe and I'd have never even run those kernels.
They were stable kernels at the time, and a few versions in when I
switched to them (I was probably just following gentoo-sources stable
keywords back then), but they still had regressions (fixes were
eventually backported).

I think btrfs is certainly usable today, though I'd be hesitant to run
it on production servers depending on the use case (I'd be looking for
a use case that actually has a significant benefit from using btrfs,
and which somehow mitigates the risks).

Right now I keep a daily rsnapshot (rsync on steroids - it's in the
Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
debate whether I still need it, but I sleep better knowing I have it.
This is in addition to my daily duplicity cloud backups of my most
important data (so, /etc and /home are in the cloud, and mythtv's
/var/video is just on a local rsync backup).

Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
raid5/6 is fine, but you lose the data integrity features). I
wouldn't go anywhere near that for at least a year, and probably
longer.

Overall I'm very happy with btrfs though. Snapshots and reflinks are
very handy - I can update containers and nfs roots after snapshotting
them and it gives me a trivial rollback solution, and while I don't
use snapper I do manually rotate through snapshots weekly. If you do
run snapper I'd probably avoid generating large numbers of snapshots -
one of my BUG problems happened as a result of snapper deleting a few
hundred snapshots at once.
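 
The snapshot-then-update routine could be sketched like this. The
paths are hypothetical, the snapshot has to live on the same btrfs
filesystem as the subvolume, and the run helper only prints the
commands unless DRY_RUN=0 is set.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Take a read-only snapshot of a subvolume before changing it.
snapshot_before_update() {
    subvol=$1
    snap=$2
    run btrfs subvolume snapshot -r "$subvol" "$snap"
}

# Roll back: move the broken subvolume aside and put a writable
# snapshot of the saved state in its place.
rollback() {
    subvol=$1
    snap=$2
    run mv "$subvol" "$subvol.broken"
    run btrfs subvolume snapshot "$snap" "$subvol"
}

# Hypothetical container root and snapshot location.
snapshot_before_update /srv/containers/web /srv/snapshots/web-pre-update
rollback /srv/containers/web /srv/snapshots/web-pre-update
```

Old snapshots and the .broken copy can be removed later with btrfs
subvolume delete, a few at a time.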
 
Btrfs's deferred processing of the log/btrees can cause the kinds of
performance issues associated with garbage collection (or BUGs due to
thundering herd problems). I use ionice to try to prioritize my IO so
that stuff like mythtv recordings will block less realtime activities,
and in the past that hasn't always worked with btrfs. The problem is
that btrfs would accept too much data into its log, and then it would
block all writes while it tried to catch up. I haven't seen that as
much recently, so maybe they're getting better about that. As with
any other scheduling problem it only works if you correctly block
writes into the start of the pipeline (I've heard of similar problems
with TCP QoS and such if you don't ensure that the bottleneck is the
first router along the route - you can let in too much low-priority
traffic and then at that point you're stuck dealing with it).
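 
A small sketch of the ionice usage mentioned above. The rsync command
and paths are placeholders, whether the priority is honoured depends
on the IO scheduler in use, and the run helper only prints the
commands unless DRY_RUN=0 is set.

```shell
# Sketch only: print each command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Run a command in the idle IO class (class 3), so it only gets disk
# time when nothing else is asking for it.
run_idle_io() {
    run ionice -c 3 "$@"
}

# Run a command as best-effort IO (class 2) at the given level,
# from 0 (highest priority) to 7 (lowest).
run_best_effort_io() {
    level=$1
    shift
    run ionice -c 2 -n "$level" "$@"
}

# Placeholder bulk job that should stay out of the way of other IO.
run_idle_io rsync -a /srv/data/ /mnt/backup/data/
```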
 
I'd suggest looking at the btrfs mailing list to get a survey for what
people are dealing with. Just ignore all the threads marked as
patches and look at the discussion threads.

If you're getting the impression that btrfs isn't quite
fire-and-forget, you're getting the right impression. Neither is
Gentoo, so I wouldn't let that alone scare you off. But, I see no
reason to not give you fair warning.

--
Rich