Rich Freeman <rich0@g.o> writes:

> On Fri, Jan 1, 2016 at 5:42 AM, lee <lee@××××××××.de> wrote:
>> "Stefan G. Weichinger" <lists@×××××.at> writes:
>>
>>> btrfs offers RAID-like redundancy as well, no mdadm involved here.
>>>
>>> The general recommendation now is to stay at level-1 for now. That fits
>>> your 2-disk-situation.
>>
>> Well, what shows better performance? No btrfs-raid on hardware raid or
>> btrfs raid on JBOD?
>
> I would run btrfs on bare partitions and use btrfs's raid1
> capabilities. You're almost certainly going to get better
> performance, and you get more data integrity features.

That would require me to set up software raid with mdadm as well, for
the swap partition.

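Just to make the comparison concrete, that mixed setup would look
roughly like this (device names and sizes are only examples, not
tested):

    # mdadm raid1 for swap, since btrfs can't hold a swap file
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md0 && swapon /dev/md0

    # btrfs raid1 across bare partitions for everything else
    mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3

So it's mdadm for swap (and possibly /boot) plus btrfs doing its own
raid1 for the data.
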
> If you have a silent corruption with mdadm doing the raid1 then btrfs
> will happily warn you of your problem and you're going to have a
> really hard time fixing it,

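Whereas with btrfs doing the raid1 itself, a scrub can supposedly
repair the bad copy from the good one --- something like this, if I
understand it correctly (mount point is an example):

    # verify all checksums; corrupted blocks get rewritten from the mirror
    btrfs scrub start -B /mnt
    btrfs scrub status /mnt
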
BTW, what do you do when you have silent corruption on a swap partition?
Is that possible, or does swapping use its own checksums?

> [...]
>
>>>
>>> I would avoid converting and stuff.
>>>
>>> Why not try a fresh install on the new disks with btrfs?
>>
>> Why would I want to spend another year to get back to where I am now?
>
> I wouldn't do a fresh install. I'd just set up btrfs on the new disks
> and copy your data over (preserving attributes/etc).

That was the idea.

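Something along these lines, presumably (paths are only examples):

    # copy everything, preserving ACLs, xattrs and hard links
    rsync -aAXH /mnt/old-root/ /mnt/new-btrfs-root/
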
> I wouldn't do an in-place ext4->btrfs conversion. I know that there
> were some regressions in that feature recently and I'm not sure where
> it stands right now.

That adds to the uncertainty of btrfs.

> [...]
>>
>> There you go, you end up with an odd setup. I don't like /boot
>> partitions. As well as swap partitions, they need to be on raid. So
>> unless you use hardware raid, you end up with mdadm /and/ btrfs /and/
>> perhaps ext4, /and/ multiple partitions.
>
> [...]
> There isn't really anything painful about that setup though.

It's still odd. I already have two different file systems and the
overhead of one kind of software raid, while I would rather stick to
one file system. With btrfs, I'd still have two different file
systems --- plus mdadm and the overhead of three different kinds of
software raid.

How would it be so much better to triple the software raids and still
have the same number of file systems?

>> When you use hardware raid, it
>> can be disadvantageous compared to btrfs-raid --- and when you use it
>> anyway, things are suddenly much more straightforward because everything
>> is on raid to begin with.
>
> I'd stick with mdadm. You're never going to run mixed
> btrfs/hardware-raid on a single drive,

A single disk doesn't make for a raid.

> and the only time I'd consider
> hardware raid is with a high quality raid card. You'd still have to
> convince me not to use mdadm even if I had one of those lying around.

From my own experience, I can tell you that mdadm already has
significant overhead with a raid1 of two disks and a raid5 of three
disks. Some of this overhead may be due to the SATA controller not
being as capable as one would expect --- yet that doesn't matter,
because what you're looking at, besides reliability, is the overall
performance. And the overall performance increased very noticeably
when I migrated from mdadm raids to hardware raids, with the same
disks and the same hardware, except that the raid card was added.

And that was only 5 disks. I also know that the performance of a ZFS
mirror with two disks was disappointingly poor. Those disks aren't
exactly fast, but still. I haven't tested yet whether it changed
after adding 4 mirrored disks to the pool. And I know that the
performance of another hardware raid5 with 6 disks was very good.

Thus I'm not convinced that software raid is the way to go. I wish
they would make hardware ZFS (or btrfs, if it ever becomes reliable)
controllers.

Now consider:

+ candidates for hardware raid are two small disks (72GB each)
+ data on those is either mostly read, or temporary/cache-like
+ this setup has been working without any issues for over a year now
+ using btrfs would triple the software raids used
+ btrfs is uncertain, its reliability questionable
+ mdadm would have to be added as another layer of complexity
+ the disks are SAS disks, genuinely made to be run in a hardware raid
+ the setup with hardware raid is straightforward and simple, the setup
  with btrfs is anything but

The relevant advantage of btrfs is being able to make snapshots. Is
that worth all the (potential) trouble? Snapshots are worthless when
the file system destroys them along with the rest of the data.

> [...]
>> How's btrfs's performance when you use swap files instead of swap
>> partitions to avoid the need for mdadm?
>
> btrfs does not support swap files at present.

What happens when you try it?

> When it does you'll need to disable COW for them (using chattr)
> otherwise they'll be fragmented until your system grinds to a halt. A
> swap file is about the worst case scenario for any COW filesystem -
> I'm not sure how ZFS handles them.

Well, then they need to make special provisions for swap files in btrfs
so that we can finally get rid of the swap partitions.

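For reference, the chattr-based recipe you mention would presumably
look like this, once swap files work on btrfs at all (untested
sketch, paths are examples):

    touch /swapfile
    chattr +C /swapfile    # disable COW while the file is still empty
    dd if=/dev/zero of=/swapfile bs=1M count=4096    # swap can't use sparse files
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
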
> [...]
>>> As mentioned here several times I am using btrfs on >6 of my systems for
>>> years now. And I don't look back so far.
>>
>> And has it always been reliable?
>>
>
> I've never had an episode that resulted in actual data loss. I HAVE
> had an episode or two which resulted in downtime.
>
> When I've had btrfs issues I can generally mount the filesystem
> read-only just fine. The problem was that cleanup threads were
> causing kernel BUGs which caused the filesystem to stop syncing (not a
> full panic, but when all your filesystems are effectively read-only
> there isn't much difference in many cases). If I rebooted, the system
> would BUG within a few minutes. In one case I was able to boot from a
> more recent kernel on a rescue disk and fix things by just mounting
> the drive and letting it sit for 20min to finish cleaning things up
> while the disk was otherwise idle (some kind of locking issue most
> likely) - maybe I had to run btrfsck on it. In the other case it was
> being really fussy and I ended up just restoring from a backup since
> that was the path of least resistance. I could probably have
> eventually fixed the problem, and the drive was mountable read-only
> the entire time, so given sufficient space I could have copied all the
> data over to a new filesystem with no loss at all.

That's exactly what I don't want to have to deal with. It would defeat
the most important purpose of using raid.

> Things have been pretty quiet for the last six months though, and I
> think it is largely due to a change in strategy around kernel
> versions. Right now I'm running 3.18. I'm starting to consider a
> move to 4.1, but there is a backlog of btrfs fixes for stable that I'm
> waiting for Greg to catch up on, and maybe I'll wait for a version
> after that to see if things settle down. Around the time of
> 3.14->3.18 btrfs maturity seemed to settle in a bit, and at this point
> I think newer kernels are more likely to introduce regressions than
> fix problems. The pace of btrfs patching seems to have increased as
> well in the last year (which is good in the long term - most are
> bugfixes - but in the short term even bugfixes can introduce bugs).
> Unless I have a reason not to, at this point I plan to run only
> longterm kernels, and to move to them when they're about six months
> mature.

That's another thing making it difficult to use btrfs.

> If I had done that in the past I think I would have completely avoided
> that issue that required me to restore from backups. That happened in
> the 3.15/3.16 timeframe and I'd have never even run those kernels.
> They were stable kernels at the time, and a few versions in when I
> switched to them (I was probably just following gentoo-sources stable
> keywords back then), but they still had regressions (fixes were
> eventually backported).

How do you know whether an old kernel you pick --- because you think
its btrfs part works well enough --- is the right pick? You can
either encounter a bug that has since been fixed or a regression that
hasn't been discovered/fixed yet. Either way, you can't win.

> I think btrfs is certainly usable today, though I'd be hesitant to run
> it on production servers depending on the use case (I'd be looking for
> a use case that actually has a significant benefit from using btrfs,
> and which somehow mitigates the risks).

There you go: it's usable, yet the risk of using it is too high.

> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
> debate whether I still need it, but I sleep better knowing I have it.
> This is in addition to my daily duplicity cloud backups of my most
> important data (so, /etc and /home are in the cloud, and mythtv's
> /var/video is just on a local rsync backup).

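(I had to look rsnapshot up: apparently it's driven by a plain config
file of TAB-separated fields plus a cron job like "rsnapshot daily" ---
paths and retention counts below are made-up examples:

    # /etc/rsnapshot.conf
    snapshot_root   /mnt/backup/rsnapshot/
    retain          daily   7
    retain          weekly  4
    backup          /etc/   localhost/
    backup          /home/  localhost/

)
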
I wouldn't let my data out of my hands.

> Oh, and don't go anywhere near btrfs raid5/6 (btrfs on top of mdadm
> raid5/6 is fine, but you lose the data integrity features). I
> wouldn't go anywhere near that for at least a year, and probably
> longer.

It might take another 5 or 10 years before btrfs isn't questionable
anymore, if it ever gets there.

> Overall I'm very happy with btrfs though. Snapshots and reflinks are
> very handy - I can update containers and nfs roots after snapshotting
> them and it gives me a trivial rollback solution, and while I don't
> use snapper I do manually rotate through snapshots weekly. If you do
> run snapper I'd probably avoid generating large numbers of snapshots -
> one of my BUG problems happened as a result of snapper deleting a few
> hundred snapshots at once.

Snapper? I've never heard of that ...

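The snapshot-before-update workflow you describe would look something
like this, if I understand it correctly (paths are made up):

    # writable snapshot of a container's subvolume before updating it
    btrfs subvolume snapshot /srv/containers/web /srv/containers/web-pre
    # if the update goes wrong, swap the snapshot back in
    mv /srv/containers/web /srv/containers/web-broken
    mv /srv/containers/web-pre /srv/containers/web
    btrfs subvolume delete /srv/containers/web-broken
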
> Btrfs's deferred processing of the log/btrees can cause the kinds of
> performance issues associated with garbage collection (or BUGs due to
> thundering herd problems). I use ionice to try to prioritize my IO so
> that stuff like mythtv recordings will block less realtime activities,
> and in the past that hasn't always worked with btrfs. The problem is
> that btrfs would accept too much data into its log, and then it would
> block all writes while it tried to catch up. I haven't seen that as
> much recently, so maybe they're getting better about that. As with
> any other scheduling problem it only works if you correctly block
> writes into the start of the pipeline (I've heard of similar problems
> with TCP QoS and such if you don't ensure that the bottleneck is the
> first router along the route - you can let in too much low-priority
> traffic and then at that point you're stuck dealing with it).

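So the prioritisation you mean is something like this (commands and
paths are only examples; as far as I know the priorities only take
effect with the CFQ I/O scheduler):

    # run a bulk copy at idle I/O priority
    ionice -c 3 cp /var/video/recording.mpg /mnt/backup/
    # or merely lower its priority within the best-effort class
    ionice -c 2 -n 7 rsync -a /var/video/ /mnt/backup/video/
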
Queuing up data when there's more of it than the system can deal with
only works when the system has sufficient time to catch up with the
queue. Otherwise, you have to block something at some point, or you
must drop the data. At that point, it doesn't matter how you arrange
the contents within the queue.

> I'd suggest looking at the btrfs mailing list to get a survey of what
> people are dealing with. Just ignore all the threads marked as
> patches and look at the discussion threads.
>
> If you're getting the impression that btrfs isn't quite
> fire-and-forget, you're getting the right impression. Neither is
> Gentoo, so I wouldn't let that alone scare you off. But, I see no
> reason to not give you fair warning.

Gentoo /is/ fire-and-forget in that it just works. Btrfs is not, in
that it may or may not work.