On Tue, Jan 5, 2016 at 5:16 PM, lee <lee@××××××××.de> wrote:
> Rich Freeman <rich0@g.o> writes:
>
>>
>> I would run btrfs on bare partitions and use btrfs's raid1
>> capabilities. You're almost certainly going to get better
>> performance, and you get more data integrity features.
>
> That would require me to set up software raid with mdadm as well, for
> the swap partition.

Correct, if you don't want a panic if a single swap drive fails. |
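
For two disks partitioned identically, the whole setup would look
something like this (a sketch; the device names are placeholders):

  # small md raid1 across one partition pair, just for swap
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkswap /dev/md0 && swapon /dev/md0

  # btrfs raid1 (data and metadata) across the remaining partitions
  mkfs.btrfs -d raid1 -m raid1 /dev/sda2 /dev/sdb2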

>
>> If you have a silent corruption with mdadm doing the raid1 then btrfs
>> will happily warn you of your problem and you're going to have a
>> really hard time fixing it,
>
> BTW, what do you do when you have silent corruption on a swap partition?
> Is that possible, or does swapping use its own checksums?

If the kernel pages in data from the good mirror, nothing happens. If
the kernel pages in data from the bad mirror, then whatever data
happens to be there is what gets loaded and used and/or executed. If
you're lucky the corrupted data lands in unused heap or something. If
not, well, just about anything could happen.

Nothing in this scenario will check that the data is correct, short of
a forced scrub of the disks. A scrub would probably detect the error,
but I don't think mdadm has any way to tell which mirror holds the
good copy. Your best bet is probably to reboot immediately and save
what you can. A less risky option, assuming you don't have anything
critical in RAM, is an immediate hard reset, so that there is no risk
of bad data being swapped in and overwriting good data on
your normal filesystems. |
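
For the record, forcing an md scrub by hand is just this (a sketch;
md0 is a placeholder for your actual array):

  # kick off a check pass over the whole array
  echo check > /sys/block/md0/md/sync_action

  # once it finishes, a nonzero value here means the mirrors disagree
  cat /sys/block/md0/md/mismatch_cnt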

> It's still odd. I already have two different file systems and the
> overhead of one kind of software raid while I would rather stick to one
> file system. With btrfs, I'd still have two different file systems ---
> plus mdadm and the overhead of three different kinds of software raid.

I'm not sure why you'd need two different filesystems. Just btrfs for
your data. I'm not sure where you're counting three kinds of software
raid either - you'd have just the one md array for swap. And I don't
think any of this involves significant overhead, other than
configuration.

>
> How would it be so much better to triple the software raids and to still
> have the same number of file systems?

Well, the difference would be more data integrity insofar as hardware
failure goes, but certainly more risk of logical errors (IMO).

>
>>> When you use hardware raid, it
>>> can be disadvantageous compared to btrfs-raid --- and when you use it
>>> anyway, things are suddenly much more straightforward because everything
>>> is on raid to begin with.
>>
>> I'd stick with mdadm. You're never going to run mixed
>> btrfs/hardware-raid on a single drive,
>
> A single disk doesn't make for a raid.

You misunderstood my statement. If you have two drives, you can't run
both hardware raid and btrfs raid across them: hardware raid setups
generally don't support running across only part of a drive, and in
this setup you'd need hardware raid on part of each of the two drives.

>
>> and the only time I'd consider
>> hardware raid is with a high quality raid card. You'd still have to
>> convince me not to use mdadm even if I had one of those lying around.
>
> From my own experience, I can tell you that mdadm already does have
> significant overhead when you use a raid1 of two disks and a raid5 with
> three disks. This overhead may be somewhat due to the SATA controller
> not being as capable as one would expect --- yet that doesn't matter
> because one thing you're looking at, besides reliability, is the overall
> performance. And the overall performance very noticeably increased when
> I migrated from mdadm raids to hardware raids, with the same disks and
> the same hardware, except that the raid card was added.

Well, sure, the raid card probably had battery-backed cache if it was
decent, so linux could complete its commits to RAM and not have to
wait for the disks.

>
> And that was only 5 disks. I also know that the performance with a ZFS
> mirror with two disks was disappointingly poor. Those disks aren't
> exactly fast, but still. I haven't tested yet if it changed after
> adding 4 mirrored disks to the pool. And I know that the performance of
> another hardware raid5 with 6 disks was very good.

You're probably going to find the performance of a COW filesystem to
be inferior to that of an overwrite-in-place filesystem, simply
because the latter has to do less work.

>
> Thus I'm not convinced that software raid is the way to go. I wish they
> would make hardware ZFS (or btrfs, if it ever becomes reliable)
> controllers.

I doubt it would perform any better. What would that controller do
that your CPU wouldn't do? Well, other than have battery-backed
cache, which would help in any circumstance. If you stuck 5 raid
cards in your PC, put one drive on each card, and ran mdadm or ZFS
across all five, it would almost certainly perform better, because
you're adding battery-backed cache.

>
> The relevant advantage of btrfs is being able to make snapshots. Is
> that worth all the (potential) trouble? Snapshots are worthless when
> the file system destroys them with the rest of the data.

And that is why I wouldn't use btrfs on a production system unless the
use case mitigated this risk and there was benefit from the snapshots.
Of course you're taking on more risk using an experimental filesystem.

>>
>> btrfs does not support swap files at present.
>
> What happens when you try it?

No idea. Should be easy to test in a VM. I suspect either an error
or a kernel bug/panic/etc. |
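
Something like the following in a throwaway VM would settle it (a
sketch; the path is arbitrary):

  truncate -s 1G /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile   # on btrfs I'd expect this step to error out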

>
>> When it does you'll need to disable COW for them (using chattr)
>> otherwise they'll be fragmented until your system grinds to a halt. A
>> swap file is about the worst case scenario for any COW filesystem -
>> I'm not sure how ZFS handles them.
>
> Well, then they need to make special provisions for swap files in btrfs
> so that we can finally get rid of the swap partitions.

I'm sure they'll happily accept patches. :) |
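
In the meantime, disabling COW with chattr already works for ordinary
files, and would presumably look the same for a swap file (a sketch;
/swapfile is a placeholder):

  touch /swapfile
  chattr +C /swapfile   # the C flag must be set while the file is empty
  lsattr /swapfile      # verify that the C attribute is now set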

>
>> If I had done that in the past I think I would have completely avoided
>> that issue that required me to restore from backups. That happened in
>> the 3.15/3.16 timeframe and I'd have never even run those kernels.
>> They were stable kernels at the time, and a few versions in when I
>> switched to them (I was probably just following gentoo-sources stable
>> keywords back then), but they still had regressions (fixes were
>> eventually backported).
>
> How do you know if an old kernel you pick because you think the btrfs
> part works well enough is the right pick? You can either encounter a
> bug that has been fixed or a regression that hasn't been
> discovered/fixed yet. That way, you can't win.

You read the lists closely. If you want to be bleeding-edge it will
take more work than if you just go with the flow. That's why I'm not
on 4.1 yet - I read the lists and am not quite sure it's ready yet.

>
>> I think btrfs is certainly usable today, though I'd be hesitant to run
>> it on production servers depending on the use case (I'd be looking for
>> a use case that actually has a significant benefit from using btrfs,
>> and which somehow mitigates the risks).
>
> There you go, it's usable, and the risk of using it is too high.

That is a judgement that everybody has to make based on their
requirements. The important thing is to make an informed decision. I
don't get paid if you pick btrfs.

>
>> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
>> Gentoo repo) backup of my btrfs filesystems on ext4. I occasionally
>> debate whether I still need it, but I sleep better knowing I have it.
>> This is in addition to my daily duplicity cloud backups of my most
>> important data (so, /etc and /home are in the cloud, and mythtv's
>> /var/video is just on a local rsync backup).
>
> I wouldn't give my data out of my hands.

Somehow I doubt the folks at Amazon are going to break RSA anytime soon. |
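
duplicity GPG-encrypts everything before it leaves the box; a run
looks roughly like this (a sketch; the key ID and target URL are
placeholders):

  duplicity --encrypt-key 0xDEADBEEF /home s3://bucket/home-backup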

>
> Snapper? I've never heard of that ...
>

http://snapper.io/

Basically snapshots+crontab and some wrappers to set retention
policies and such. That and some things like package-manager plugins
so that you get snapshots before you install stuff. |
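
Day-to-day use is along these lines (a sketch, assuming a btrfs root
and the snapper package installed):

  snapper create-config /                     # one-time setup for the subvolume
  snapper create --description "pre-emerge"   # manual snapshot
  snapper list                                # show snapshots and their numbers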

>
> Queuing up the data when there's more data than the system can deal with
> only works when the system has sufficient time to catch up with the
> queue. Otherwise, you have to block something at some point, or you
> must drop the data. At that point, it doesn't matter how you arrange
> the contents of the queue within it.

Absolutely true. You need to throttle the data before it gets into
the queue, so that the busyness of the queue is exposed to the
applications and they can behave appropriately (falling back to
lower-bandwidth alternatives, etc). In my case, if mythtv's write
buffers are filling up and I'm also running an emerge install phase,
the correct answer (per ionice) is for emerge to block so that my
realtime video capture buffers are safely flushed. What you don't
want is for the kernel to let emerge dump a few GB of low-priority
data into the write cache alongside my 5Mbps HD recording stream.
Granted, it isn't as big a problem as it used to be now that RAM sizes
have increased. |
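
The ionice side of that is just starting the bulk job in the idle
class (the emerge command here is only an example):

  ionice -c 3 emerge --update @world   # idle class: gets disk time only
                                       # when nobody else wants it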

>
> Gentoo /is/ fire-and-forget in that it works fine. Btrfs is not in that
> it may work or not.
>

Well, we certainly must have come a long way, then. :) I still
remember the last time the glibc ABI changed and I was basically
rebuilding everything from single-user mode, holding my breath.

--
Rich |