On Tue, Jan 20, 2015 at 10:07 AM, James <wireless@×××××××××××.com> wrote:
> Bill Kenworthy <billk <at> iinet.net.au> writes:
>
>> You can turn off COW and go single on btrfs to speed it up but bugs in
>> ceph and btrfs lose data real fast!
>
> Interesting idea, since I'll have raid1 underneath each node. I'll need to
> dig into this idea a bit more.
>
|
So, btrfs and ceph solve an overlapping set of problems in an
overlapping set of ways. In general, adding data security often comes
at the cost of performance, and adding it at multiple layers can cost
even more. I think the right solution is going to depend on the
circumstances.
|
If ceph itself provided protection against bitrot I'd probably avoid a
COW filesystem entirely. It isn't going to add any additional value,
and it does have a performance cost. If I had mirroring at the ceph
level I'd probably just run the nodes on ext4 on lvm with no
mdadm/btrfs/whatever below that. Availability is already ensured by
ceph - if you lose a drive then other nodes will pick up the load. If
I didn't have robust mirroring at the ceph level then having mirroring
of some kind at the individual node level would improve availability.
|
On the other hand, ceph currently has some gaps, so having it on top
of zfs/btrfs could provide protection against bitrot. However, right
now btrfs has no way to turn off COW while leaving checksumming
enabled; disabling COW disables data checksums too. It would be nice
if you could leave the checksumming on. Then if there was bitrot btrfs
would just return an error when you tried to read the file, and ceph
would handle it like any other disk error and use a mirrored copy on
another node. The problem with ceph+ext4 is that if there is bitrot
neither layer will detect it.
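
To make the difference concrete, here is a toy sketch in Python. It
is not how ceph or btrfs are implemented; Node, ChecksumError and
replicated_read are hypothetical names made up for illustration, and
CRC32 just stands in for whatever checksum a real filesystem uses.
A node that checksums its blocks turns silent corruption into a read
error, and the mirroring layer can then fall back to another copy.

```python
import zlib


class ChecksumError(IOError):
    """Raised when stored data no longer matches its recorded checksum."""


class Node:
    """Hypothetical storage node; checksums=False models ext4-like behavior."""

    def __init__(self, checksums):
        self.checksums = checksums
        self.blocks = {}  # key -> bytearray holding the stored data
        self.sums = {}    # key -> CRC32 recorded at write time

    def write(self, key, data):
        self.blocks[key] = bytearray(data)
        if self.checksums:
            self.sums[key] = zlib.crc32(data)

    def read(self, key):
        data = bytes(self.blocks[key])
        # A checksumming node refuses to return data that has changed on disk.
        if self.checksums and zlib.crc32(data) != self.sums[key]:
            raise ChecksumError(key)
        return data


def replicated_read(nodes, key):
    """Return the first copy that reads cleanly, skipping nodes that error."""
    for node in nodes:
        try:
            return node.read(key)
        except ChecksumError:
            continue
    raise IOError("no good copy of %r" % (key,))
```

If you flip a bit in the first node's copy, replicated_read() on two
checksumming nodes skips the bad copy and returns the data from the
second node. With two non-checksumming nodes it returns the corrupted
bytes without complaint, which is the ceph+ext4 case.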
|
Does btrfs+ceph really have a performance hit that is larger than
btrfs without ceph? I fully expect it to be slower than ext4+ceph.
Btrfs in general performs fairly poorly right now - that is expected
to improve in the future, but I doubt that it will ever outperform
ext4 other than for specific operations that benefit from COW (like
reflink copies). It will always be faster to just overwrite one block
in the middle of a file than to write the block out to unallocated
space and update all the metadata.
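
A rough way to picture that last point, again as a toy Python model
rather than anything resembling real btrfs or ext4 internals
(InPlaceFile and CowFile are made-up names, and counting a single
metadata write per COW update is an optimistic assumption, since a
real COW filesystem updates a tree of metadata and batches its
writes):

```python
class InPlaceFile:
    """Overwrites a block where it already lives (the non-COW case)."""

    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.writes = 0

    def overwrite(self, index, data):
        self.blocks[index] = data
        self.writes += 1  # one data write, no mapping change


class CowFile:
    """Writes new data elsewhere and repoints the logical block at it."""

    def __init__(self, nblocks):
        self.storage = [b"\0"] * nblocks       # physical blocks
        self.block_map = list(range(nblocks))  # logical -> physical index
        self.writes = 0

    def overwrite(self, index, data):
        self.storage.append(data)                      # goes to unallocated space
        self.block_map[index] = len(self.storage) - 1  # metadata update
        self.writes += 2  # one data write plus (at least) one metadata write

    def read(self, index):
        return self.storage[self.block_map[index]]
```

Overwriting one block costs the in-place file a single write, while
the COW file pays for the data plus the mapping update. The flip side
is that the old block is still sitting in storage afterwards, which
is what makes things like reflink copies cheap.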
|
--
Rich