Rich Freeman <rich0 <at> gentoo.org> writes:


> >> You can turn off COW and go single on btrfs to speed it up but bugs in
> >> ceph and btrfs lose data real fast!

> So, btrfs and ceph solve an overlapping set of problems in an
> overlapping set of ways. In general adding data security often comes
> at the cost of performance, and obviously adding it at multiple layers
> can come at the cost of additional performance. I think the right
> solution is going to depend on the circumstances.

Raid 1 with btrfs can protect not only the ceph fs files but also the gentoo
node installation itself. I'm not so worried about performance, because
my main (end result) goal is to throttle the codes so they run almost
exclusively in ram (in memory), as designed by AMPLab. Spark plus Tachyon is a
work in progress, for sure. The DFS will be used in lieu of HDFS for
distributed/cluster types of apps, hence ceph. Btrfs + raid 1 serves as a
failsafe not just for the node installations but for all the data as well.
I only intend to write out data once a job/run is finished; but granted,
that is very experimental right now and will evolve over time.


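For concreteness, the node-level piece I have in mind is nothing more than a
two-disk btrfs raid 1. A minimal sketch, with placeholder device names and
mount point (and mkfs.btrfs wipes the devices, so treat it as an illustration
rather than a finished provisioning script):

#!/usr/bin/env python3
"""Rough sketch: stand up a two-device btrfs raid1 filesystem on a node.

The device names and mount point are placeholders for whatever a given
node actually has.
"""
import subprocess
from pathlib import Path

DEVICES = ["/dev/sdb", "/dev/sdc"]      # hypothetical member disks
MOUNTPOINT = Path("/mnt/btrfs-raid1")   # hypothetical mount point

# Mirror both data and metadata across the two devices.
subprocess.run(["mkfs.btrfs", "-f", "-d", "raid1", "-m", "raid1", *DEVICES],
               check=True)

# Mount the new filesystem; naming any one member device is enough.
MOUNTPOINT.mkdir(parents=True, exist_ok=True)
subprocess.run(["mount", DEVICES[0], str(MOUNTPOINT)], check=True)

# Sanity check: confirm data and metadata really are raid1.
subprocess.run(["btrfs", "filesystem", "df", str(MOUNTPOINT)], check=True)

From there the plan is to layer the ceph OSDs on top of that filesystem.
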
>
> if ceph provided that protection against bitrot I'd probably avoid a
> COW filesystem entirely. It isn't going to add any additional value,
> and they do have a performance cost. If I had mirroring at the ceph
> level I'd probably just run them on ext4 on lvm with no
> mdadm/btrfs/whatever below that. Availability is already ensured by
> ceph - if you lose a drive then other nodes will pick up the load. If
> I didn't have robust mirroring at the ceph level then having mirroring
> of some kind at the individual node level would improve availability.

I've read that btrfs and ceph are a very suitable, yet still very immature,
match for local-distributed file system needs.


> On the other hand, ceph currently has some gaps, so having it on top
> of zfs/btrfs could provide protection against bitrot. However, right
> now there is no way to turn off COW while leaving checksumming
> enabled. It would be nice if you could leave the checksumming on.
> Then if there was bitrot btrfs would just return an error when you
> tried to read the file, and then ceph would handle it like any other
> disk error and use a mirrored copy on another node. The problem with
> ceph+ext4 is that if there is bitrot neither layer will detect it.

Good points; hence a flexible configuration where ceph can be reconfigured
and recovered as warranted for this long-term set of experiments.

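To make the gap you describe concrete, this is roughly the only knob available
today, sketched against a hypothetical OSD data directory: chattr +C turns off
COW for files created there afterwards, but on btrfs it takes the data
checksums with it.

#!/usr/bin/env python3
"""Rough sketch of the COW/checksumming trade-off on btrfs.

The OSD data path is a placeholder. The No_COW attribute (chattr +C)
disables copy-on-write for files created in the directory afterwards,
but it also disables data checksumming for those files -- there is
currently no way to keep one without the other.
"""
import subprocess

OSD_DIR = "/var/lib/ceph/osd/ceph-0"   # hypothetical OSD data directory

# Disable COW (and, as a side effect, data checksums) for new files here.
subprocess.run(["chattr", "+C", OSD_DIR], check=True)

# Show the directory's attributes so the change is visible.
subprocess.run(["lsattr", "-d", OSD_DIR], check=True)

So if I do turn COW off for speed, detecting bitrot falls back to ceph's own
scrubbing rather than to the filesystem.
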
> Does btrfs+ceph really have a performance hit that is larger than
> btrfs without ceph? I fully expect it to be slower than ext4+ceph.
> Btrfs in general performs fairly poorly right now - that is expected
> to improve in the future, but I doubt that it will ever outperform
> ext4 other than for specific operations that benefit from it (like
> reflink copies). It will always be faster to just overwrite one block
> in the middle of a file than to write the block out to unallocated
> space and update all the metadata.

I fully expect the combination of btrfs+ceph to mature and become
competitive. It's not critical data, but a long-term experiment; any
critical data will be backed up off the 3-node cluster. I hope to use
ansible to handle recovery, configuration changes, and bringing up and
managing additional nodes; this is just a concept at the moment, but
googling around, it does seem to be a popular idea.

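As a first, very rough sketch of the ansible side (the inventory name is a
placeholder), even just a reachability check over the nodes would be a start
before pushing any real recovery or configuration plays:

#!/usr/bin/env python3
"""Rough sketch: verify every cluster node answers before running plays.

The inventory file name is a placeholder; nothing here is a finished
design, just the smallest possible starting point.
"""
import subprocess

INVENTORY = "cluster-hosts"   # hypothetical inventory listing the 3 nodes

# ansible's ping module just checks each host is reachable and has a
# working python -- a cheap health check before real playbooks run.
subprocess.run(["ansible", "all", "-i", INVENTORY, "-m", "ping"], check=True)
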
As always, your insight and advice are warmly received.


James