Bill Kenworthy <billk <at> iinet.net.au> writes:

> > The main thing keeping me away from CephFS is that it has no mechanism
> > for resolving silent corruption. Btrfs underneath it would obviously
> > help, though not for failure modes that involve CephFS itself. I'd
> > feel a lot better if CephFS had some way of determining which copy was
> > the right one other than "the master server always wins."

The "Giant" release, v0.87, is a major release with many new fixes;
it may have the features you need. The ongoing development releases are
currently up to v0.91. What I've read looks promising, but I'll agree it
needs to be tested with non-critical data.

http://ceph.com/docs/master/release-notes/#v0-87-giant

http://ceph.com/docs/master/release-notes/#notable-changes

> Forget ceph on btrfs for the moment - the COW kills it stone dead after
> real use. When running a small handful of VMs on a raid1 with ceph -
> sloooooooooooow :)

I'm staying away from VMs. It's Spark on top of Mesos I'm after. Maybe
Docker or another container solution, down the road.

I read that some are using an SSD with raid 1 and bcache to improve
performance and stability a bit. I do not want to add an SSD to the mix right
now, as the three development nodes all have 32 GB of RAM.

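If I ever do go the SSD route, my understanding of the bcache recipe is
roughly the sketch below. It is untested on my part; the device paths, the
writeback-mode choice, and the assumption that udev registers the devices
are all mine:

#!/usr/bin/env python3
# Sketch only: build a bcache device from an SSD partition (cache) and the
# raid1 array (backing store), then point the OSD at /dev/bcache0.
# Device paths are placeholders for whatever the real nodes use.
import subprocess

CACHE_DEV = "/dev/sda4"    # partition on the SSD
BACKING_DEV = "/dev/md0"   # the raid1 array the OSD data will live on

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Creating cache and backing device in one call should also attach them
# (bcache-tools must be installed; udev normally registers the devices).
run("make-bcache", "-C", CACHE_DEV, "-B", BACKING_DEV)

# Writeback caching is where the latency win would come from, at some risk.
with open("/sys/block/bcache0/bcache/cache_mode", "w") as f:
    f.write("writeback")

# /dev/bcache0 can now get a btrfs filesystem and become the OSD data disk.
run("mkfs.btrfs", "/dev/bcache0")
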
> You can turn off COW and go single on btrfs to speed it up but bugs in
> ceph and btrfs lose data real fast!

Interesting idea, since I'll have raid1 underneath each node. I'll need to
dig into this idea a bit more.

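From what I gather, turning COW off for the OSD data would look something
like this; the path is the stock default and just an assumption on my part,
and chattr +C only helps for files created after it is set:

#!/usr/bin/env python3
# Sketch of the "no COW" idea: mark the (still empty) OSD data directory
# no-COW so every file the OSD creates afterwards skips btrfs COW.
# The path is the stock default and may differ on a real node.
import subprocess

OSD_DIR = "/var/lib/ceph/osd/ceph-0"

# Must be done before the OSD writes anything; existing files keep COW.
subprocess.run(["chattr", "+C", OSD_DIR], check=True)

# lsattr should now show a 'C' flag on the directory.
print(subprocess.run(["lsattr", "-d", OSD_DIR],
                     check=True, capture_output=True, text=True).stdout)

The "single" part I take to mean mkfs.btrfs -d single -m single, so btrfs
itself isn't duplicating what the raid1 below already duplicates - but
that's my reading, correct me if I'm off.
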
> ceph itself (my last setup trashed itself 6 months ago and I've given
> up!) will only work under real use/heavy loads with lots of discrete
> systems, ideally 10G network, and small disks to spread the failure
> domain. Using 3 hosts and 2x2g disks per host wasn't near big enough :(
> Its design means that small scale trials just wont work.

Huh. My systems are FX-8350s (8 cores) running at 4 GHz with 32 GB of RAM.
Water coolers will allow me to crank up the clock speed (when/if needed) to
5 or 6 GHz. Not Intel, but not low end either.

> Its not designed for small scale/low end hardware, no matter how
> attractive the idea is :(

Supposedly there are tools to measure/monitor ceph better now. That is
one of the things I need to research: how to manage the small cluster
better and back off the throughput/load while monitoring performance
on a variety of different tasks. Definitely not production usage.

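The sort of thing I had in mind is a dumb loop that watches "ceph -s -f json"
and dials recovery back when clients are busy. This is only a sketch; the
JSON field names are what I expect from Giant-era output, and the thresholds
and "default" values are guesses on my part:

#!/usr/bin/env python3
# Hypothetical monitoring loop for a small test cluster: watch overall health
# and client throughput, and throttle recovery/backfill when the cluster is
# busy. Field names assume `ceph -s -f json` output from Giant-era releases.
import json
import subprocess
import time

def ceph(*args):
    return subprocess.run(("ceph",) + args, check=True,
                          capture_output=True, text=True).stdout

def throttle(backfills, recovery):
    # Lower these to back off recovery traffic; raise them again when idle.
    ceph("tell", "osd.*", "injectargs",
         "--osd-max-backfills {} --osd-recovery-max-active {}".format(
             backfills, recovery))

while True:
    status = json.loads(ceph("-s", "-f", "json"))
    health = status.get("health", {}).get("overall_status", "UNKNOWN")
    write_bps = status.get("pgmap", {}).get("write_bytes_sec", 0)
    print(health, "client writes: {:.1f} MB/s".format(write_bps / 1e6))

    # Crude policy: if clients are pushing real traffic, keep recovery gentle;
    # otherwise let it run at (roughly) the defaults.
    if write_bps > 20e6:
        throttle(1, 1)
    else:
        throttle(10, 15)
    time.sleep(30)
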
I certainly appreciate your ceph experiences. I filed a bug with the
version request for Giant v0.87. Did you run the 9999 version?
What versions did you experiment with?

I hope to set up Ansible to facilitate rapid installations of a variety
of gentoo systems used for cluster or ceph testing. That way configurations
should be able to "reboot" after bad failures. Did the failures you
experienced with Ceph require the gentoo-btrfs based systems to be completely
reinstalled from scratch, or just purging the disks of Ceph and
reconfiguring Ceph?

I'm hoping to "configure ceph" in such a way that failures do not corrupt
the gentoo-btrfs installation and only require repairing ceph; so your
comments on that strategy are most welcome.

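Concretely, what I'm picturing is each OSD on its own btrfs partition, so the
worst case is evicting the OSD and reformatting that one filesystem. A rough
sketch of what I mean follows; the OSD id and device are made up, and the
re-create steps are left to the standard docs:

#!/usr/bin/env python3
# Sketch of the "only repair ceph" recovery path: each OSD lives on its own
# btrfs filesystem, separate from the root fs, so a wrecked OSD is evicted
# from the cluster and its partition reformatted without touching the
# Gentoo install. OSD id and device below are placeholders.
import subprocess

OSD_ID = "2"
OSD_DEV = "/dev/sdb1"                      # dedicated OSD partition
OSD_DIR = "/var/lib/ceph/osd/ceph-" + OSD_ID

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Evict the broken OSD from the cluster maps (daemon assumed already stopped).
run("ceph", "osd", "out", OSD_ID)
run("ceph", "osd", "crush", "remove", "osd." + OSD_ID)
run("ceph", "auth", "del", "osd." + OSD_ID)
run("ceph", "osd", "rm", OSD_ID)

# Wipe only the OSD's own filesystem; the OS stays untouched.
run("umount", OSD_DIR)
run("mkfs.btrfs", "-f", OSD_DEV)

# From here the OSD would be re-created and re-added (ceph osd create,
# ceph-osd --mkfs, ceph auth add, crush add) per the standard docs.
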

> BillK


James