On Sun, Mar 1, 2020 at 8:52 PM William Kenworthy <billk@×××××××××.au> wrote:
>
> For those wanting to run a lot of drives on a single host - that defeats
> the main advantage of using a chunkserver based filesystem -
> redundancy. Its far more common to have a host fail than a disk drive.
> Losing the major part of your storage in one go means the cluster is
> effectively dead - hence having a lot of completely separate systems is
> much more reliable

Of course. You should have multiple hosts before you start putting
multiple drives on a single host.

However, once you have a few hosts, adding more still improves
performance, but you're not really getting THAT much additional
redundancy. You would get faster rebuild times with more hosts, since
there would be less data to transfer when one fails and more hosts
doing the work.

So, it is about finding a balance. You probably don't want 30 drives
on 2 hosts. However, you probably also don't need 15-30 hosts for
that many drives either. I wouldn't put 16 drives onto a single host
until I had a fair number of hosts.

As for the status of lizardfs: as far as I can tell it is mostly
developed by a company, and they've wavered a bit on support in the
last year. I share your observation that they seem to be picking up
again. In any case, I'm running the latest stable and it works just
fine, but it lacks the high availability features. I can have shadow
masters, but they won't automatically fail over, so maintenance on the
master is still a pain. Recovery from a failure of the master should
be pretty quick even if manual: run a command on each shadow to
determine which has the most recent metadata, adjust DNS so my master
CNAME points to the new master, edit the config on the new master to
tell it that it is the master and no longer a shadow, and after
restarting the daemon the cluster should be online again.
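
For what it's worth, that "find the freshest shadow" step is easy to
script. Here is a rough Python sketch of what I have in mind - the
hostnames are made up, and I'm assuming lizardfs-admin's
metadataserver-status subcommand with --porcelain output where the
metadata version is the last field, so double-check the output format
of whatever version you have installed before relying on it:

#!/usr/bin/env python3
# Rough sketch: ask each shadow master for its metadata version and report
# the freshest one. Assumes "lizardfs-admin metadataserver-status --porcelain"
# prints the metadata version as its last whitespace-separated field --
# verify against your installed version before trusting it.
import subprocess

SHADOWS = ["shadow1.example.com", "shadow2.example.com"]  # hypothetical hosts
PORT = "9421"  # default client (matocl) port the admin tool connects to

def metadata_version(host):
    out = subprocess.run(
        ["lizardfs-admin", "metadataserver-status", "--porcelain", host, PORT],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return int(out[-1])  # assumption: last field is the metadata version

if __name__ == "__main__":
    versions = {host: metadata_version(host) for host in SHADOWS}
    for host in sorted(versions, key=versions.get, reverse=True):
        print(f"{host}: metadata version {versions[host]}")
    best = max(versions, key=versions.get)
    print(f"Candidate to promote: {best} (repoint the CNAME, set its"
          f" personality to master in mfsmaster.cfg, restart the daemon)")

After that it's still the manual DNS/config/restart dance, but at
least you know which shadow to promote.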

The latest release candidate has the high availability features (they
used to be paid and are now free), but it is still a release candidate
and I'm not in that much of a rush. There was a lot of griping on the
forums/etc by users who switched to the release candidate and ran into
bugs that ate their data. IMO that is why you don't go running release
candidates for distributed filesystems with a dozen hard drives on
them - if you want to try them out, just run them in VMs with a few GB
of storage to play with, where nobody cares if your test data is
destroyed. It is usually wise to be conservative with your
filesystems. It makes no difference to me if they take another year to
do the next release - I'd like the HA features, but it isn't like the
old code goes stale.

Actually, the one thing it would be nice to see them fix is the FUSE
client - it seems to leak RAM.

Oh, and the docs seem to hint at a Windows client somewhere, which
would be really nice to have, but I can't find any trace of it. I
normally only run a single client, but with a Windows client it would
obviously perform well as a general-purpose fileserver.

There has been talk of a substantial rewrite, though I'm not sure if
that will actually happen now. If it does, I hope they keep the RAM
requirements low on the chunkservers. That was the main thing that
turned me off from ceph - it is a great platform in general, but
needing 1GB of RAM per 1TB of disk adds up really fast, and it
basically precludes ARM SBCs as OSDs since you can't get those with
that much RAM for any sane price - even if you were only running one
drive per host, good luck finding an SBC with 13GB+ of RAM. You can
tune ceph to use less RAM, but I've heard that bad things happen if
some hosts shuffle during a rebuild and you don't have gobs of RAM -
all the OSDs end up with an impossible backlog and keep crashing until
you run around like Santa Claus filling every stocking with a handful
of $60 DIMMs.

Right now lizardfs basically uses almost no RAM at all on the
chunkservers, so an ARM SBC could run dozens of drives without an
issue.

--
Rich