On 2/3/20 10:40 am, Rich Freeman wrote:
> On Sun, Mar 1, 2020 at 8:52 PM William Kenworthy <billk@×××××××××.au> wrote:
>> For those wanting to run a lot of drives on a single host - that defeats
>> the main advantage of using a chunkserver based filesystem -
>> redundancy. Its far more common to have a host fail than a disk drive.
>> Losing the major part of your storage in one go means the cluster is
>> effectively dead - hence having a lot of completely separate systems is
>> much more reliable
> Of course. You should have multiple hosts before you start putting
> multiple drives on a single host.
>
> However, once you have a few hosts the performance improves by adding
> more, but you're not really getting THAT much additional redundancy.
> You would get faster rebuild times by having more hosts since there
> would be less data to transfer when one fails and more hosts doing the
> work.
>
> So, it is about finding a balance. You probably don't want 30 drives
> on 2 hosts. However, you probably also don't need 15-30 hosts for
> that many drives either. I wouldn't be putting 16 drives onto a
> single host until I had a fair number of hosts.
>
> As far as the status of lizardfs goes - as far as I can tell it is
> mostly developed by a company and they've wavered a bit on support in
> the last year. I share your observation that they seem to be picking
> up again. In any case, I'm running the latest stable and it works
> just fine, but it lacks the high availability features. I can have
> shadow masters, but they won't automatically fail over, so maintenance
> on the master is still a pain. Recovery due to failure of the master
> should be pretty quick though even if manual - just have to run a
> command on each shadow to determine which has the most recent
> metadata, then adjust DNS for my master CNAME to point to the new
> master, and then edit config on the new master to tell it that it is
> the master and no longer a shadow, and after restarting the daemon the
> cluster should be online again.
>
> The latest release candidate has the high availability features (used
> to be paid, is now free), however it is still a release candidate and
> I'm not in that much of a rush. There was a lot of griping on the
> forums/etc by users who switched to the release candidate and ran into
> bugs that ate their data. IMO that is why you don't go running
> release candidates for distributed filesystems with a dozen hard
> drives on them - if you want to try them out just run them in VMs with
> a few GB of storage to play with and who cares if your test data is
> destroyed. It is usually wise to be conservative with your
> filesystems. Makes no difference to me if they take another year to
> do the next release - I'd like the HA features but it isn't like the
> old code goes stale.
>
> Actually, the one thing that it would be nice if they fixed is the
> FUSE client - it seems to leak RAM.
>
> Oh, and the docs seem to hint at a windows client somewhere which
> would be really nice to have, but I can't find any trace of it. I
> only normally run a single client but it would obviously perform well
> as a general-purpose fileserver.
>
> There has been talk of a substantial rewrite, though I'm not sure if
> that will actually happen now. If it does I hope they do keep the RAM
> requirements low on the chunkservers. That was the main thing that
> turned me off from ceph - it is a great platform in general but
> needing 1GB RAM per 1TB disk adds up really fast, and it basically
> precludes ARM SBCs as OSDs as you can't get those with that much RAM
> for any sane price - even if you were only running one drive per host
> good luck finding a SBC with 13GB+ of RAM. You can tune ceph to use
> less RAM but I've heard that bad things happen if you have some hosts
> shuffle during a rebuild and you don't have gobs of RAM - all the OSDs
> end up with an impossible backlog and they keep crashing until you run
> around like Santa Claus filling every stocking with a handful of $60
> DIMMs.
>
> Right now lizardfs basically uses almost no ram at all on
> chunkservers, so an ARM SBC could run dozens of drives without an
> issue.
>
Everything bad you hear about ceph is true ... and then some! I did
try, but this was some years ago, so hopefully it's better now. The two
biggies were excessive network requirements (bandwidth, separation) and
recovery times, with frequent crash and burn. There are ceph features I
would really like to use (rbd, local copies with much simpler config,
...), but moosefs is a lot more bulletproof on far smaller resource
requirements, though I did find that properly pruned vlans on a smart
switch, separating the intra-cluster traffic from external requests,
made a noticeable difference.

moosefs has a windows client, but it's only available with the paid
version. The master/shadow-master setup and automatic failover are also
paid-only - with the community edition you have to stop the master,
copy the metadata files across, then change DNS etc. before starting
the new master (rough sketch below). You can't really do it online even
when scripted - it's painful, with downtime, and I had DNS caching
issues that took time to work their way out of the system. I thought
lizardfs was much more community-minded, but you are characterising it
as similar to moosefs - a taster offering by a commercial company
holding back some of the non-essential but juicier features for the
paid version - is that how you see them?
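
Scripted up, that promotion dance looks roughly like the sketch below.
The hostname, paths and the exact mfsmaster/rsync calls are from my own
setup and quoted from memory, so treat it as illustrative rather than
the official moosefs procedure:

#!/usr/bin/env python3
# Rough sketch of the community-edition master move described above,
# run on the machine that is about to become the new master.
import subprocess

OLD_MASTER = "mfsmaster-old.lan"   # assumed hostname of the outgoing master
DATA_DIR = "/var/lib/mfs/"         # default metadata directory in my install

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Stop the old master so the metadata stops changing
#    (skip this if the old host is already dead).
run(["ssh", OLD_MASTER, "mfsmaster", "stop"])

# 2. Pull metadata.mfs and the changelog files onto this host.
run(["rsync", "-a", OLD_MASTER + ":" + DATA_DIR, DATA_DIR])

# 3. Repoint the mfsmaster DNS record at this host (zone edit or API
#    call goes here), then wait out any cached lookups - the caching is
#    what bit me.

# 4. Bring the master up here; chunkservers and clients reconnect on
#    their own.
run(["mfsmaster", "start"])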

By the way, to keep to the rpi subject, I did have an rpi3B with a USB2
SATA drive attached, but it was hopeless as a chunkserver and impacted
the whole cluster. Having the USB traffic and the network traffic go
through the same hub just didn't work out. I started with the odroids
before the rpi4 was released, or I might have experimented with that
first (using a SATA HAT) - anyone have a comment on how that compares
with an HC2?

BillK