On Thu, Nov 8, 2018 at 8:16 PM Dale <rdalek1967@×××××.com> wrote:
>
> I'm trying to come up with a
> plan that allows me to grow easier and without having to worry about
> running out of motherboard based ports.
>
|
So, this is an issue I've been changing my mind on over the years.
There are a few common approaches:
|
* Find ways to cram a lot of drives on one host
* Use a patchwork of NAS devices or improvised hosts sharing over
  samba/nfs/etc and end up with a mess of mount points.
* Use a distributed FS
|
Right now I'm mainly using the first approach, and I'm trying to move
to the last. The middle option has never appealed to me.
|
So, to do more of what you're doing in the most efficient way
possible, I recommend finding used LSI HBA cards. These have mini-SAS
ports on them, and each mini-SAS port can be attached to a breakout
cable that gets you 4 SATA ports. I just picked up two of these cards
for $20 each on ebay (used); they have 4 mini-SAS ports each, which
gives you capacity for 16 SATA drives per card. Typically these have
x4 or larger PCIe interfaces, so you'll need a large slot, or one with
a cutout. You'd have to do the math, but I suspect that if the
card+MB supports PCIe 3.0 you're not losing much if you cram it into
a smaller slot. If most of the drives are idle most of the time then
that also demands less bandwidth. 16 fully busy hard drives obviously
can put out a lot of data if reading sequentially.
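
Rough back-of-the-envelope numbers, using my own assumptions of about
985 MB/s usable per PCIe 3.0 lane and ~200 MB/s sequential per
spinning disk (purely illustrative):

# Can a PCIe 3.0 x4 link keep 16 busy hard drives fed?
lane_mb_s = 985   # assumed usable bandwidth per PCIe 3.0 lane, MB/s
hdd_mb_s = 200    # assumed sequential throughput per spinning disk, MB/s
drives = 16

for lanes in (4, 8):
    slot = lanes * lane_mb_s
    need = drives * hdd_mb_s
    print(f"x{lanes} slot: ~{slot} MB/s vs ~{need} MB/s of drives ->",
          "enough" if slot >= need else "bottleneck")

Even an x4 slot (~3900 MB/s) covers 16 drives reading sequentially
(~3200 MB/s) under those assumptions, so the smaller slot costs little.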
|
You can of course get more consumer-oriented SATA cards, but you're
lucky to get 2-4 SATA ports on a card that runs you $30. The mini-SAS
HBAs get you a LOT more drives per PCIe slot, and your PCIe slots are
your main limiting factor assuming you have power and case space.
|
Oh, and those HBA cards need to be flashed into "IT" mode - they're
often sold this way, but if they support RAID you want to flash the IT
firmware that just turns them into a bunch of standalone SATA ports.
This is usually a PITA that involves DOS or whatever, but I have
noticed that some of the software needed is in the Gentoo repo.
|
If you go that route it is just like having a ton of SATA ports in
your system - the drives just show up as sda...sdz and so on (after
sdz the kernel simply continues with sdaa, sdab, etc). Software-wise
you just keep doing what you're already doing (though you should be
seriously considering mdadm/zfs/btrfs/whatever at that point).
|
49 |
That is the more traditional route. |
50 |
|
Now let me talk about distributed filesystems, which is the more
scalable approach. I'm getting tired of being limited by SATA ports,
and cases, and such. I'm also frustrated with some of zfs's
inflexibility around removing drives. These are constraints that make
upgrading painful, and often inefficient. Distributed filesystems
offer a different solution.
|
A distributed filesystem spreads its storage across many hosts, with
an arbitrary number of drives per host (more or less). So, you can
add more hosts, add more drives to a host, and so on. That means
you're never forced to find a way to cram a few more drives into one
host. The resulting filesystem appears as one gigantic filesystem
(unless you want to split it up), which means no mess of nfs
mountpoints and no other nfs headaches. Just as with RAID these
support redundancy, except now you can lose entire hosts without
issue. With many of them you can even tell the filesystem which
PDU/rack/whatever each host is plugged into, and it will place the
copies so that you can lose every host in one rack and still not lose
any data. You can also mount the filesystem on as many hosts as you
want at the same time.
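
To illustrate the rack-awareness idea, here's a conceptual sketch with
made-up host names (my own illustration only - real distributed
filesystems implement this with their own placement logic):

# Pick replica hosts so no two copies of a chunk share a rack/PDU.
hosts = {
    "node1": "rack-a", "node2": "rack-a",
    "node3": "rack-b", "node4": "rack-b",
    "node5": "rack-c",
}

def place_replicas(copies: int) -> list[str]:
    chosen, used_racks = [], set()
    for host, rack in hosts.items():
        if rack in used_racks:
            continue
        chosen.append(host)
        used_racks.add(rack)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct racks for that many copies")

print(place_replicas(3))  # e.g. ['node1', 'node3', 'node5'] - one per rack

Losing any single rack then takes out at most one copy of a given chunk.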
|
They do tend to be a bit more complex. The big players can scale VERY
large - thousands of drives easily. Everything seems to be moving
towards Ceph/CephFS. If you were hosting a datacenter full of
VMs/containers/etc I'd be telling you to host it on Ceph. However,
for small scale (which you definitely are right now), I'm not thrilled
with it. Due to the way it allocates data (hash-based), anytime
anything changes you end up having to move data all around the
cluster, and all the reports I've read suggest it doesn't perform all
that great if you only have a few nodes. Ceph storage nodes are also
RAM-hungry, and I want to run these on ARM to save power, and the few
ARM boards with that kind of RAM are very expensive.
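
To see why membership changes cause data movement with hash-based
placement, here's a deliberately naive sketch (Ceph's CRUSH algorithm
is much smarter than a plain modulo and moves far less data than this,
but any cluster change still triggers some rebalancing):

from hashlib import sha256

def node_for(obj: str, n_nodes: int) -> int:
    # Naive placement: hash the object name, take it modulo the node count.
    return int(sha256(obj.encode()).hexdigest(), 16) % n_nodes

objects = [f"chunk-{i}" for i in range(10000)]
before = {o: node_for(o, 4) for o in objects}  # 4 storage nodes
after = {o: node_for(o, 5) for o in objects}   # add a 5th node
moved = sum(before[o] != after[o] for o in objects)
print(f"{moved / len(objects):.0%} of objects would move")  # roughly 80%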
|
Personally I'm working on deploying a cluster of a few nodes running
LizardFS, which is basically a fork/derivative of MooseFS. While it
won't scale nearly as well, it should be fine below 100 nodes, and in
particular it sounds like it works fairly well with only a few nodes.
It has its pros and cons, but for my needs it should be sufficient.
It also isn't RAM-hungry. I'm going to be testing it on some
RockPro64s, with the LSI HBAs.
|
I did note that Gentoo lacks a LizardFS client. I suspect I'll be
looking to fix that - I'm sure the moosefs ebuild would be a good
starting point. I'm probably going to be a wimp and run the storage
nodes on Ubuntu or whatever upstream targets - they're basically
appliances as far as I'm concerned.
|
So, those are the two routes I'd recommend. Just get yourself an HBA
if you only want a few more drives. If you see your needs expanding
then consider a distributed filesystem. The advantage of the latter
is that you can keep expanding it however you want with additional
drives/nodes/whatever. If you're going over 20 nodes I'd use Ceph for
sure - IMO that seems to be the future of this space.
|
--
Rich