Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Hard drive storage questions
Date: Sun, 11 Nov 2018 00:45:49
Message-Id: cb577279-9732-8d3b-1df1-e09d890ebe01@gmail.com
In Reply to: Re: [gentoo-user] Hard drive storage questions by Rich Freeman
Rich Freeman wrote:
> On Thu, Nov 8, 2018 at 8:16 PM Dale <rdalek1967@×××××.com> wrote:
>> I'm trying to come up with a
>> plan that allows me to grow easier and without having to worry about
>> running out of motherboard based ports.
>>
> So, this is an issue I've been changing my mind on over the years.
> There are a few common approaches:
>
> * Find ways to cram a lot of drives on one host
> * Use a patchwork of NAS devices or improvised hosts sharing over
> samba/nfs/etc and end up with a mess of mount points.
> * Use a distributed FS
>
> Right now I'm mainly using the first approach, and I'm trying to move
> to the last. The middle option has never appealed to me.

And this is what I'm trying to avoid.  Doing one thing, realizing I
should have done it differently, and then having to spend even more
money to do it the right way.  I'm trying to get advice on the best way
forward that I can afford.  Obviously I don't need a setup like
Facebook or Google, but I don't want to spend a few hundred dollars
doing something only to realize it needs to be sold to the next idiot
on eBay.   ROFL  You're giving me some good options to think on
here.  ;-)

>
> So, to do more of what you're doing in the most efficient way
> possible, I recommend finding used LSI HBA cards. These have mini-SAS
> ports on them, and one of these can be attached to a breakout cable
> that gets you 4 SATA ports. I just picked up two of these for $20
> each on ebay (used) and they have 4 mini-SAS ports each, which is
> capacity for 16 SATA drives per card. Typically these have 4x or
> larger PCIe interfaces, so you'll need a large slot, or one with a
> cutout. You'd have to do the math but I suspect that if the card+MB
> supports PCIe 3.0 you're not losing much if you cram it into a smaller
> slot. If most of the drives are idle most of the time then that also
> demands less bandwidth. 16 fully busy hard drives obviously can put
> out a lot of data if reading sequentially.
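
To put rough numbers on the 'you'd have to do the math' part: PCIe 3.0 carries
roughly 985 MB/s of usable bandwidth per lane, and a busy 3.5" drive streams
maybe 200 MB/s sequentially (both figures are my assumptions, not from the
mail), so a quick sketch suggests a 3.0 x4 slot just about keeps up with 16
fully busy drives, while a 2.0 x4 slot only falls behind when many of them
stream at once.

# Back-of-the-envelope only; both bandwidth figures are my assumptions.
PCIE_LANE_MB_S = {"2.0": 500, "3.0": 985}   # approx. usable MB/s per lane
HDD_SEQ_MB_S = 200                          # assumed per-drive sequential rate

def slot_headroom(gen: str, lanes: int, drives: int) -> float:
    """Ratio of slot bandwidth to worst-case aggregate drive throughput."""
    return (PCIE_LANE_MB_S[gen] * lanes) / (HDD_SEQ_MB_S * drives)

for gen, lanes in [("3.0", 8), ("3.0", 4), ("2.0", 4)]:
    print(f"PCIe {gen} x{lanes}: {slot_headroom(gen, lanes, 16):.1f}x "
          f"the demand of 16 busy drives")

The ratios come out to roughly 2.5x for 3.0 x8, 1.2x for 3.0 x4 and 0.6x for
2.0 x4, which matches the point above: with mostly idle drives even the
smaller slot is fine.
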
>
> You can of course get more consumer-oriented SATA cards, but you're
> lucky to get 2-4 SATA ports on a card that runs you $30. The mini-SAS
> HBAs get you a LOT more drives per PCIe slot, and your PCIe slots are
> your main limiting factor assuming you have power and case space.
>
> Oh, and those HBA cards need to be flashed into "IT" mode - they're
> often sold this way, but if they support RAID you want to flash the IT
> firmware that just makes them into a bunch of standalone SATA slots.
> This is usually a PITA that involves DOS or whatever, but I have
> noticed some of the software needed in the Gentoo repo.
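
Before flashing anything, one rough way to see what firmware a card is
currently running is to check which kernel driver grabs it; this is only a
sketch of mine (the vendor's sas2flash/sas3flash utility is the authoritative
check), but cards running the MegaRAID firmware bind to megaraid_sas while
IT/IR firmware binds to mpt2sas/mpt3sas.

# My rough pre-flight sketch (not from the mail): list LSI/Broadcom SAS
# controllers and the kernel driver bound to each.
import subprocess

def lsi_controllers():
    """Yield (pci_slot, description, driver) for LSI/Broadcom SAS devices."""
    out = subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout
    slot, desc = None, None
    for line in out.splitlines():
        if not line.startswith(("\t", " ")):           # a new device line
            slot, desc = line.split(" ", 1)
            continue
        if "Kernel driver in use:" in line and desc and (
                "LSI" in desc or "Broadcom" in desc):
            yield slot, desc.strip(), line.split(":", 1)[1].strip()

for slot, desc, driver in lsi_controllers():
    print(f"{slot}: {desc} -> driver {driver}")
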
>
> If you go that route it is just like having a ton of SATA ports in
> your system - they just show up as sda...sdz and so on (no idea where
> it goes after that). Software-wise you just keep doing what you're
> already doing (though you should be seriously considering
> mdadm/zfs/btrfs/whatever at that point).
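
As a small illustration (my sketch, nothing HBA-specific), every disk behind
the HBA really is just another /dev/sd* node, and sysfs reports its size in
512-byte sectors regardless of which controller it hangs off:

# Illustrative only: list /dev/sd* disks with model and size from sysfs.
from pathlib import Path

for dev in sorted(Path("/sys/block").glob("sd*")):
    sectors = int((dev / "size").read_text())
    model_file = dev / "device" / "model"
    model = model_file.read_text().strip() if model_file.exists() else "?"
    print(f"/dev/{dev.name}: {model}, {sectors * 512 / 1e12:.2f} TB")
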
>
> That is the more traditional route.
>
> Now let me talk about distributed filesystems, which is the more
> scalable approach. I'm getting tired of being limited by SATA ports,
> and cases, and such. I'm also frustrated with some of zfs's
> inflexibility around removing drives. These are constraints that make
> upgrading painful, and often inefficient. Distributed filesystems
> offer a different solution.
>
> A distributed filesystem spreads its storage across many hosts, with
> an arbitrary number of drives per host (more or less). So, you can
> add more hosts, add more drives to a host, and so on. That means
> you're never forced to try to find a way to cram a few more drives in
> one host. The resulting filesystem appears as one gigantic filesystem
> (unless you want to split it up), which means no mess of nfs
> mountpoints and so on, and all the other headaches of nfs. Just as
> with RAID these support redundancy, except now you can lose entire
> hosts without issue. With many you can even tell it which
> PDU/rack/whatever each host is plugged into, and it will make sure you
> can lose all the hosts in one rack. You can also mount the filesystem
> on as many hosts as you want at the same time.
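
A toy sketch of that rack-awareness idea (my own illustration, not the
placement algorithm any of these filesystems actually uses): the placer just
refuses to put two copies of a chunk into the same failure domain.

HOSTS = {            # hypothetical host -> rack mapping, made-up names
    "node1": "rack-a", "node2": "rack-a",
    "node3": "rack-b", "node4": "rack-b",
    "node5": "rack-c",
}

def place_replicas(chunk_id: int, copies: int = 2) -> list[str]:
    """Pick hosts for one chunk, never reusing a rack."""
    hosts = list(HOSTS)
    rotated = hosts[chunk_id % len(hosts):] + hosts[:chunk_id % len(hosts)]
    chosen, used_racks = [], set()
    for host in rotated:
        if HOSTS[host] not in used_racks:
            chosen.append(host)
            used_racks.add(HOSTS[host])
        if len(chosen) == copies:
            break
    return chosen

for chunk in range(4):
    print(chunk, place_replicas(chunk))   # e.g. 0 ['node1', 'node3']
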
>
> They do tend to be a bit more complex. The big players can scale VERY
> large - thousands of drives easily. Everything seems to be moving
> towards Ceph/CephFS. If you were hosting a datacenter full of
> VMs/containers/etc I'd be telling you to host it on Ceph. However,
> for small scale (which you definitely are right now), I'm not thrilled
> with it. Due to the way it allocates data (hash-based) anytime
> anything changes you end up having to move all the data around in the
> cluster, and all the reports I've read suggest it doesn't perform all
> that great if you only have a few nodes. Ceph storage nodes are also
> RAM-hungry, and I want to run these on ARM to save power, and few ARM
> boards have that kind of RAM, and they're very expensive.
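
The data-movement point is easy to demonstrate with a toy model (naive
hash-modulo placement, far cruder than Ceph's CRUSH, but it shows why
hash-based schemes rebalance whenever membership changes):

import hashlib

def node_for(obj: str, n_nodes: int) -> int:
    """Naive placement: hash the object name, take it modulo the node count."""
    return int(hashlib.sha1(obj.encode()).hexdigest(), 16) % n_nodes

objects = [f"chunk-{i}" for i in range(10_000)]
moved = sum(node_for(o, 3) != node_for(o, 4) for o in objects)
print(f"{moved / len(objects):.0%} of objects move when going from 3 to 4 nodes")
# Prints roughly 75%.  CRUSH is much smarter than this, but some rebalancing
# on any membership change is inherent to hash-based placement.
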
>
> Personally I'm working on deploying a cluster of a few nodes running
> LizardFS, which is basically a fork/derivative of MooseFS. While it
> won't scale nearly as well, below 100 nodes should be fine, and in
> particular it sounds like it works fairly well with only a few nodes.
> It has its pros and cons, but for my needs it should be sufficient.
> It also isn't RAM-hungry. I'm going to be testing it on some
> RockPro64s, with the LSI HBAs.
>
> I did note that Gentoo lacks a LizardFS client. I suspect I'll be
> looking to fix that - I'm sure the moosefs ebuild would be a good
> starting point. I'm probably going to be a wimp and run the storage
> nodes on Ubuntu or whatever upstream targets - they're basically
> appliances as far as I'm concerned.
>
> So, those are the two routes I'd recommend. Just get yourself an HBA
> if you only want a few more drives. If you see your needs expanding
> then consider a distributed filesystem. The advantage of the latter
> is that you can keep expanding it however you want with additional
> drives/nodes/whatever. If you're going over 20 nodes I'd use Ceph for
> sure - IMO that seems to be the future of this space.
>

This is a lot to think on.  Money-wise, and maybe even expansion-wise, I
may go with the PCIe SATA cards and add drives inside my case.  I have
plenty of power supply since the whole system pulls at most 200 watts
and I think my P/S is like 700 or 800 watts.  I can also add an external
SATA card or another USB drive to do backups with as well.  At some
point though, I may have to build one of those little tiny systems that
are basically nothing but SATA drive controllers and an ethernet port.
Have that sitting in a closet somewhere running some small OS.  I can
always just move the drives from my system to it if needed.
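
For what it's worth, a quick power-budget check with assumed per-drive
figures (the only hard numbers here are the ~200 W draw and the 700-800 W
PSU rating):

# All per-drive figures below are assumptions; only the 200 W draw and the
# 700-800 W PSU rating come from the mail above.
PSU_WATTS = 700          # lower end of the "700 or 800" guess
SYSTEM_DRAW_W = 200      # reported worst-case draw today
MARGIN = 0.80            # keep the PSU at or below ~80% load
DRIVE_ACTIVE_W = 10      # typical 3.5" HDD while reading/writing
DRIVE_SPINUP_W = 25      # brief surge at power-on, mostly on the 12 V rail

headroom = PSU_WATTS * MARGIN - SYSTEM_DRAW_W
print(f"~{int(headroom // DRIVE_ACTIVE_W)} extra drives while running")
print(f"~{int(headroom // DRIVE_SPINUP_W)} extra drives if all spin up at once")

Steady-state draw is rarely the limit; the spin-up surge is, and HBAs can
usually be set to stagger spin-up so the drives don't all start at once.
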

One thing is for sure, you gave a lot of info and different ways to
think on this.

Thanks to Rich and everyone else too for their thoughts.  It's certainly
helped give me ideas I hadn't thought of.

Dale

:-)  :-)

P.S.  For those who may wonder, my Mom is home and doing pretty well.
:-D

Replies

Subject Author
Re: [gentoo-user] Hard drive storage questions Wol's lists <antlists@××××××××××××.uk>