
From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Getting maximum space out of a hard drive
Date: Fri, 26 Aug 2022 12:07:38
Message-Id: CAGfcS_=S2c6q4V8=WO4ZChQvT4CF8sY99PJ9QzdotxVBUtTyiQ@mail.gmail.com
In Reply to: Re: [gentoo-user] Getting maximum space out of a hard drive by Dale
On Fri, Aug 26, 2022 at 7:26 AM Dale <rdalek1967@×××××.com> wrote:
>
> I looked into the Raspberry and the newest version, about $150 now, doesn't even have SATA ports.

The Pi4 is definitely a step up from the previous versions in terms
of IO, but it is still pretty limited. It has USB3 and gigabit, and
they don't share a USB host or anything like that, so you should get
close to full performance out of both. The CPU is of course pretty
limited, as is the RAM. The biggest benefit is the super-low power
consumption, and that is something I take seriously: for a lot of
cheap hardware that runs 24x7, the power cost rapidly exceeds the
purchase price. I see people buying old servers for $100 or
whatever, and those things will often go through $100 worth of
electricity in a few months.

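To put rough numbers on that (the wattage and electricity rate here
are illustrative assumptions, not measurements): an old server
averaging 200W around the clock uses 200W x 720h = 144 kWh a month,
which at $0.15/kWh is about $22/month, so it burns through $100 in
under five months. A Pi4 drawing 5-8W costs well under a dollar a
month by the same arithmetic.
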
How many hard drives are you talking about? There are two general
routes to go for something like this. The simplest and most
traditional way is a NAS box of some kind, with RAID. The issue
with that approach is that you're limited by the number of hard
drives you can run off of one host, and of course if anything other
than a drive fails you're offline. The other approach is a
distributed filesystem. That ramps up the learning curve quite a
bit, but for something like media where IOPS doesn't matter it
eliminates the need to try to cram a dozen hard drives into one
host. Ceph can also do IOPS, but then you're talking 10GbE + NVMe
and big bucks, and that is how modern server farms would do it.

I'll describe the traditional route since I suspect that is where
you're going to end up. If you only had 2-4 drives total you could
probably get away with a Pi4 and USB3 drives, but if you want
encryption or anything else CPU-intensive you're probably going to
bottleneck on the CPU. It would be fine if you're more concerned
with capacity than performance.

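If you want to gauge that bottleneck before buying anything,
cryptsetup has a built-in benchmark that measures what the CPU can
push through dm-crypt:

  $ cryptsetup benchmark

Compare its aes-xts numbers against the sequential throughput of
your drives; on a Pi-class CPU without the ARM crypto extensions
they can come in below what a single modern spinning disk delivers.
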
For more drives than that, or just to be more robust, any standard
amd64 build will be fine. Obviously a motherboard with lots of SATA
ports helps here, but port count is almost always a bottleneck on
consumer gear, and the typical solution for SATA is a host bus
adapter. They're expensive new, but cheap on eBay (I've had them
fail though, which is probably why companies tend to sell them while
they're still working). They also use a ton of power - I've
measured them using upwards of 60W - since they're designed for
servers where nobody seems to care. A typical HBA can provide 8-32
SATA ports via mini-SAS breakout cables (one mini-SAS port provides
4 SATA ports). HBAs also tend to want a lot of PCIe lanes - you
don't necessarily need all of them if you only have a few drives and
they're spinning disks - but it is probably easiest to get a CPU
with integrated graphics and use the 16x slot for the HBA. That or
get a motherboard with two large slots (the second slot usually
isn't electrically 16x, and even 4-8x slots aren't super-common on
consumer motherboards).

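Once a card is installed you can check what PCIe link it actually
negotiated - the device address here is a made-up example, get the
real one from a plain lspci listing first:

  # lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'

LnkCap is what the card supports and LnkSta is what it trained at; a
x8 card that came up at x4 in a shared slot shows up here, and for a
handful of spinning disks that is usually still plenty of bandwidth.
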
For software I'd use mdadm plus LVM. ZFS and btrfs are your other
options, and those can run directly on the bare drives, but btrfs is
immature and ZFS cannot be reshaped the way mdadm can, so there are
tradeoffs. If you want to reuse your existing drives and don't have
a backup to restore, or you want to do the migration live, the
easiest option is to add one new drive to the system to expand
capacity. Put mdadm on that drive as a degraded raid1 or whatever,
then put LVM on top, and migrate data from an existing disk live
over to the new one, freeing up one or more existing drives. Then
put mdadm and LVM on those and migrate more data onto them, and so
on, until everything is running on top of mdadm. Of course you need
to plan how you want the array to look and have enough drives to get
the desired level of redundancy. You can start with degraded arrays
(which is no worse than what you have now), then when enough drives
are freed up they can be added in pairs to fill things out.

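A minimal sketch of one round of that shuffle, assuming the data
already lives in an LVM volume group - vg0 and all the device names
are illustrative, and have backups before rearranging disks (if the
data isn't in LVM yet, the first round is a vgcreate plus a
file-level copy instead of pvmove):

  # new disk becomes a raid1 with one member deliberately missing
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing

  # layer LVM on the new array and add it to the volume group
  pvcreate /dev/md0
  vgextend vg0 /dev/md0

  # move all extents off an old disk while everything stays mounted
  pvmove /dev/sda1
  vgreduce vg0 /dev/sda1

  # once a drive is freed up, wipe its PV label and use it to
  # complete the mirror
  pvremove /dev/sda1
  mdadm --manage /dev/md0 --add /dev/sda1

pvmove is what makes this workable live - it can take a long time on
big disks, but the filesystem stays online the whole way.
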
If you want to go the distributed storage route then CephFS is the
canonical solution at this point, but it is RAM-hungry so it tends
to be expensive. It is also complex, though there are Ansible
playbooks and so on to manage that (playbooks with 100+ plays in
them make me nervous). For something simpler, MooseFS or LizardFS
are probably where I'd start. I'm running LizardFS, but upstream
has been on the edge of death for years, and MooseFS licensing is
apparently better now, so I'd probably look at that first. I did a
talk on LizardFS recently:
https://www.youtube.com/watch?v=dbMRcVrdsQs

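For scale: once a MooseFS master and its chunkservers are running,
the client side is just a FUSE mount - the hostname and mountpoint
here are made up:

  mfsmount /mnt/mfs -H mfsmaster.example.lan

The real complexity lives in the master (a single point of failure
in the community edition, last I looked) and in running a
chunkserver on each storage box.
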
--
Rich
