
From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] [OT] SMR drives (WAS: cryptsetup close and device in use when it is not)
Date: Sun, 01 Aug 2021 11:37:57
Message-Id: CAGfcS_=A=+L9pBxHuaa+Fitom494vqwwQS2NU48kBhaDgEResQ@mail.gmail.com
In Reply to: Re: [gentoo-user] [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) by William Kenworthy
On Sat, Jul 31, 2021 at 11:05 PM William Kenworthy <billk@×××××××××.au> wrote:
>
> On 31/7/21 9:30 pm, Rich Freeman wrote:
> >
> > I'd love server-grade ARM hardware but it is just so expensive unless
> > there is some source out there I'm not aware of. It is crazy that you
> > can't get more than 4-8GiB of RAM on an affordable arm system.
> Check out the odroid range. Same or only slightly $$$ more for a much
> better unit than a pi (except for the availability of 8G ram on the pi4)

Oh, they have been on my short list.

I was opining about the lack of cheap hardware with >8GB of RAM, and I
don't believe ODROID offers anything like that. I'd be happy if they
just took DDR4 on top of whatever onboard RAM they had.

My SBCs for the lizardfs cluster are either Pi4s or RockPro64s. The
Pi4 addresses basically all the issues in the original Pis as far as
I'm aware, and is comparable to most of the ODroid stuff I believe (at
least for the stuff I need), and they're still really cheap. The
RockPro64 was a bit more expensive but also performs nicely - I bought
that to try playing around with LSI HBAs to get many SATA drives on
one SBC.

I'm mainly storing media, so capacity matters more than speed. At the
time most existing SBCs either didn't have SATA or had something like
1-2 ports, and that means you end up with a lot of hosts. Sure, it
would perform better, but it costs more. Granted, at the start I
didn't want more than 1-2 drives per host anyway until I got up to
maybe 5 or so hosts, since that is where the cluster starts to perform
well and have decent safety margins. At this point, though, if I add
capacity it will be to existing hosts.

> Tried ceph - run away fast :)

Yeah, it is complex, and most of the tools for managing it left me
worried that if something went wrong they could really mess the whole
thing up fast. The thing that pushed me away from it was reports that
it doesn't perform well with only a few OSDs, and I wanted something I
could pilot without buying a lot of hardware. Another issue is that,
at least at the time I was looking into it, they wanted OSDs to have
1GB of RAM per 1TB of storage. That is a LOT of RAM. Aside from the
fact that RAM is expensive, it basically eliminates the ability to use
cheap low-power SBCs for all the OSDs, which is what I'm doing with
lizardfs. I don't care about the SBCs being on 24x7 when they pull a
few watts each at peak, and almost nothing when idle. If I want to
attach even 4x14TB hard drives to an SBC, though, it would need 64GB
of RAM per the Ceph standards of the time. Good luck finding a cheap
low-power ARM board that has 64GB of RAM - anything that even had DIMM
slots was something crazy like $1k at the time, and at that point I
might as well build full PCs.
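
To put rough numbers on that (the 1GB-per-1TB ratio is just the old
rule of thumb I kept seeing quoted, not current Ceph guidance, and the
little osd_ram_gb helper is only for illustration), the math I was
doing is basically this:

# Back-of-the-envelope OSD RAM estimate under the old ~1GB RAM per
# 1TB of storage rule of thumb.  The ratio is an assumption here,
# not current Ceph guidance.

def osd_ram_gb(drives, tb_per_drive, gb_ram_per_tb=1.0):
    """Rough RAM needed on one host running one OSD per drive."""
    return drives * tb_per_drive * gb_ram_per_tb

if __name__ == "__main__":
    # 4 x 14TB drives hanging off one SBC
    need = osd_ram_gb(drives=4, tb_per_drive=14)
    print(f"~{need:.0f}GB RAM")  # ~56GB, so a 64GB board in practice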

It seems like they've backed off on the memory requirements, maybe,
but I'd want to check on that. I've seen stories of bad things
happening when the OSDs don't have much RAM and you run into a
scenario like:
1. Lose disk, cluster starts to rebuild.
2. Lose another disk, cluster queues another rebuild.
3. Oh, first disk comes back, cluster queues another rebuild to
restore the first disk.
4. Replace the second failed disk, cluster queues another rebuild.

Apparently at least in the old days all the OSDs had to keep track of
all of that and they'd run out of RAM and basically melt down, unless
you went around adding more RAM to every OSD.

With LizardFS the OSDs basically do nothing at all but pipe stuff to
disk. If you want to use full-disk encryption then there is a CPU hit
for that, but that is all outside of LizardFS, and dm-crypt at least
is reasonable. (ZFS, on the other hand, does not hardware-accelerate
it on SBCs as far as I can tell, and that hurts.)
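
If you want to check whether a board even advertises the AES
instructions that dm-crypt can use, a quick sketch like this will
tell you (the cpu_has_aes function is just an illustration that scans
/proc/cpuinfo; nothing LizardFS-specific about it), and running
cryptsetup benchmark is the real test after that:

# Check whether the CPU advertises AES instructions by scanning
# /proc/cpuinfo: x86 lists them under "flags", ARM under "Features".
# This only shows what the CPU claims; run "cryptsetup benchmark"
# to see what dm-crypt actually gets out of it.

def cpu_has_aes(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key.strip().lower() in ("flags", "features"):
                if "aes" in value.split():
                    return True
    return False

print("CPU advertises AES:", cpu_has_aes())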

> I improved performance and master memory
> requirements considerably by pushing the larger data sets (e.g., GiB of
> mail files) into a container file stored on MFS and loop mounted onto
> the mailserver lxc instance. Convoluted but very happy with the
> improvement it's made.

Yeah, I've noticed, as you described in the other email, that memory
use depends on the number of files, and that it all has to be in RAM
at once. I'm using it for media storage mostly, so the file count is
modest. I do use snapshots, but only a few at a time, so it can
handle that. While the master is running on amd64 with plenty of RAM,
I do have shadow masters set up on SBCs and I want to be able to
switch over to one if something goes wrong, so I want RAM use to be
acceptable. It really doesn't matter how much space the files take
up - just how many inodes you have.
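
If you want a feel for how that scales, a toy estimate like the one
below is how I think about it. The bytes-per-inode figure is just a
placeholder assumption (as is the master_ram_gib helper) - measure
your own master's memory use against its inode count to calibrate it:

# Toy estimate of metadata-server RAM as a function of inode count.
# BYTES_PER_INODE is a placeholder assumption, not a LizardFS spec;
# calibrate it against your own master's memory use and inode count.

BYTES_PER_INODE = 300  # assumed average metadata overhead per inode

def master_ram_gib(inodes, bytes_per_inode=BYTES_PER_INODE):
    return inodes * bytes_per_inode / 2**30

# A media collection: few, large files -> tiny metadata footprint.
print(f"{master_ram_gib(200_000):.2f} GiB for 200k inodes")
# A mail spool: millions of small files -> this is what hurts.
print(f"{master_ram_gib(20_000_000):.2f} GiB for 20M inodes")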

--
Rich