On Sat, Jul 31, 2021 at 8:59 AM William Kenworthy <billk@×××××××××.au> wrote:
>
> I tried using moosefs with a rpi3B in the
> mix and it didn't go well once I started adding data - rpi 4's were not
> available when I set it up.

Pi2/3s only have USB2 as far as I'm aware, and they stick the ethernet
port on that USB bus besides. So, they're terrible for anything that
involves IO of any kind.

The Pi4 moves the ethernet off of USB, upgrades it to gigabit, and has
two USB3 hosts, so this is just an all-around massive improvement.
Obviously it isn't going to outclass some server-grade system with a
gazillion PCIe v4 lanes, but it is very good for an SBC at the price.

I'd love server-grade ARM hardware, but it is just so expensive unless
there is some source out there I'm not aware of. It is crazy that you
can't get more than 4-8GiB of RAM on an affordable ARM system.

> I think that SMR disks will work quite well
> on moosefs or lizardfs - I don't see long continuous writes to one disk
> but a random distribution of writes across the cluster with gaps between
> on each disk (1G network).

So, the distributed filesystems divide all IO (including writes)
across all the drives in the cluster. When you have a number of
drives, that obviously increases the total amount of IO you can handle
before the SMR drives start hitting the wall. Writing 25GB of data to
a single SMR drive will probably overrun its CMR cache, but if you
split it across 10 drives and write 2.5GB each, there is a decent
chance they'll all have room in the cache, take the write quickly,
and then, as long as your writes aren't sustained, clear the buffer.

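To put rough numbers on that (a minimal sketch; the 20GB media-cache
figure is an assumption for illustration, real drives vary):

    # Split one write burst evenly across N drives and check whether
    # each drive's share fits in its CMR/media cache.
    CACHE_GB = 20.0  # assumed per-drive cache size, not a measured value

    def per_drive_gb(burst_gb, n_drives):
        return burst_gb / n_drives

    for n in (1, 10):
        share = per_drive_gb(25.0, n)
        verdict = "fits in cache" if share <= CACHE_GB else "overruns cache"
        print(f"{n:2d} drive(s): {share:4.1f} GB each -> {verdict}")
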
I think you're still going to have an issue in a rebalancing scenario
unless you're adding many drives at once, so that the network becomes
rate-limiting instead of the disks. Having unreplicated data sitting
around for days or weeks due to slow replication performance is
setting yourself up for multiple failures. So, I'd still stay away
from them.

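To put a hedged number on "days or weeks" (both throughput figures
are assumptions for illustration, not benchmarks):

    # Time to re-replicate one failed drive's worth of data.
    # 25 MB/s approximates a cache-exhausted SMR drive's steady state;
    # 110 MB/s approximates a saturated 1GbE link.
    def days_to_replicate(tb, mb_per_s):
        return tb * 1e6 / mb_per_s / 86400

    print(f"8 TB at  25 MB/s: {days_to_replicate(8, 25):.1f} days")   # ~3.7
    print(f"8 TB at 110 MB/s: {days_to_replicate(8, 110):.1f} days")  # ~0.8
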
If you have 10GbE then your ability to overrun those disks goes way
up. Ditto if you're running something like Ceph, which can achieve
higher performance. I'm just doing bulk storage where I care a lot
more about capacity than performance. If I were trying to run a k8s
cluster or something I'd be on Ceph on SSD or whatever.

> With a good adaptor, USB3 is great ... otherwise its been quite
> frustrating :( I do suspect linux and its pedantic correctness trying
> to deal with hardware that isn't truly standardised (as in the
> manufacturer probably supplies a windows driver that covers it up) is
> part of the problem. These adaptors are quite common and I needed to
> apply the ATA command filter and turn off UAS using the usb tweaks
> mechanism to stop the crashes and data corruption. The comments in the
> kernel driver code for these adaptors are illuminating!

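(For anyone wanting to reproduce that tweak: I believe this refers to
the kernel's usb-storage quirks list. A sketch, with a placeholder
VID:PID - check lsusb for your adaptor's real one:

    usb-storage.quirks=152d:0578:tu

Per the kernel's parameter documentation, t (NO_ATA_1X) filters the
ATA(12)/ATA(16) pass-through commands and u (IGNORE_UAS) keeps the
device off the uas driver, falling back to plain usb-storage.)
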
Sometimes I wonder. I occasionally get errors in dmesg about
unaligned writes when using zfs, and others have seen these too. The
zfs developers seem convinced that the issue isn't with zfs and that
zfs is simply reporting it, or maybe it happens under loads that
you're more likely to get with zfs scrubbing (which IMO performs far
worse than scrubbing on btrfs - I'm guessing zfs isn't optimized to
scan each disk in physically sequential order but may be doing it in
a more logical order, and synchronously between mirror pairs).
Sometimes I wonder if there is just some sort of bug in the HBA
drivers, or maybe in the hardware on the motherboard. Consumer PC
hardware (like all PC hardware) is basically a black box unless you
have pretty sophisticated testing equipment and knowledge, so if your
SATA host is messing things up, how would you know?

--
Rich