On 31/7/21 9:30 pm, Rich Freeman wrote:
> On Sat, Jul 31, 2021 at 8:59 AM William Kenworthy <billk@×××××××××.au> wrote:
>> I tried using moosefs with an rpi3B in the
>> mix and it didn't go well once I started adding data - rpi4s were not
>> available when I set it up.
> Pi2/3s only have USB2 as far as I'm aware, and they stick the ethernet
> port on that USB bus besides. So, they're terrible for anything that
> involves IO of any kind.
>
> The Pi4 moves the ethernet off of USB, upgrades it to gigabit, and has
> two USB3 hosts, so this is just an all-around massive improvement.
> Obviously it isn't going to outclass some server-grade system with a
> gazillion PCIe v4 lanes, but it is very good for an SBC at the price.
>
> I'd love server-grade ARM hardware, but it is just so expensive unless
> there is some source out there I'm not aware of. It is crazy that you
> can't get more than 4-8GiB of RAM on an affordable ARM system.
Check out the Odroid range: the same or only slightly more $$$ for a
much better unit than a pi (except for the availability of 8G RAM on
the pi4). None of the pi's I have had have come close, though I do not
have a pi4, and from what I have read it is much closer in
performance. The Odroid site includes comparison charts of Odroid
against the rpi, which also show the rpi closing the gap. There are a
few other companies out there too. I am hoping the popularity of the
8G pi4 will push others to match it. I found the supplied 4.9 or 4.14
kernels problematic with random crashes, especially if USB was
involved. I am currently using the 5.12 tobetter kernels with aarch64
or arm 32-bit gentoo userlands.
>
>> I think that SMR disks will work quite well
>> on moosefs or lizardfs - I don't see long continuous writes to one disk
>> but a random distribution of writes across the cluster, with gaps in
>> between on each disk (1G network).
> So, the distributed filesystems divide all IO (including writes)
> across all the drives in the cluster. When you have a number of
> drives, that obviously increases the total amount of IO you can
> handle before the SMR drives start hitting the wall. Writing 25GB of
> data to a single SMR drive will probably overrun its CMR cache, but
> if you split it across 10 drives and write 2.5GB each, there is a
> decent chance they'll all have room in the cache, take the write
> quickly, and then as long as your writes aren't sustained they can
> clear the buffer.
Not strictly what I am seeing. You request a file from MFS and the
first free chunkserver with the data replies. Writing is similar in
that (depending on the creation arguments) a chunk is written wherever
responds fastest and then replicated. Replication is done under the
control of an algorithm that replicates a set number of chunks at a
time, streamed between a limited number of chunkservers, depending on
replication status. So I am seeing individual disk activity that is
busy for a few seconds and then idle for a short period - this pattern
has become more pronounced as I added chunkservers, and it would seem
to match the SMR requirements. If I replace/rebuild (resilver) a
chunkserver, that one is a lot busier, but still not doing 100%
continuous writes or reads. Moosefs uses a throttled replication
methodology. This is with 7 chunkservers and 9 disks - more is
definitely better for performance.
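
To illustrate the throttling idea (a toy sketch only - this is not
MFS's actual algorithm, and all the names and limits are made up):

import random

CHUNKS_PER_TICK = 4   # chunks the master schedules per cycle
MAX_PER_SERVER = 1    # concurrent transfers allowed per chunkserver

def schedule(under_replicated, servers):
    # Pair a source that holds each chunk with a destination that
    # doesn't, keeping every server under its transfer limit; any
    # chunk that doesn't fit simply waits for a later tick.
    load = {s: 0 for s in servers}
    plan = []
    for chunk, holders in under_replicated:
        if len(plan) >= CHUNKS_PER_TICK:
            break
        sources = [s for s in holders if load[s] < MAX_PER_SERVER]
        targets = [s for s in servers
                   if s not in holders and load[s] < MAX_PER_SERVER]
        if sources and targets:
            src, dst = random.choice(sources), random.choice(targets)
            load[src] += 1
            load[dst] += 1
            plan.append((chunk, src, dst))
    return plan

servers = ["cs%d" % i for i in range(1, 8)]  # 7 chunkservers
todo = [(c, {random.choice(servers)}) for c in range(20)]
print(schedule(todo, servers))

With limits like that, any one disk gets a burst of work and then sits
idle until the scheduler comes back around to it, which matches the
pattern I am seeing.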
> I think you're still going to have an issue in a rebalancing scenario
> unless you're adding many drives at once so that the network becomes
> rate-limiting instead of the disks. Having unreplicated data sitting
> around for days or weeks due to slow replication performance is
> setting yourself up for multiple failures. So, I'd still stay away
> from them.
I think at some point I am going to have to add an SMR disk and see
what happens - can't do it now though.
>
> If you have 10GbE then your ability to overrun those disks goes way
> up. Ditto if you're running something like Ceph which can achieve
> higher performance. I'm just doing bulk storage where I care a lot
> more about capacity than performance. If I were trying to run a k8s
> cluster or something I'd be on Ceph on SSD or whatever.
Tried ceph - ran away fast :) I have a lot of nearly static data, but
also a number of lxc instances (running on an Odroid N2) with both the
LXC instance and its data stored on the cluster. These include email,
calendaring, dns, webservers etc., and all work well. The online
borgbackup repos are also stored on it. The limitations of community
moosefs are the single point of failure that is the master, plus the
memory resource requirements on the master. I improved performance and
master memory requirements considerably by pushing the larger data
sets (e.g., GiB of mail files) into a container file stored on MFS and
loop-mounted onto the mailserver lxc instance. Convoluted, but I am
very happy with the improvement it's made.
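
Roughly like this (the paths and sizes are only examples):

# one big file on MFS instead of millions of small mail files
truncate -s 100G /mnt/mfs/mailstore.img
mkfs.ext4 /mnt/mfs/mailstore.img
mount -o loop /mnt/mfs/mailstore.img /var/lib/lxc/mail/rootfs/var/mail

The master holds all file metadata in RAM, so collapsing a huge number
of small files into one container file is where the memory saving
comes from.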
>> With a good adaptor, USB3 is great ... otherwise it's been quite
>> frustrating :( I do suspect linux and its pedantic correctness,
>> trying to deal with hardware that isn't truly standardised (as in
>> the manufacturer probably supplies a windows driver that covers it
>> up), is part of the problem. These adaptors are quite common and I
>> needed to apply the ATA command filter and turn off UAS using the
>> usb tweaks mechanism to stop the crashes and data corruption. The
>> comments in the kernel driver code for these adaptors are
>> illuminating!
> Sometimes I wonder. I occasionally get errors in dmesg about
> unaligned writes when using zfs. Others have seen these. The zfs
> developers seem convinced that the issue isn't with zfs, but that it
> is simply reporting the issue, or maybe it happens under loads that
> you're more likely to get with zfs scrubbing (which IMO performs far
> worse than with btrfs - I'm guessing it isn't optimized to scan
> physically sequentially on each disk but may be doing it in a more
> logical order and synchronously between mirror pairs). Sometimes I
> wonder if there is just some sort of bug in the HBA drivers, or maybe
> the hardware on the motherboard. Consumer PC hardware (like all PC
> hardware) is basically a black box unless you have pretty
> sophisticated testing equipment and knowledge, so if your SATA host
> is messing things up how would you know?
>
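PS: the "usb tweaks" mentioned above are the usb-storage quirks. From
memory it was something like this on the kernel command line (the
VID:PID is adaptor-specific, so the one below is only an example):

# u = don't bind the uas driver, t = filter ATA(12)/ATA(16)
# pass-through commands
usb-storage.quirks=152d:0578:ut
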
BillK |