On 04.05.2020 at 02:46, Rich Freeman wrote:
> On Sun, May 3, 2020 at 6:50 PM hitachi303
> <gentoo-user@××××××××××××××××.de> wrote:
>>
>> The only person I know who is running a really huge raid (I guess 2000+
>> drives) is comfortable with some spare drives. His raid did fail and can
>> fail. Data will be lost. Everything important has to be stored at a
>> secondary location. But they are using the raid to store data for some
>> days or weeks while a server is calculating stuff. If the raid fails they
>> have to restart the program for the calculation.
>
> So, if you have thousands of drives, you really shouldn't be using a
> conventional RAID solution. Now, if you're just using RAID to refer
> to any technology that stores data redundantly, that is one thing.
> However, if you wanted to stick 2000 drives into a single host using
> something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with
> some kind of hacked-up solution for PCIe port replication plus SATA
> bus multipliers/etc, you're probably doing it wrong. (Really, even
> with mdadm/zfs you probably still need some kind of terribly
> non-optimal solution for attaching all those drives to a single host.)
>
> At that scale you really should be using a distributed filesystem. Or
> you could use some application-level solution that accomplishes the
> same thing on top of a bunch of more modest hosts running zfs/etc (the
> Backblaze solution, at least in the past).
>
> The most mainstream FOSS solution at this scale is Ceph. It achieves
> redundancy at the host level. That is, if you have it set up to
> tolerate two failures, then you can take two random hosts in the
> cluster and smash their motherboards with a hammer in the middle of
> operation, and the cluster will keep on working and quickly restore
> its redundancy. Each host can have multiple drives, and losing any or
> all of the drives within a single host counts as a single failure.
> You can even do clever stuff like tell it which hosts are attached to
> which circuit breakers, and then you could lose all the hosts on a
> single power circuit at once and it would be fine.
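
(As a side note, from what I have read that "circuit breaker" trick is just
a CRUSH-map setting: you add an extra bucket level for the power circuits
and make that level the failure domain. A rough, untested sketch -- the
bucket, host and pool names below are made up:

    # describe the layout: power circuits as pdu buckets, hosts underneath
    ceph osd crush add-bucket circuit-a pdu
    ceph osd crush add-bucket circuit-b pdu
    ceph osd crush add-bucket circuit-c pdu
    ceph osd crush move circuit-a root=default
    ceph osd crush move circuit-b root=default
    ceph osd crush move circuit-c root=default
    ceph osd crush move host1 pdu=circuit-a
    ceph osd crush move host2 pdu=circuit-b
    ceph osd crush move host3 pdu=circuit-c

    # replication rule that spreads copies across circuits instead of
    # just across hosts, then point a pool at it
    ceph osd crush rule create-replicated rep-by-circuit default pdu
    ceph osd pool set mypool crush_rule rep-by-circuit

With a pool size of 3, each object should then keep its copies on three
different circuits, so losing one breaker costs at most one copy.)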
>
> This also has the benefit of covering you when one of your flakey
> drives causes weird bus issues that affect other drives, or one host
> crashes, and so on. The redundancy is entirely at the host level, so
> you're protected against a much larger number of failure modes.
>
> This sort of solution also performs much faster, as data requests are
> not CPU/NIC/HBA limited for any particular host. The software is
> obviously more complex, but the hardware can be simpler, since if you
> want to expand storage you just buy more servers and plug them into
> the LAN, versus trying to figure out how to cram an extra dozen hard
> drives into a single host with all kinds of port multiplier games.
> You can also do maintenance and just reboot an entire host while the
> cluster stays online, as long as you aren't messing with them all at
> once.
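
(And as far as I understand it, the maintenance part really is that simple:
the usual routine is just to tell Ceph not to start rebalancing while the
host is down, roughly along these lines:

    ceph osd set noout    # don't mark the host's OSDs "out" during the reboot
    # ... reboot or service the host ...
    ceph osd unset noout
    ceph -s               # wait until the cluster reports HEALTH_OK again

and nothing else on the cluster has to stop in the meantime.)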
>
> I've gone in this general direction because I was tired of having to
> try to deal with massive cases, being limited to motherboards with 6
> SATA ports, adding LSI HBAs that require an 8x slot and often conflict
> with using an NVMe, and so on.


So you are right. This is the way they do it. I used the term raid too
broadly. But they still run into limitations: the size of the room, what
the air conditioning can handle, and things like that.

Anyway, I only wanted to point out that there are different approaches in
the industry, and saving the data at any price is not always necessary.