On Wed, Mar 18, 2020 at 9:49 AM antlists <antlists@××××××××××××.uk> wrote:
>
> On 17/03/2020 14:29, Grant Edwards wrote:
> > On 2020-03-17, Neil Bothwick <neil@××××××××××.uk> wrote:
> >
> >> Same here. The main advantage of spinning HDs is that they are cheaper
> >> to replace when they fail. I only use them when I need lots of space.
> >
> > Me too. If I didn't have my desktop set up as a DVR with 5TB of
> > recording space, I wouldn't have any spinning drives at all. My
> > personal experience so far indicates that SSDs are far more reliable
> > and long-lived than spinning HDs. I would guess that about half of my
> > spinning HDs fail in under 5 years. But then again, I tend to buy
> > pretty cheap models.
> >
> If you rely on RAID and use spinning rust, DON'T buy cheap drives. I
> like Seagate, and bought myself Barracudas. Big mistake. Next time
> round, I bought IronWolves. Hopefully that system will soon be up and
> running, and I'll see whether that was a good choice :-)

Can you elaborate on what the mistake was? Backblaze hasn't found
Seagate to be noticeably better or worse than anyone else. It seems
like every vendor ships a really bad model every couple of years. Maybe
the more expensive drive will last longer, but you're paying a hefty
premium. It might be cheaper to just buy three drives with 3x
redundancy than two super-expensive ones with 2x redundancy.

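Back-of-envelope, for what it's worth (the failure rates and prices
below are invented for illustration, and it crudely assumes
independent failures with no mid-year replacement):

# Probability that every copy of the data is lost within one year,
# assuming independent drive failures. All numbers below are invented
# for illustration; plug in real AFRs/prices if you have them.

def p_all_fail(annual_failure_rate, copies):
    """Chance that all 'copies' drives fail in the same year."""
    return annual_failure_rate ** copies

cheap_afr, cheap_price = 0.05, 100   # assumed 5% AFR, $100/drive
nas_afr, nas_price = 0.02, 180       # assumed 2% AFR, $180/drive

# Three cheap drives (3x redundancy) vs two pricier ones (2x)
print(f"3x cheap: ${3 * cheap_price}, p(loss) ~ {p_all_fail(cheap_afr, 3):.6f}")
print(f"2x NAS:   ${2 * nas_price}, p(loss) ~ {p_all_fail(nas_afr, 2):.6f}")

With these invented numbers the three cheap drives come out both
cheaper and less likely to lose data, but obviously the real answer
depends on the actual failure rates.
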
The main issues I've seen with RAID are:

1. Double failures. If your RAID can't tolerate a double failure
(RAID6/etc.), then you have to consider the time required to replace a
drive and rebuild the array. As arrays get larger, or if you aren't
quick with replacements, the risk of a second failure inside the
rebuild window grows (the sketch after point 2 puts rough numbers on
this). Maybe you could mitigate that with drives that are less likely
to fail at the same time, but I suspect you're better off having
enough redundancy to absorb the problem.

2. A drive fails and the system becomes unstable. This is usually
a controller problem, and is probably less likely with better
controllers. It could also be a kernel issue if the
driver/filesystem/etc. doesn't handle the erroneous data. I think the
only place you can influence this risk is the controller, not the
drive. If a drive sends garbage over the interface, the
controller should not pass the invalid data along or let it
interfere with the functioning drives.

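To put rough numbers on the rebuild-window risk from point 1 (the AFR
and rebuild times here are assumptions, for illustration only):

# Odds that a second drive dies before a rebuild finishes, assuming a
# constant failure rate spread evenly over the year. Assumed values.

HOURS_PER_YEAR = 24 * 365

def p_second_failure(surviving_drives, afr, rebuild_hours):
    """Chance any surviving drive fails within the rebuild window."""
    p_one = afr * rebuild_hours / HOURS_PER_YEAR
    return 1 - (1 - p_one) ** surviving_drives

# e.g. an 8-drive RAID5 that just lost a disk, 5% AFR:
print(p_second_failure(7, 0.05, rebuild_hours=24))    # swap the same day
print(p_second_failure(7, 0.05, rebuild_hours=168))   # take a week over it

The window matters: dawdling for a week is roughly 7x the risk of a
same-day swap, which is why big arrays want RAID6 or a hot spare.
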
These failure modes are a big part of why I've been trying to move
towards lizardfs or other distributed filesystems, which put the
redundancy at the host level. I can lose all the drives on a host, the
host itself, its controller, its power supply, or whatever, and
nothing bad happens. Typically in these systems drives aren't
explicitly paired; data is just pooled across the cluster. If data is
lost, the entire cluster starts replicating it to restore redundancy,
and that rebuild is split across all hosts and starts immediately, not
after you add a drive (unless you were running near-full). One host
replicating one 12TB drive takes far longer than 10 hosts each
replicating 1.2TB to other hosts in parallel, as long as your network
switches can sustain full capacity to every host at once and you have
no other bottlenecks.

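A quick sanity check on those rebuild times (assuming every host can
sustain roughly a full gigabit link, ~125 MB/s, with no other
bottleneck; both numbers are just illustrative):

# Illustrative re-replication times for the scenario above.

LINK_MB_S = 125            # assumed ~1 Gbit/s per host
MB_PER_TB = 1_000_000      # decimal TB, as drive vendors count

def rebuild_hours(total_tb, hosts):
    """Hours for 'hosts' senders to re-replicate total_tb in parallel."""
    per_host_mb = total_tb * MB_PER_TB / hosts
    return per_host_mb / LINK_MB_S / 3600

print(f"1 host,   12TB: {rebuild_hours(12, 1):.1f}h")   # ~27h alone
print(f"10 hosts, 12TB: {rebuild_hours(12, 10):.1f}h")  # ~2.7h split

So the cluster gets back to full redundancy an order of magnitude
faster, which in turn shrinks the double-failure window from point 1.
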
--
Rich