Gentoo Archives: gentoo-amd64

From: Rich Freeman <rich0@g.o>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Soliciting new RAID ideas
Date: Wed, 28 May 2014 16:10:19
Message-Id: CAGfcS_kv_BaOYd5K0Zp09Kc3kaWJR=q2jMFLZyt-k7Qep3VmCg@mail.gmail.com
In Reply to: Re: [gentoo-amd64] Soliciting new RAID ideas by Bob Sanders
On Wed, May 28, 2014 at 11:26 AM, Bob Sanders <rsanders@×××.com> wrote:
> Marc Joliet, mused, then expounded:
>> On Tue, 27 May 2014 15:39:38 -0700
>> Bob Sanders <rsanders@×××.com> wrote:
>>
>> While I am far from a filesystem/storage expert (I see myself as a mere user),
>> the cited threads lead me to believe that this is most likely an
>> overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I would
>> suggest reading them in their entirety.
>>
>> [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
>> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
>> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
>> [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
>>
>
> FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> memory bit and no ECC memory:
>
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>

I don't think anybody disputes that if you use btrfs/zfs with
non-ECC RAM you can potentially lose some of the protection afforded
by the checksumming.

What I'd question is whether this concern is unique to btrfs/zfs.
I'd think the same failure modes would apply to any other
filesystem.

So, the message should be that ECC RAM is better than non-ECC RAM, not
that those who use non-ECC RAM are better off using ext4 instead of
zfs/btrfs. I'd think that any RAM-related issue that would impact
zfs/btrfs would affect ext4 just as badly, and with ext4 you're also
vulnerable to all the non-RAM-related errors that checksumming was
created to solve.

If your RAM is bad then all kinds of stuff can go wrong. Ditto for
your cache memory in the CPU, logic circuitry in the CPU, your buses,
etc. Most systems do not tolerate faults in these components, and the
cost of making them fault-tolerant tends to be fairly high. On
the other hand, the good news is that you're far more likely to have
problems with data stored on a disk than in RAM, which is probably why
we haven't bothered to improve the other components.

Rich
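
The argument above (a RAM fault hurts checksumming and non-checksumming filesystems alike, while an on-disk fault is only caught when there is a checksum) can be illustrated with a toy sketch. This is not how btrfs, zfs, or ext4 are implemented: the helper names `flip_bit` and `store_and_read`, the use of CRC32, and the single flipped bit are all assumptions made purely for illustration.

```python
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated fault)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def store_and_read(data, with_checksum, flip_in_ram=False, flip_on_disk=False):
    """Toy write-then-read cycle; returns (data_read_back, verdict).

    Hypothetical model: the checksum is computed over whatever is in
    RAM at write time, then verified against what comes back from disk.
    """
    if flip_in_ram:
        # Fault happens before the filesystem ever sees the data.
        data = flip_bit(data, 3)
    crc = zlib.crc32(data) if with_checksum else None
    # Fault happens after the checksum was computed and stored.
    on_disk = flip_bit(data, 3) if flip_on_disk else data
    if crc is None:
        return on_disk, "unverified"
    verdict = "ok" if zlib.crc32(on_disk) == crc else "error detected"
    return on_disk, verdict


original = b"important data"
for fs, with_checksum in (("checksummed", True), ("plain", False)):
    for fault in ("flip_in_ram", "flip_on_disk"):
        data, verdict = store_and_read(original, with_checksum, **{fault: True})
        print(f"{fs:12} {fault:13} intact={data == original} verdict={verdict}")
```

In all four cases the data read back is damaged. Only the checksummed store with an on-disk fault reports an error; the in-RAM fault goes unnoticed with or without checksums, which is the sense in which non-ECC RAM costs you some protection without making the checksumming filesystem worse than the plain one.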