Gentoo Archives: gentoo-amd64

From: Marc Joliet <marcec@×××.de>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Soliciting new RAID ideas
Date: Wed, 28 May 2014 19:20:39
Message-Id: 20140528212018.04387c61@marcec
In Reply to: Re: [gentoo-amd64] Soliciting new RAID ideas by Bob Sanders
1 On Wed, 28 May 2014 08:26:58 -0700,
2 Bob Sanders <rsanders@×××.com> wrote:
3
4 >
5 > Marc Joliet, mused, then expounded:
6 > > On Tue, 27 May 2014 15:39:38 -0700,
7 > > Bob Sanders <rsanders@×××.com> wrote:
8 > >
9 > > While I am far from a filesystem/storage expert (I see myself as a mere user),
10 > > the cited threads lead me to believe that this is most likely an
11 > > overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I would
12 > > suggest reading them in their entirety.
13 > >
14 > > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
15 > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
16 > > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
17 > > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
18 > >
19 >
20 > FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
21 > memory bit and no ECC memory:
22 >
23 > http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
24
25 Thanks for explicitly linking that. I didn't read it the first time around,
26 but just read through most of it, then reread the threads [0] and [3] above and
27 *think* that I understand the problem (and how it doesn't apply to BTRFS)
28 better now.
29
30 IIUC, the claim is: data being written to disk must first pass through RAM,
31 where it is corrupted (due to a permanent bit flip caused, e.g., by
32 deteriorating hardware). At some later point, when the data is read back
33 from disk, it might happen to be loaded into the damaged region of RAM,
34 where it is corrupted further. At this point the checksum fails, and ZFS
35 corrects the data in RAM (using parity information!), where it is immediately
36 corrupted again (because apparently it is corrected at the same physical
37 location in RAM? perhaps this is specific to correction via parity?). This
38 *additionally* corrupted data is then written back to disk (without any further
39 checks).
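
As a rough illustration of the claimed sequence, a toy Python model might look like the following. Nothing here is actual ZFS code: the `through_bad_ram` helper, the single stuck bit, the byte offsets, and the use of CRC32 as the checksum are all assumptions made for the sketch, and re-reading the block from disk merely stands in for reconstruction from parity.

```python
import zlib


def through_bad_ram(data: bytes, bad_offset: int) -> bytes:
    """Hypothetical fault model: the lowest bit of one byte in the buffer is stuck at 1."""
    buf = bytearray(data)
    if bad_offset < len(buf):
        buf[bad_offset] |= 0x01
    return bytes(buf)


original = bytes(8)  # eight zero bytes

# Write path: the data is corrupted in RAM *before* its checksum is computed,
# so the checksum stored on disk matches the already-corrupted data.
on_disk = through_bad_ram(original, bad_offset=2)
stored_crc = zlib.crc32(on_disk)

# Read path: the data happens to land so that the stuck bit hits another byte.
in_ram = through_bad_ram(on_disk, bad_offset=5)
checksum_fails = zlib.crc32(in_ram) != stored_crc

# Claimed "repair": the block is rebuilt (here simply re-read, standing in for
# parity reconstruction) into the same bad location, then written back
# without being verified again.
repaired = through_bad_ram(on_disk, bad_offset=5)
written_back_unchecked = repaired

# One more checksum verification would have caught the bad repair.
repair_is_valid = zlib.crc32(repaired) == stored_crc

print(checksum_fails, repair_is_valid, written_back_unchecked != original)
```

In this model the damage only propagates because the repaired block is written back unverified; whether ZFS really skips that check is exactly what the cited threads question.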
40
41 So the point is that, apparently, without ECC RAM, you could get a (long-term)
42 cascade of errors, especially during a scrub. The likelihood of such permanent
43 RAM corruption happening in the first place is another question entirely.
44
45 The various posts in [0] then basically say that regardless of whether this
46 really is true of ZFS, it certainly doesn't apply to BTRFS, for various
47 reasons. I suppose this quote from [1] (see above) says it most clearly:
48
49 > In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they talk about
50 > reconstructing corrupted data from parity information:
51 >
52 > > Ok, no problem. ZFS will check against its parity. Oops, the parity failed since we have a new corrupted
53 > > bit. Remember, the checksum data was calculated after the corruption from the first memory error
54 > > occurred. So now the parity data is used to "repair" the bad data. So the data is "fixed" in RAM.
55 >
56 > i.e. that there is parity information stored with every piece of data, and ZFS will "correct" errors
57 > automatically from the parity information. I start to suspect that there is confusion here between
58 > checksumming for data integrity and parity information. If this is really how ZFS works, then if memory
59 > corruption interferes with this process, then I can see how a scrub could be devastating. I don't know if
60 > ZFS really works like this. It sounds very odd to do this without an additional checksum check. This sounds
61 > very different to what you say below that btrfs does, which is only to check against redundantly-stored
62 > copies, which I agree sounds much safer.
63
64 The rest is also relevant, but I think the fact that the data is corrected via
65 parity information, as opposed to using a known-good redundant copy of the data
66 (which I originally missed, and thus got confused), is the key to
67 understanding the (supposed) difference in behaviour between ZFS and BTRFS.
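
For contrast, the "only trust a copy whose checksum matches" behaviour that the quoted post attributes to BTRFS can be sketched roughly as below. This is a toy under stated assumptions, not BTRFS code: `read_with_repair` is a hypothetical helper, CRC32 stands in for whatever checksum the filesystem uses, and the redundant copies are just byte strings in a list.

```python
import zlib
from typing import Sequence


def read_with_repair(copies: Sequence[bytes], expected_crc: int) -> bytes:
    """Return the first redundant copy whose checksum matches the stored one.

    No data is reconstructed: a copy is either verified and used, or rejected.
    """
    for copy in copies:
        if zlib.crc32(copy) == expected_crc:
            return copy
    raise IOError("no copy matches the stored checksum")


good = b"important data"
stored_crc = zlib.crc32(good)
bad = b"important dat\x00"  # one copy has been damaged

# The damaged copy is skipped and the verified one is returned.
result = read_with_repair([bad, good], stored_crc)

# If every copy is damaged, the read fails instead of producing "fixed" data.
try:
    read_with_repair([bad, bad], stored_crc)
    all_bad_raises = False
except IOError:
    all_bad_raises = True
```

The design point is that a failed check leads to an error rather than to a guess, so a bad bit in RAM can make a read fail but cannot silently turn into a "repair" on disk.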
68
69 All this assumes, of course, that the FreeNAS forum post that ignited this
70 discussion is correct in the first place.
71
72 > Thanks Mark! Interesting discussion on btrfs.
73 >
74 > Bob
75
76 You're welcome! I agree, it's an interesting discussion. And regarding the
77 misspelling of my name: no problem :-) .
78
79 --
80 Marc Joliet
81 --
82 "People who think they know everything really annoy those of us who know we
83 don't" - Bjarne Stroustrup
