Gentoo Archives: gentoo-amd64

From:	Marc Joliet <marcec@×××.de>
To:	gentoo-amd64@l.g.o
Subject:	Re: [gentoo-amd64] Soliciting new RAID ideas
Date:	Wed, 28 May 2014 19:20:39
Message-Id:	`20140528212018.04387c61@marcec`
In Reply to:	Re: [gentoo-amd64] Soliciting new RAID ideas by Bob Sanders

On Wed, 28 May 2014 08:26:58 -0700, Bob Sanders <rsanders@×××.com> wrote:

>
> Marc Joliet, mused, then expounded:
> > On Tue, 27 May 2014 15:39:38 -0700, Bob Sanders <rsanders@×××.com> wrote:
> >
> > While I am far from a filesystem/storage expert (I see myself as a mere user),
> > the cited threads lead me to believe that this is most likely an
> > overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I would
> > suggest reading them in their entirety.
> >
> > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
> > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
> > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
> > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
> >
>
> FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> memory bit and no ECC memory:
>
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/

Thanks for explicitly linking that. I hadn't read it the first time around,
but I have now read through most of it, reread threads [0] and [3] above, and
think that I understand the problem (and why it doesn't apply to BTRFS)
better now.

IIUC, the claim is this: data written to disk must obviously pass through RAM
first, where it is corrupted (due to a permanent bit flip caused, e.g., by
deteriorating hardware). At some later point, when the data is read back from
disk, it might happen to be loaded into a buffer that covers the damaged
location in RAM, where it is corrupted further. At this point the checksum
fails, and ZFS corrects the data in RAM (using parity information!), where it
is immediately corrupted again (because apparently it is corrected at the same
physical location in RAM? Perhaps this is specific to correction via parity?).
This further corrupted data is then written back to disk without any further
checks.
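To make the claimed sequence concrete, here is a toy Python model of it. It is only a sketch of the scenario as the FreeNAS post describes it, not of how ZFS actually works: the stuck bit's address and mask (BAD_ADDR, STUCK_MASK), the buffer addresses, and the helper names (load, write_block, claimed_scrub) are all made up, CRC-32 merely stands in for the filesystem's checksum, and "parity reconstruction" is simply assumed to return the bytes stored on disk.

```python
import zlib

# Toy model, not real ZFS code. Hypothetical assumption: the RAM cell at
# physical address BAD_ADDR has bit 3 permanently stuck at 1.
BAD_ADDR = 0x1005
STUCK_MASK = 0x08


def load(data: bytes, base: int) -> bytearray:
    """Copy `data` into RAM starting at physical address `base`.

    If the buffer covers BAD_ADDR, that byte comes back with the stuck bit set.
    """
    buf = bytearray(data)
    i = BAD_ADDR - base
    if 0 <= i < len(buf):
        buf[i] |= STUCK_MASK
    return buf


def write_block(data: bytes, base: int) -> tuple[bytes, int]:
    """Write path: the checksum is computed from the already-corrupted RAM copy."""
    buf = load(data, base)
    return bytes(buf), zlib.crc32(buf)


def claimed_scrub(on_disk: bytes, stored_sum: int, base: int) -> bytes:
    """Scrub as described in the FreeNAS post (an unverified claim).

    On a checksum mismatch the block is "repaired" into the same RAM
    location and written back without re-checking the checksum.
    """
    buf = load(on_disk, base)
    if zlib.crc32(buf) == stored_sum:
        return on_disk  # block looks fine; nothing is rewritten
    # Hypothetical stand-in for parity reconstruction: yields the on-disk bytes,
    # but they land at the same physical location and are hit again.
    repaired = load(on_disk, base)
    return bytes(repaired)  # written back with no second checksum test


if __name__ == "__main__":
    original = b"hello world!"
    disk, checksum = write_block(original, base=0x1000)  # byte 5 hit on write
    print(disk)
    disk = claimed_scrub(disk, checksum, base=0x1004)  # byte 1 hit on read
    print(disk, zlib.crc32(disk) == checksum)
```

Run as a script, this prints b'hello(world!' and then b'hmllo(world!' False: under these assumptions the scrub has written a block back to disk that no longer even matches its own checksum, which is the "cascade" the post warns about.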

So the point is that, apparently, without ECC RAM you could get a (long-term)
cascade of errors, especially during a scrub. How likely such permanent RAM
corruption is in the first place is another question entirely.

The various posts in [0] then basically say that, regardless of whether this
is really true of ZFS, it certainly doesn't apply to BTRFS, for various
reasons. I suppose this quote from [1] (see above) says it most clearly:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they talk about
> reconstructing corrupted data from parity information:
>
> > Ok, no problem. ZFS will check against its parity. Oops, the parity failed since we have a new corrupted
> > bit. Remember, the checksum data was calculated after the corruption from the first memory error
> > occurred. So now the parity data is used to "repair" the bad data. So the data is "fixed" in RAM.
>
> i.e. that there is parity information stored with every piece of data, and ZFS will "correct" errors
> automatically from the parity information. I start to suspect that there is confusion here between
> checksumming for data integrity and parity information. If this is really how ZFS works, then if memory
> corruption interferes with this process, then I can see how a scrub could be devastating. I don't know if
> ZFS really works like this. It sounds very odd to do this without an additional checksum check. This sounds
> very different to what you say below that btrfs does, which is only to check against redundantly-stored
> copies, which I agree sounds much safer.

The rest is also relevant, but I think the key to understanding the (supposed)
difference in behaviour between ZFS and BTRFS is that the data is corrected via
parity information, as opposed to using a known-good redundant copy of the
data (which I originally missed, and thus got confused).
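For contrast, here is a similarly hypothetical sketch of repairing only from a known-good redundant copy, i.e. only ever writing back bytes that have just passed the checksum test. It is not taken from the btrfs sources: the stuck-bit RAM model (BAD_ADDR, STUCK_MASK, load) and the scrub_mirrored helper are made up for illustration, and CRC-32 again stands in for the real checksum.

```python
import zlib

# Toy model, not real btrfs code. Hypothetical assumption: the RAM cell at
# physical address BAD_ADDR has bit 3 permanently stuck at 1.
BAD_ADDR = 0x1005
STUCK_MASK = 0x08


def load(data: bytes, base: int) -> bytearray:
    """Copy `data` into RAM at `base`; a covered BAD_ADDR byte gets the stuck bit."""
    buf = bytearray(data)
    i = BAD_ADDR - base
    if 0 <= i < len(buf):
        buf[i] |= STUCK_MASK
    return buf


def scrub_mirrored(copies: list[bytes], stored_sum: int,
                   bases: list[int]) -> tuple[list[bytes], bool]:
    """Verify each redundant copy; repair failures only from a copy that verified.

    copies[k] is read into RAM at bases[k]. Returns (new on-disk copies, ok).
    If no copy verifies, nothing is written back and ok is False.
    """
    verified = None
    failed = []
    for k, (copy, base) in enumerate(zip(copies, bases)):
        buf = load(copy, base)
        if zlib.crc32(buf) == stored_sum:
            if verified is None:
                verified = bytes(buf)  # known-good: it just passed the checksum
        else:
            failed.append(k)
    if verified is None:
        return list(copies), False  # report the error, write nothing
    fixed = list(copies)
    for k in failed:
        fixed[k] = verified  # only checksum-verified bytes reach the disk
    return fixed, True


if __name__ == "__main__":
    block = b"hello(world!"       # already damaged on write, as in the scenario
    checksum = zlib.crc32(block)  # so the checksum matches the damaged data
    print(scrub_mirrored([block, block], checksum, [0x1004, 0x1004]))
    print(scrub_mirrored([block, block], checksum, [0x1004, 0x2000]))
```

In the first call both copies pass through the bad cell, neither verifies, and nothing is written (the error is only reported); in the second, the mirror is read via a healthy buffer, verifies, and the "failed" copy is rewritten with identical, verified bytes. Under this model, bad RAM can cause a spurious error or a needless rewrite, but no new corruption is written to disk.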

All this assumes, of course, that the FreeNAS forum post that ignited this
discussion is correct in the first place.

> Thanks Mark! Interesting discussion on btrfs.
>
> Bob

You're welcome! I agree, it's an interesting discussion. And regarding the
misspelling of my name: no problem :-) .

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


Replies

Subject	Author
Re: [gentoo-amd64] Soliciting new RAID ideas	Bob Sanders <rsanders@×××.com>
[gentoo-amd64] Re: Soliciting new RAID ideas	Duncan <1i5t5.duncan@×××.net>
