On Wed, 28 May 2014 08:26:58 -0700,
Bob Sanders <rsanders@×××.com> wrote:

>
> Marc Joliet, mused, then expounded:
> > On Tue, 27 May 2014 15:39:38 -0700,
> > Bob Sanders <rsanders@×××.com> wrote:
> >
> > While I am far from a filesystem/storage expert (I see myself as a mere user),
> > the cited threads lead me to believe that this is most likely an
> > overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I would
> > suggest reading them in their entirety.
> >
> > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
> > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
> > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
> > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
> >
>
> FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> memory bit and no ECC memory:
>
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/

Thanks for explicitly linking that. I didn't read it the first time around,
but just read through most of it, then reread the threads [0] and [3] above and
*think* that I understand the problem (and how it doesn't apply to BTRFS)
better now.

IIUC, the claim is: data is written to disk, but it must go through the RAM
first, obviously, where it is corrupted (due to a permanent bit flip caused,
e.g., by deteriorating hardware). At some later point, when the data is read
back from disk, it might happen to load around the damaged location in RAM,
where it is further corrupted. At this point the checksum fails, and ZFS
corrects the data in RAM (using parity information!), where it is immediately
corrupted again (because apparently it is corrected at the same physical
location in RAM? perhaps this is specific to correction via parity?). This
*additionally* corrupted data is then written back to disk (without any further
checks).
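
As a toy illustration of the claimed cascade (and only of the claim as
summarized above, not of how ZFS actually behaves), the following Python sketch
models a single stuck-at-1 bit in RAM. The function names, the 16-byte block,
and the offsets are all made up for the example; the assumption is simply that
each trip through RAM happens to place a different byte of the block on the bad
cell.

```python
def pass_through_bad_ram(data: bytes, bad_byte: int, bad_bit: int) -> bytes:
    """Copy `data` through a buffer in which one bit is stuck at 1.

    `bad_byte` is the (hypothetical) offset within the block that happens
    to land on the faulty memory cell during this particular trip.
    """
    buf = bytearray(data)
    buf[bad_byte] |= 1 << bad_bit  # the stuck bit always reads back as 1
    return bytes(buf)


def count_bit_errors(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length blocks differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# An all-zero block, so a stuck-at-1 bit always shows up as an error.
original = bytes(16)
on_disk = original

# Each read/"repair"/write-back cycle passes the block through RAM again,
# with the bad cell assumed to line up with a different byte each time.
errors_after_each_pass = []
for offset in (3, 7, 11):
    on_disk = pass_through_bad_ram(on_disk, bad_byte=offset, bad_bit=0)
    errors_after_each_pass.append(count_bit_errors(original, on_disk))

print(errors_after_each_pass)
```

Under these assumptions every round trip leaves one more wrong bit in the
on-disk copy, which is the kind of slow accumulation the forum post worries
about.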

So the point is that, apparently, without ECC RAM, you could get a (long-term)
cascade of errors, especially during a scrub. The likelihood of such permanent
RAM corruption happening in the first place is another question entirely.

The various posts in [0] then basically say that regardless of whether this
really is true of ZFS, it certainly doesn't apply to BTRFS, for various
reasons. I suppose this quote from [1] (see above) says it most clearly:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they talk about
> reconstructing corrupted data from parity information:
>
> > Ok, no problem. ZFS will check against its parity. Oops, the parity failed since we have a new corrupted
> > bit. Remember, the checksum data was calculated after the corruption from the first memory error
> > occurred. So now the parity data is used to "repair" the bad data. So the data is "fixed" in RAM.
>
> i.e. that there is parity information stored with every piece of data, and ZFS will "correct" errors
> automatically from the parity information. I start to suspect that there is confusion here between
> checksumming for data integrity and parity information. If this is really how ZFS works, then if memory
> corruption interferes with this process, then I can see how a scrub could be devastating. I don't know if
> ZFS really works like this. It sounds very odd to do this without an additional checksum check. This sounds
> very different to what you say below that btrfs does, which is only to check against redundantly-stored
> copies, which I agree sounds much safer.

The rest is also relevant, but I think the point that the data is corrected via
parity information, as opposed to using a known-good redundant copy of the data
(which I originally missed, and thus got confused), is the key point in
understanding the (supposed) difference in behaviour between ZFS and BTRFS.
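
The "known-good redundant copy" idea can be sketched roughly as follows. This
is a hypothetical Python model, not btrfs or ZFS code: `checksum`,
`read_with_repair`, and the use of `zlib.crc32` are stand-ins chosen for the
example. What it tries to show is that a candidate is only accepted if it
matches the stored checksum, so a copy that was damaged on its way through RAM
is rejected rather than written back.

```python
import zlib
from typing import List, Optional


def checksum(block: bytes) -> int:
    """Stand-in for the filesystem's block checksum (CRC-32 here)."""
    return zlib.crc32(block)


def read_with_repair(copies: List[bytes], stored_checksum: int) -> Optional[bytes]:
    """Return the first redundant copy that matches the stored checksum.

    Nothing is reconstructed or guessed: a copy is either verified against
    the checksum or skipped. If no copy verifies, the read fails (None).
    """
    for candidate in copies:
        if checksum(candidate) == stored_checksum:
            return candidate
    return None


# Example: one copy has a flipped bit, the other is intact.
good = b"important data"
stored = checksum(good)

damaged = bytearray(good)
damaged[0] ^= 0x01  # simulate a single flipped bit
damaged = bytes(damaged)

recovered = read_with_repair([damaged, good], stored)
unrecoverable = read_with_repair([damaged], stored)

print(recovered == good, unrecoverable)
```

If no copy passes the check, the read simply fails instead of "repairing" the
block with unverified data.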

All this assumes, of course, that the FreeNAS forum post that ignited this
discussion is correct in the first place.

> Thanks Mark! Interesting discussion on btrfs.
>
> Bob

You're welcome! I agree, it's an interesting discussion. And regarding the
misspelling of my name: no problem :-) .

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup