Marc Joliet mused, then expounded:
> On Wed, 28 May 2014 08:26:58 -0700,
> Bob Sanders <rsanders@×××.com> wrote:
>
> >
> > Marc Joliet mused, then expounded:
> > > On Tue, 27 May 2014 15:39:38 -0700,
> > > Bob Sanders <rsanders@×××.com> wrote:
> > >
> > > While I am far from a filesystem/storage expert (I see myself as a
> > > mere user), the cited threads lead me to believe that this is most
> > > likely an overhyped/misunderstood class of errors (e.g., posts [1]
> > > and [2]), so I would suggest reading them in their entirety.
> > >
> > > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
> > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
> > > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
> > > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
> > >
> >
> > FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> > memory bit and no ECC memory:
> >
Just to beat this dead horse some more, here's an analysis of an academic
study on drive failures -

http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/

And it links to the actual study here -

https://www.usenix.org/legacy/events/fast07/tech/schroeder.html

Which shows that memory has a fairly high failure rate as well, though
the focus is on hard drives.

> > http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>
> Thanks for explicitly linking that. I didn't read it the first time
> around, but just read through most of it, then reread the threads [0]
> and [3] above, and *think* that I understand the problem (and how it
> doesn't apply to BTRFS) better now.
>
> IIUC, the claim is: data is written to disk, but it must obviously pass
> through RAM first, where it is corrupted (due to a permanent bit flip
> caused, e.g., by deteriorating hardware). At some later point, when the
> data is read back from disk, it might happen to land around the damaged
> location in RAM, where it is corrupted further. At this point the
> checksum fails, and ZFS corrects the data in RAM (using parity
> information!), where it is immediately corrupted again (because
> apparently it is corrected at the same physical location in RAM? Perhaps
> this is specific to correction via parity?). This *additionally*
> corrupted data is then written back to disk (without any further
> checks).
>
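To make the claimed failure mode concrete, here's a rough simulation (my own
sketch, not actual ZFS code; the bad-cell model and bit positions are made up
for illustration). The key point it demonstrates: the checksum is computed
*after* the first corruption, so data and checksum agree on disk, and a later
"repair" that transits the same bad RAM just re-corrupts the block:

```python
# Sketch of the claimed non-ECC cascade: a defective RAM cell corrupts
# every buffer that passes through it.
import zlib

def flip_bit(buf: bytes, byte_i: int, bit: int) -> bytes:
    """Model a bad RAM cell: XOR one bit of a buffer passing through it."""
    b = bytearray(buf)
    b[byte_i] ^= 1 << bit
    return bytes(b)

good = b"some file contents"

# Write path: a bad cell flips a bit *before* the checksum is taken, so
# the on-disk data and checksum match -- the flip is undetectable later.
written = flip_bit(good, 0, 0)
disk_data, disk_csum = written, zlib.crc32(written)

# Read path (e.g. during a scrub): the buffer lands on a second bad cell.
read_back = flip_bit(disk_data, 1, 0)
assert zlib.crc32(read_back) != disk_csum   # checksum mismatch detected

# Claimed "repair": parity reconstruction rebuilds the block, but the
# rebuilt buffer transits the same bad cell again before write-back.
repaired = disk_data                        # parity rebuild succeeds...
disk_data = flip_bit(repaired, 1, 0)        # ...then is corrupted again

assert disk_data != good                    # each pass drifts further away
```

Whether real ZFS buffers actually land on the same physical pages like this
is exactly the part of the FreeNAS claim that's in question.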
> So the point is that, apparently, without ECC RAM, you could get a
> (long-term) cascade of errors, especially during a scrub. The likelihood
> of such permanent RAM corruption happening in the first place is another
> question entirely.
>
> The various posts in [0] then basically say that regardless of whether
> this really is true of ZFS, it certainly doesn't apply to BTRFS, for
> various reasons. I suppose this quote from [1] (see above) says it most
> clearly:
>
> > In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449,
> > they talk about reconstructing corrupted data from parity information:
> >
> > > Ok, no problem. ZFS will check against its parity. Oops, the parity
> > > failed since we have a new corrupted bit. Remember, the checksum data
> > > was calculated after the corruption from the first memory error
> > > occurred. So now the parity data is used to "repair" the bad data. So
> > > the data is "fixed" in RAM.
> >
> > i.e. that there is parity information stored with every piece of data,
> > and ZFS will "correct" errors automatically from the parity
> > information. I start to suspect that there is confusion here between
> > checksumming for data integrity and parity information. If this is
> > really how ZFS works, then if memory corruption interferes with this
> > process, then I can see how a scrub could be devastating. I don't know
> > if ZFS really works like this. It sounds very odd to do this without an
> > additional checksum check. This sounds very different to what you say
> > below that btrfs does, which is only to check against
> > redundantly-stored copies, which I agree sounds much safer.
>
> The rest is also relevant, but I think the point that the data is
> corrected via parity information, as opposed to using a known-good
> redundant copy of the data (which I originally missed, and thus got
> confused), is the key point in understanding the (supposed) difference
> in behaviour between ZFS and BTRFS.
>
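As I understand the distinction the thread draws, it can be sketched like
this (simplified Python of my own; the function names are hypothetical and
neither filesystem literally works this way):

```python
# Contrast of the two claimed repair strategies from the thread.
import zlib

def checksum_ok(data: bytes, csum: int) -> bool:
    return zlib.crc32(data) == csum

# Claimed ZFS-style repair: rebuild the block from parity and trust the
# result, with no independent verification of the reconstruction.
def repair_from_parity(reconstruct):
    return reconstruct()

# btrfs-style repair, as described in the thread: only accept a redundant
# copy that passes *its own* checksum; otherwise report the error.
def repair_from_copies(copies):
    for data, csum in copies:
        if checksum_ok(data, csum):
            return data          # known-good copy
    return None                  # nothing valid -> surface the error

good = b"known-good copy"
bad  = b"cnrrupted copy!"        # wrong data paired with a stale checksum
copies = [(bad, zlib.crc32(good)), (good, zlib.crc32(good))]

# The copy-based repair skips the corrupt copy (its checksum fails)...
assert repair_from_copies(copies) == good
# ...while the parity-based repair returns whatever reconstruction yields,
# even if the reconstruction itself went through bad RAM.
assert repair_from_parity(lambda: bad) == bad
```

The verification step before accepting a copy is what makes the btrfs
approach, as described, sound safer against bad RAM.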
> All this assumes, of course, that the FreeNAS forum post that ignited
> this discussion is correct in the first place.
>
> > Thanks Mark! Interesting discussion on btrfs.
> >
> > Bob
>
> You're welcome! I agree, it's an interesting discussion. And regarding
> the misspelling of my name: no problem :-) .
>
> --
> Marc Joliet
> --
> "People who think they know everything really annoy those of us who know
> we don't" - Bjarne Stroustrup