Gentoo Archives: gentoo-user

From: Pandu Poluan <pandu@××××××.info>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] OT: Fighting bit rot
Date: Wed, 09 Jan 2013 02:57:57
Message-Id: CAA2qdGUrCAB3cXWDhfbJC5t0OTjuaN6D-t6X0eC4rQj-1PrQWA@mail.gmail.com
In Reply to: Re: [gentoo-user] OT: Fighting bit rot by Florian Philipp
On Jan 9, 2013 2:06 AM, "Florian Philipp" <lists@×××××××××××.net> wrote:
>
> On 08.01.2013 18:41, Pandu Poluan wrote:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@×××××××××××.net> wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends are
> >> sufficient. But if you have content that changes from time to time, the
> >> number of false-positives would be too high. In this case, I think you
> >> could easily distinguish by comparing both file content and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic data
> > loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a
> well-researched topic [1-3]. CRCs are not perfectly reliable. The trick is
> to detect and correct errors while you still have valid backups or other
> types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing, which is usually just
> the same as what I proposed (plus whatever on-access integrity checks the
> platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are always
> > restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
> backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: some data that I have kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
> http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A Fresh Look at the Reliability of Long-Term Digital
> Storage
> http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
> Drives
> http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4]
> http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>

Interesting reads... thanks for the links!

Hmm... if I were in your position, I think this is what I'd do (rough sketch
below):

1. Make a set of MD5 checksums, one per file, for ease of update.
2. Compare the stored checksum with the actual file before opening it; on a
mismatch, notify.
3. When the file handle is closed, recalculate and store the checksum.
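
A rough, untested sketch of steps 1-3 in Python (the one-.md5-sidecar-per-file
naming and the script name in the comment are just my own assumptions, so
adjust to taste):

import hashlib
import sys

def md5_of(path, bufsize=1 << 20):
    # step 1: hash the file in chunks so large files don't eat RAM
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(bufsize), b''):
            h.update(chunk)
    return h.hexdigest()

def check(path):
    # step 2: compare against the stored per-file checksum before use
    try:
        with open(path + '.md5') as f:
            expected = f.read().strip()
    except OSError:
        expected = None  # nothing recorded yet, so nothing to compare
    actual = md5_of(path)
    if expected is not None and expected != actual:
        print('WARNING: checksum mismatch on %s' % path, file=sys.stderr)
    return actual

def refresh(path):
    # steps 1 and 3: (re)write the sidecar after the file has been written
    with open(path + '.md5', 'w') as f:
        f.write(md5_of(path) + '\n')

if __name__ == '__main__':
    # e.g.  python bitrot_check.py /archive/photos/*.jpg
    for p in sys.argv[1:]:
        check(p)

Wiring check()/refresh() into actual open/close events would of course depend
on the application (or on an inotify watcher), so treat this as the
verification core only.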

Protect the set of MD5 checksums periodically using par2.

Also protect your backups using par2, for that matter (that's what I always
do when I archive something to optical media).
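
For the par2 part, something along these lines should work (from memory, so
double-check the exact options; -r sets the redundancy percentage, and the
file names are only examples):

  # create ~10% recovery data covering the whole set of .md5 files
  par2 create -r10 checksums.par2 *.md5

  # later, before trusting the checksums again: verify, and repair if needed
  par2 verify checksums.par2
  par2 repair checksums.par2

The same create/verify/repair cycle works on the backup archives before they
go to optical media.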

Of course, you could use par2 outright to protect and error-correct your data
itself, but the time needed to regenerate the .par2 files *every time*
something changes would be too much, methinks...

Rgds,
--