On Jan 9, 2013 2:06 AM, "Florian Philipp" <lists@×××××××××××.net> wrote:
>
> On 08.01.2013 18:41, Pandu Poluan wrote:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@×××××××××××.net
> > <mailto:lists@×××××××××××.net>> wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends are
> >> sufficient. But if you have content that changes from time to time, the
> >> number of false positives would be too high. In this case, I think you
> >> could easily distinguish by comparing both file content and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic data
> > loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a
> well-researched topic [1-3]. CRCs are not perfectly reliable. The trick is
> to detect and correct errors while you still have valid backups or other
> types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing, which is usually just
> the same as what I proposed (plus whatever on-access integrity checks
> the platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are always
> > restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
> backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: some data that I had kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
> http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A fresh look at the reliability of long-term digital
> storage
> http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
> Drives
> http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4]
> http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>

Interesting reads... thanks for the links!

Hmm... if I were in your position, I think this is what I'd do:

1. Make a set of MD5 checksums, one per file, for ease of updating.
2. Compare a file's checksum with its actual content before opening it; if
there's a mismatch, notify.
3. When the file handle is closed, recalculate the checksum.
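The checksum-set part of the steps above can be sketched with standard
coreutils (a minimal sketch; the MD5SUMS manifest name and /srv/archive path
are just example choices):

```shell
# Assumed layout: one manifest file, MD5SUMS, at the root of the tree
# being protected (name and location are arbitrary).
cd /srv/archive

# 1. (Re)create the per-file checksum set, skipping the manifest itself.
find . -type f ! -name MD5SUMS -print0 | xargs -0 md5sum > MD5SUMS

# 2. Verify before use: with --quiet, md5sum prints only the files whose
#    content no longer matches, and exits non-zero on any mismatch.
md5sum -c --quiet MD5SUMS
```

Since `md5sum -c` exits non-zero on a mismatch, the verify step slots easily
into a cron job or a wrapper script that notifies you.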

Protect the set of MD5 checksums periodically using par2.

Also protect your backups with par2, for that matter (that's what I always
do when I archive something to optical media).

Of course, you could use par2 outright to protect and error-correct your
data, but the time needed to regenerate the .par2 files *every time* would
be too much, methinks...

Rgds,
--