Gentoo Archives: gentoo-user

From: Pandu Poluan <pandu@××××××.info>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] OT: Fighting bit rot
Date: Wed, 09 Jan 2013 02:57:57
Message-Id: CAA2qdGUrCAB3cXWDhfbJC5t0OTjuaN6D-t6X0eC4rQj-1PrQWA@mail.gmail.com
In Reply to: Re: [gentoo-user] OT: Fighting bit rot by Florian Philipp
On Jan 9, 2013 2:06 AM, "Florian Philipp" <lists@×××××××××××.net> wrote:
>
> On 08.01.2013 18:41, Pandu Poluan wrote:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@×××××××××××.net> wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends are
> >> sufficient. But if you have content that changes from time to time, the
> >> number of false-positives would be too high. In this case, I think you
> >> could easily distinguish by comparing both file content and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic data
> > loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a
> well-researched topic [1-3]. CRCs are not perfectly reliable. The trick is
> to detect and correct errors while you still have valid backups or other
> types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing, which is usually just
> the same as what I proposed (plus whatever on-access integrity checks the
> platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are always
> > restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
> backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: some data that I have kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
> http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A Fresh Look at the Reliability of Long-Term Digital
> Storage
> http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
> Drives
> http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4]
> http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>

Interesting reads... thanks for the links!

Hmm... if I were in your position, I think this is what I'd do (rough sketch
below):

1. Make a set of MD5 checksums, one per file, for ease of update.
2. Compare the stored checksum with the actual file before opening it; on a
mismatch, notify.
3. When the file handle is closed, recalculate and store the checksum.
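
A rough, untested sketch of steps 1-3 in Python (the one-.md5-sidecar-per-file
naming and the script name in the comment are just my own assumptions, so
adjust to taste):

import hashlib
import sys

def md5_of(path, bufsize=1 << 20):
    # step 1: hash the file in chunks so large files don't eat RAM
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(bufsize), b''):
            h.update(chunk)
    return h.hexdigest()

def check(path):
    # step 2: compare against the stored per-file checksum before use
    try:
        with open(path + '.md5') as f:
            expected = f.read().strip()
    except OSError:
        expected = None  # nothing recorded yet, so nothing to compare
    actual = md5_of(path)
    if expected is not None and expected != actual:
        print('WARNING: checksum mismatch on %s' % path, file=sys.stderr)
    return actual

def refresh(path):
    # steps 1 and 3: (re)write the sidecar after the file has been written
    with open(path + '.md5', 'w') as f:
        f.write(md5_of(path) + '\n')

if __name__ == '__main__':
    # e.g.  python bitrot_check.py /archive/photos/*.jpg
    for p in sys.argv[1:]:
        check(p)

Wiring check()/refresh() into actual open/close events would of course depend
on the application (or on an inotify watcher), so treat this as the
verification core only.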

Protect the set of MD5 checksums periodically using par2.

Also protect your backups using par2, for that matter (that's what I always
do when I archive something to optical media).
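
For the par2 part, something along these lines should work (from memory, so
double-check the exact options; -r sets the redundancy percentage, and the
file names are only examples):

  # create ~10% recovery data covering the whole set of .md5 files
  par2 create -r10 checksums.par2 *.md5

  # later, before trusting the checksums again: verify, and repair if needed
  par2 verify checksums.par2
  par2 repair checksums.par2

The same create/verify/repair cycle works on the backup archives before they
go to optical media.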

Of course, you could use par2 outright to protect and error-correct your data
itself, but the time needed to regenerate the .par2 files *every time*
something changes would be too much, methinks...

Rgds,
--