Gentoo Archives: gentoo-user

From: Etaoin Shrdlu <shrdlu@×××××××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] md5sum for directories?
Date: Wed, 27 Feb 2008 09:28:34
Message-Id: 200802271040.20723.shrdlu@unlimitedmail.org
In Reply to: Re: [gentoo-user] md5sum for directories? by Stroller
On Wednesday 27 February 2008, Stroller wrote:

> > Of course, this does not detect a successful, but somehow corrupted,
> > copy (which should be exceptionally rare, anyway).
>
> Well, perhaps I'm just being paranoid today.
> But how do I know that a successful, but somehow corrupted, copy has
> not occurred?
>
> What makes you confident that these are rare? I don't ask this to be
> antagonistic, just to increase my own confidence in the `cp` command.

Ah well, I have no statistics here. But I can say that such a thing has
never occurred to me in the past (or at least, if it occurred, I did not
notice it). Not definitive proof, I know; rather, just my
experience. You are of course free to not trust me and, if you're truly
paranoid, you probably should do so :-)

> I have to admit that I haven't run this command and I don't have any
> idea what its actual resource usage would be. I guess I'd be happy
> with a lower-grade of checksumming, if it would reduce the runtime to
> acceptable levels. With md5sum one can be - barring certain malicious
> external attacks - quite certain that a copied file is identical to
> the original. I would be happy with a "the file's there and it looks
> ok" level of confidence.

Well, md5deep has already been suggested. If you are content with
lower-grade checksumming, you could write your own script that compares
file lengths and calculates checksums only on the first n and last m
bytes of each file, for some reasonable values of n and m (bigger is
better, as you might guess). This is what BackupPC (an excellent piece
of backup software) does when it has to decide whether a file has
changed (and thus has to be backed up) compared with the copy stored in
the backup pool.
Read this for more info:

http://backuppc.sourceforge.net/faq/BackupPC.html#some_design_issues

See the "The hashing function" paragraph. Do note that this method is,
of course, not 100% accurate: it can miss corruption (a false negative)
if the damage is in the middle of the file and the file length did not
change.
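A minimal sketch of the cheap check described above, assuming Python 3 and only its standard library. `partial_fingerprint`, `full_fingerprint` and `compare_trees` are hypothetical helpers written for this illustration, and the 128 KiB head/tail sizes are an arbitrary choice for n and m, not necessarily what BackupPC uses. A full-file MD5 is included for comparison. Only files present in the source tree are checked; extra files in the copy are ignored.

```python
import hashlib
import os


def partial_fingerprint(path, head=128 * 1024, tail=128 * 1024):
    """MD5 over the file length plus its first `head` and last `tail` bytes.

    Cheap, but incomplete: a change confined to the middle of a large file
    that leaves the length unchanged is NOT detected.
    """
    size = os.path.getsize(path)
    h = hashlib.md5()
    # Mix in the length so a truncated or padded copy changes the digest.
    h.update(str(size).encode("ascii"))
    with open(path, "rb") as f:
        if size <= head + tail:
            # Small file: head and tail would overlap, so hash all of it.
            h.update(f.read())
        else:
            h.update(f.read(head))
            f.seek(size - tail)
            h.update(f.read(tail))
    return h.hexdigest()


def full_fingerprint(path, chunk=1 << 20):
    """MD5 of the entire file read in 1 MiB chunks (the digest md5sum prints)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def compare_trees(src, dst, fingerprint=partial_fingerprint):
    """List files under `src` that are missing from `dst` or fingerprint differently.

    Paths are returned relative to `src`, sorted. Extra files that exist
    only in `dst` are not reported.
    """
    problems = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path):
                problems.append(rel)
            elif fingerprint(src_path) != fingerprint(dst_path):
                problems.append(rel)
    return sorted(problems)
```

For example, `compare_trees("/path/to/original", "/path/to/copy")` (hypothetical paths) returns an empty list when every file passes the cheap check, while passing `fingerprint=full_fingerprint` trades speed for the md5sum-level guarantee. A single flipped byte in the middle of a large file slips past `partial_fingerprint` but not `full_fingerprint`, which is exactly the limitation noted above.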
--
gentoo-user@l.g.o mailing list