On Wednesday 27 February 2008, Stroller wrote:

> > Of course, this does not detect a successful, but somehow corrupted,
> > copy (which should be exceptionally rare, anyway).
>
> Well perhaps I'm just being paranoid today.
> But how do I know that a successful, but somehow corrupted, copy has
> not occurred?
>
> What makes you confident that these are rare? I don't ask this to be
> antagonistic, just to increase my own confidence in the `cp` command.

Ah well, I have no statistics here. But I can say that such a thing has
never happened to me in the past (or at least, if it did, I did not
notice it). Not a definitive proof, I know; rather, just my
experience. You are of course free to not trust me and, if you're truly
paranoid, you probably should do so :-)

> I have to admit that I haven't run this command and I don't have any
> idea what its actual resource usage would be. I guess I'd be happy
> with a lower-grade of checksumming, if it would reduce the runtime to
> acceptable levels. With md5sum one can be - barring certain malicious
> external attacks - quite certain that a copied file is identical to
> the original. I would be happy with a "the file's there and it looks
> ok" level of confidence.

Well, md5deep has already been suggested. If you are content with
lower-grade checksumming, you could write your own script that compares
file lengths and calculates checksums only on the first n and last m
bytes of each file, for some reasonable values of n and m (bigger is
better, as you might guess). This is what BackupPC (an excellent backup
program) does when it has to decide whether a file has changed (and
thus has to be backed up) compared with the copy stored in the backup
pool.
Read this for more info:

http://backuppc.sourceforge.net/faq/BackupPC.html#some_design_issues

See the paragraph "The hashing function". Do note that (of course) this
method is not 100% accurate and might report false negatives, i.e. miss
the corruption, if it is in the middle of the file and the file length
did not change.
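
For what it's worth, the "length plus first n and last m bytes" idea
could be sketched roughly like this in Python. This is only my own
illustration, not BackupPC's actual code: the function names
quick_signature and looks_identical and the 64 KiB defaults are
assumptions I made up for the example.

```python
import hashlib
import os


def quick_signature(path, n=65536, m=65536):
    """Return (size, MD5 hex digest of the first n and last m bytes).

    Hypothetical helper: the name and the default sizes are
    illustrative assumptions, not taken from any existing tool.
    """
    size = os.path.getsize(path)
    digest = hashlib.md5()
    with open(path, "rb") as f:
        digest.update(f.read(n))
        if size > n:
            # Skip the middle of the file, but never go back over
            # bytes that were already hashed as part of the head.
            f.seek(max(n, size - m))
            digest.update(f.read(m))
    return size, digest.hexdigest()


def looks_identical(src, dst, n=65536, m=65536):
    """Cheap check: same length and same head/tail checksum."""
    return quick_signature(src, n, m) == quick_signature(dst, n, m)
```

You would call looks_identical(original, copy) for each pair of files
after the copy. Files no larger than n + m bytes are hashed in full, so
the blind spot only exists for bigger files: a changed byte in the
skipped middle goes unnoticed as long as the length is the same.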
--
gentoo-user@l.g.o mailing list