On 22 Jun 2003, bdharring <bdharring@××××.edu> wrote:

> >The uncompressed form is the natural and efficient place to do delta
> >compression.

> Agreed, although I would posit that decompressing a large bzip2 for
> md5summing in memory makes it a substantially longer affair than if
> you just md5'd the compressed tarball. On my personal system,
> md5 of the compressed tarball takes 3-5s; bzip2 decompression piped
> to md5 takes 1-2 minutes.
> More below...

Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if
e.g. they got an interrupted or corrupt download.

But aside from that, including the time to decompress as a cost of
checking the MD5 sum is a furphy. It has to be decompressed at some
point anyway, whether to patch it or to build it. You can check the
MD5 sum then.

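As a rough illustration of the two checks discussed above, here is a
minimal Python sketch that hashes a bzip2 blob once as downloaded and
once while it is being decompressed, so the second hash comes for free
with a decompression that has to happen anyway. The helper names
(md5_of_stream, md5_compressed_and_plain), the chunk size, and the
sample payload are all made up for the example; nothing here is taken
from portage or xdelta.

```python
import bz2
import hashlib
import io


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 of everything readable from a binary stream."""
    digest = hashlib.md5()
    # Read in chunks so a large tarball never has to sit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_compressed_and_plain(compressed_bytes):
    """Hash a bzip2 blob as downloaded, then again while decompressing it."""
    as_downloaded = md5_of_stream(io.BytesIO(compressed_bytes))
    with bz2.open(io.BytesIO(compressed_bytes), "rb") as plain:
        uncompressed = md5_of_stream(plain)
    return as_downloaded, uncompressed


# Hypothetical stand-in for a source tarball.
payload = b"example tarball contents\n" * 1000
blob = bz2.compress(payload)
download_sum, content_sum = md5_compressed_and_plain(blob)
```

The first sum answers "did the download arrive intact?"; the second
identifies the content regardless of how it was compressed.
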
Note that xdelta patches in fact include the MD5 checksum of the
output file, so checking it is a bit redundant.

> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffed, at least in principle.
> Summing it up, if we can pull it apart and get the uncompressed data,
> we md5 that data. If we can't, well I've yet to see any diff prog
> (aside from xdelta's lackluster gzip support) that even does
> decompression of data, so it's a non-issue for the moment...

Yes, if we can decompress it then we do. Otherwise we just do the
xdelta across the whole file. In either case, if the delta is
ridiculously large, then we discard it.

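The policy above could be sketched roughly like this in Python. The
function names and the 0.5 size threshold are assumptions made up for
the example (the thread never says what counts as "ridiculously
large"), and only gzip and bzip2 are tried here; the delta itself would
be produced by whatever tool is in use.

```python
import bz2
import gzip
import zlib


def delta_input(raw):
    """Return (data, was_decompressed).

    Try the containers we know how to open; if none of them accepts the
    bytes, fall back to diffing the file exactly as it is.
    """
    for decompress in (gzip.decompress, bz2.decompress):
        try:
            return decompress(raw), True
        except (OSError, ValueError, EOFError, zlib.error):
            continue
    return raw, False


def worth_keeping(delta_size, full_size, max_ratio=0.5):
    """Keep a delta only if it is clearly smaller than the full file.

    max_ratio is an arbitrary illustrative cut-off, not a recommendation.
    """
    return delta_size <= full_size * max_ratio
```

For example, delta_input(gzip.compress(b"hello")) yields the plain
bytes and True, while arbitrary uncompressed bytes come back unchanged
with False.
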
> I'd agree. My understanding of why the deltup format exists, from
> what I've gathered trolling the forums, is that jjw is attempting to
> build his own differencing/encoding setup, which is a fair amount of
> work, speaking from experience.

I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them. I understand that xdelta is moving towards this and
librsync will too eventually.

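To make the "regardless of the algorithm" point concrete: VCDIFF
describes a target as a sequence of ADD, COPY and RUN instructions, so
whoever applies the delta only needs to understand those, not how they
were found. The Python below is a toy sketch of that idea. The tuple
representation and the function name are invented for the example and
are not the binary encoding the format actually specifies.

```python
def apply_instructions(source, instructions):
    """Rebuild a target from ADD / COPY / RUN style instructions.

    Toy representation: ("ADD", data), ("RUN", byte, count),
    ("COPY", addr, size). This is not the real VCDIFF wire format.
    """
    target = bytearray()
    for op, *args in instructions:
        if op == "ADD":
            (data,) = args
            target += data  # literal bytes carried in the delta
        elif op == "RUN":
            byte, count = args
            target += byte * count  # one byte repeated count times
        elif op == "COPY":
            addr, size = args
            # Addresses index the source followed by the target written
            # so far, so a copy may overlap the bytes it is producing.
            for i in range(addr, addr + size):
                if i < len(source):
                    target.append(source[i])
                else:
                    target.append(target[i - len(source)])
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return bytes(target)


rebuilt = apply_instructions(
    b"hello world",
    [("COPY", 0, 6), ("ADD", b"delta"), ("RUN", b"!", 3)],
)
```

Any encoder, whether xdelta-like or rsync-like, could emit such a list,
and the same small applier would reconstruct the file.
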
> A side note for doing gentoo delta patching is that (imo) it ought
> to in some form provide for standard diffs, since any version
> patches that are distributed currently are typically diffs (look at
> the kernel for instance).

That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas. xdeltas are often 5-10x smaller
than a compressed diff, because they don't include redundant context.

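The cost of context lines is easy to see with Python's difflib. This
small sketch compares a unified diff with the usual three lines of
context against one with none, on made-up input. It only illustrates
the context overhead within the diff format; it does not measure
xdelta, and it says nothing about the 5-10x figure.

```python
import difflib

# Hypothetical 200-line file with two edited lines.
old = [f"line {i}\n" for i in range(200)]
new = list(old)
new[50] = "line fifty, edited\n"
new[150] = "line one-fifty, edited\n"


def diff_size(context_lines):
    """Size in bytes of a unified diff of old -> new with n context lines."""
    patch = difflib.unified_diff(old, new, "old", "new", n=context_lines)
    return len("".join(patch).encode())


with_context = diff_size(3)     # the default for unified diffs
without_context = diff_size(0)  # changed lines and hunk headers only
```

Every context line is data the recipient already has, which is why it
helps a human or a fuzzy merge but is dead weight in a delta.
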
Diffs are great for humans or for fuzzy merges. As a
delta-compression mechanism they're pretty lame.

--
Martin