Gentoo Archives: gentoo-dev

From: Martin Pool <mbp@×××××.org>
To: gentoo-dev@g.o
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Date: Mon, 23 Jun 2003 04:44:03
Message-Id: 20030623044300.GA20153@vexed.ozlabs.hp.com
In Reply to: Re: [gentoo-dev] Re: proposed md5sum change by bdharring
On 22 Jun 2003, bdharring <bdharring@××××.edu> wrote:

> >The uncompressed form is the natural and efficient place to do delta
> >compression.

> Agreed, although I would posit that decompressing a large bzip2 for
> md5suming in memory makes it a substantially longer affair then if
> you just md5'd the compressed tarball. On my personal system,
> compressed=>3-5s, bzip2 decompressing piped to md5 = 1-2 minutes.
> More below...

Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if
e.g. they got an interrupted or corrupt download.

But aside from that, including the time to decompress as a cost of
checking the MD5 sum is a furphy. It has to be decompressed at some
point whether to patch it or to build it. You can check the MD5sum
then.
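
To make the point concrete, here is a rough sketch of hashing the
uncompressed bytes in the same pass that decompresses them, so the
checksum adds no second decompression. It assumes a single-stream bzip2
file; the function name and chunk size are made up for illustration and
are not taken from any existing tool.

```python
import bz2
import hashlib
import io


def md5_while_decompressing(compressed_stream, sink, chunk_size=64 * 1024):
    """Decompress one bzip2 stream into `sink`, hashing the plain bytes in passing.

    Hypothetical helper: handles a single bzip2 stream only and ignores
    anything that follows the end-of-stream marker.
    """
    decompressor = bz2.BZ2Decompressor()
    digest = hashlib.md5()
    while not decompressor.eof:
        chunk = compressed_stream.read(chunk_size)
        if not chunk:
            break  # input ran out (possibly a truncated download)
        plain = decompressor.decompress(chunk)
        digest.update(plain)  # hash the uncompressed bytes
        sink.write(plain)     # and hand them on to the patch or build step
    return digest.hexdigest()


# Usage with in-memory data standing in for a downloaded tarball.
payload = b"example tarball contents\n" * 1000
source = io.BytesIO(bz2.compress(payload))
sink = io.BytesIO()
checksum = md5_while_decompressing(source, sink)
```

The decompression work is the same as for a plain unpack; the only
extra cost is the MD5 update on each chunk.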

Note that xdelta patches in fact include the MD5 checksum of the
output file, so checking it is a bit redundant.

> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffd, at least in principle.
> Summing it up, if we can pull it apart and get the uncompressed data,
> we md5 that data. If we can't, well I've yet to see any diff prog
> (aside from xdelta's lackluster gzip support) that even does
> decompression of data, so it's a non-issue for the moment...

Yes, if we can decompress it then we do. Otherwise we just do the
xdelta across the whole file. In either case, if the delta is
ridiculously large, then we discard it.
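
The policy above could be sketched roughly as follows. This is only an
illustration: it recognises gzip and bzip2 by their magic bytes, falls
back to the raw file for anything else, and leaves the delta generation
itself out. The names `delta_input` and `keep_delta` and the 0.5 cutoff
are assumptions; nothing in this thread fixes a number for
"ridiculously large".

```python
import bz2
import gzip
import zlib

# Errors the standard-library decompressors may raise on bad or truncated input.
_DECODE_ERRORS = (OSError, EOFError, ValueError, zlib.error)


def delta_input(raw: bytes) -> bytes:
    """Return the uncompressed payload if we can get at it, else the raw bytes."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        try:
            return gzip.decompress(raw)
        except _DECODE_ERRORS:
            return raw
    if raw[:3] == b"BZh":  # bzip2 magic number
        try:
            return bz2.decompress(raw)
        except _DECODE_ERRORS:
            return raw
    return raw  # unknown container: delta across the whole file


def keep_delta(delta_size: int, download_size: int, max_ratio: float = 0.5) -> bool:
    """Discard a delta that saves too little over a full download.

    max_ratio is a hypothetical threshold chosen for this sketch.
    """
    return delta_size <= download_size * max_ratio
```

Both the old and the new file would go through `delta_input` before
being handed to whatever delta generator is in use.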

> I'd agree. My understanding for why the deltup format, from what I've
> gathered trolling the forums, jjw's attempting to build his own
> differencing/encoding setup which is a fair amount of work speaking
> from experience.

I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them. I understand that xdelta is moving towards this and
librsync will too eventually.

> A side note for doing gentoo delta patching is that (imo) it ought
> to in some form provide for standard diff's since any version
> patches that are distributed currently are typically diff (look at
> the kernel for instance).

That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas. xdeltas are often 5-10x smaller
than a compressed diff, because they don't include redundant context.
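
A quick way to see how much of a diff is context is to generate the
same unified diff with and without context lines. The sketch below uses
Python's difflib on made-up input; it is not xdelta and does not
reproduce the 5-10x figure, it only shows the overhead that context
lines add.

```python
import difflib


def diff_size(old_lines, new_lines, context):
    """Total characters in a unified diff generated with `context` lines of context."""
    diff = difflib.unified_diff(old_lines, new_lines, "old", "new", n=context)
    return sum(len(line) for line in diff)


# Synthetic input: 200 lines, with one line edited in every 20.
old = [f"line {i}: some unchanged source text\n" for i in range(200)]
new = list(old)
for i in range(10, 200, 20):
    new[i] = f"line {i}: edited\n"

with_context = diff_size(old, new, 3)     # the usual three lines of context
without_context = diff_size(old, new, 0)  # changed lines and hunk headers only
```

Even the zero-context diff still carries the full text of every removed
line, which a binary delta can replace with a copy instruction against
the old file.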

diffs are great for humans or for fuzzy merges. As a
delta-compression mechanism they're pretty lame.

--
Martin