Gentoo Archives: gentoo-dev

From: bdharring <bdharring@××××.edu>
To: gentoo-dev@g.o, mbp@×××××.org
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Date: Mon, 23 Jun 2003 04:00:26
Message-Id: 3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu
In Reply to: [gentoo-dev] Re: proposed md5sum change by Martin Pool
1 Responses/cheer-leading littered liberally below...
2
3 On Sunday, June 22, 2003, at 09:41 PM, Martin Pool wrote:
4
5 > On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
6 >
7 >> Hola all,
8 >> Straight to the point, I propose instead of md5summing the compressed
9 >> distfile, we md5sum the actual data, the tarball.
10 >
11 > Speaking as somebody who has worked on rsync and librsync: I agree, I
12 > think that would be a big improvement.
13 Heh, small world. I'd actually read the original complaint about it
14 in Tridgell's thesis while researching delta compression for my
15 own little prog...
16
17 > The uncompressed form is the natural and efficient place to do delta
18 > compression.
19 Agreed, although I would posit that decompressing a large bzip2 file
20 in memory for md5summing makes it a substantially longer affair than if
21 you just md5'd the compressed tarball. On my personal system, md5ing the
22 compressed file takes 3-5s; bzip2 decompression piped to md5 takes 1-2
23 minutes. More below...
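
For illustration, "md5 the actual data" could be sketched roughly as below. This is only a sketch under assumptions, not anything portage does: the function name md5_of_payload and the extension-based format detection are made up for the example. It streams the decompressed data through md5 in chunks, so the whole tarball never has to sit in memory, though the decompression cost mentioned above still applies.

```python
import bz2
import gzip
import hashlib


def md5_of_payload(path, chunk_size=1 << 20):
    """Return the hex MD5 of the *decompressed* contents of a distfile.

    Hypothetical helper: the format is guessed from the file extension,
    and anything unrecognised is hashed as-is.
    """
    if path.endswith(".bz2"):
        opener = bz2.open
    elif path.endswith((".gz", ".tgz")):
        opener = gzip.open
    else:
        opener = open  # not a recognised compressed format

    digest = hashlib.md5()
    with opener(path, "rb") as stream:
        # Read fixed-size chunks until the stream returns b"" (end of data).
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The same tarball recompressed with a different tool or compression level gives the same result here, which is the property that makes the uncompressed sum useful for delta patching.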
24 > Seemant Kulleen wrote:
25 >
26 >> Now, the promised concern bit. Unfortunately, while the majority of
27 >> the
28 >> packages do come in a compressed tarball format, there are many
29 >> (enough to
30 >> make it a corner case of some concern) packages which do not. Off
31 >> the top
32 >> of my head, I can think of .Z (forget which package), .rpm
33 >> (redhat-artwork), .bin (realplayer). And in some cases, we just get
34 >> an
35 >> uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
36 >> though I'm not certain of it right this moment).
37 >
38 > .Z files can be uncompressed and handled as for gzip (I think gzip
39 > handles them in fact.)
40 >
41 > .zip, .rpm, or self-extracting .exe files can also be uncompressed and
42 > diffed, at least in principle.
43 Summing it up, if we can pull it apart and get the uncompressed data,
44 we md5 that data. If we can't, well I've yet to see any diff prog
45 (aside from xdelta's lackluster gzip support) that even does
46 decompression of data, so it's a non-issue for the moment...
47 >
48 > Experience on Debian has shown that compiled binaries in general do
49 > not delta-compress very well, so I think not being able to uncompress
50 > them is not a terrible thing.
51 Horribly badly, actually. The problem, of course, is that if you change
52 offset x, everything after x is different... 'tis the reason I was
53 looking at md5ing the data, since to get any decent delta compression
54 you have to decompress... but you likely know that so I'll shut up now.
55 >
56 > The point:
57 >
58 > Gentoo should distribute the md5sums for both the compressed and
59 > uncompressed forms of packages. They are checked in that order;
60 > either is sufficient.
61 That would solve the initial complaint I had mentioned about speed
62 above. I like it, and it's a general solution allowing the user more
63 control over how their distfiles are stored (aside from making delta
64 compression much easier to do).
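
A rough sketch of the "check both sums, in that order" idea, assuming bzip2 distfiles only. The names verify_distfile, md5_compressed and md5_uncompressed are hypothetical and not part of any existing digest format. The cheap sum of the file as stored is tried first; the expensive decompress-and-sum pass only runs if that one fails.

```python
import bz2
import hashlib


def _md5(stream, chunk_size=1 << 20):
    """Hex MD5 of everything readable from an open binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_distfile(path, md5_compressed, md5_uncompressed):
    """Return 'compressed', 'uncompressed', or None for no match.

    Either match is treated as sufficient, per the proposal above.
    """
    # Fast path: hash the file exactly as it sits on disk.
    with open(path, "rb") as raw:
        if _md5(raw) == md5_compressed:
            return "compressed"

    # Slow path: hash the decompressed payload instead.
    try:
        with bz2.open(path, "rb") as payload:
            if _md5(payload) == md5_uncompressed:
                return "uncompressed"
    except (OSError, EOFError):
        pass  # not valid (or truncated) bzip2 data

    return None
```

A freshly downloaded distfile should hit the fast path, so nothing gets slower for regular downloads. A file rebuilt from a patch and recompressed locally would fail the first check and pass the second.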
65 >
66 > Regular non-delta downloads will proceed as usual, and the md5sum can
67 > be checked immediately after download. There is no added cost.
68 >
69 > Patch downloads can be done by
70 >
71 > - download xdelta
72 > - uncompress old file, pipe it into 'xdelta patch', store the result
73 > - check result against uncompressed MD5sum
74 >
75 > As far as I can see this removes any need for a special deltup file
76 > format. Just simply send xdeltas.
77 I'd agree. My understanding of why the deltup format exists, from what
78 I've gathered trolling the forums, is that jjw's attempting to build his
79 own differencing/encoding setup, which is a fair amount of work, speaking
80 from experience. A side note for doing gentoo delta patching is that
81 (imo) it ought to in some form provide for standard diffs, since any
82 version patches that are distributed currently are typically diffs (look
83 at the kernel for instance).
84 Either way, back to adult swim...
85 ~brian
86
87
88 --
89 gentoo-dev@g.o mailing list

Replies

Subject Author
Re: [gentoo-dev] Re: proposed md5sum change Martin Pool <mbp@×××××.org>