Gentoo Archives: gentoo-dev

From: Martin Pool <mbp@×××××.org>
To: gentoo-dev@g.o
Subject: [gentoo-dev] Re: proposed md5sum change
Date: Mon, 23 Jun 2003 02:41:58
Message-Id: pan.2003.06.23.02.06.32.327923@sourcefrog.net
1 On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
2
3 > Hola all,
4 > Straight to the point, I propose instead of md5summing the compressed
5 > distfile, we md5sum the actual data, the tarball.
6
7 Speaking as somebody who has worked on rsync and librsync: I agree, I
8 think that would be a big improvement.
9
10 The uncompressed form is the natural and efficient place to do delta
11 compression.
12
13 This implies that the client, after applying a patch, ends up with an
14 uncompressed (e.g. .tar) file. Making the client recompress it is
15 wasteful, because compression is expensive and in any case it's just
16 going to be uncompressed and extracted.
17
18 Not only is it wasteful, but it's hard to do correctly. As other
19 people have noted, compression is not very reproducible.
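
One concrete reason, as a small sketch: the gzip header records a
timestamp, so compressing the same tarball at two different times gives
two different byte streams (and two different md5sums) even though the
payload is identical. The helper name gzip_bytes and the sample data
are made up for illustration.

```python
import gzip
import hashlib
import io


def gzip_bytes(data: bytes, mtime: int, level: int = 9) -> bytes:
    """Compress data with gzip, storing the given timestamp in the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


# Stand-in for the contents of an uncompressed tarball.
tarball = b"pretend this is the content of foo-1.0.tar\n" * 100

# Same input, compressed at two different (arbitrary) times.
a = gzip_bytes(tarball, mtime=1055347322)
b = gzip_bytes(tarball, mtime=1056336118)

# The compressed forms differ, so an md5sum of the .gz would not match.
print(a == b)  # False
# The uncompressed payload is byte-for-byte identical.
print(hashlib.md5(gzip.decompress(a)).hexdigest()
      == hashlib.md5(gzip.decompress(b)).hexdigest())  # True
```

Compression level and compressor version can change the output as well;
the timestamp is just the easiest case to demonstrate.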
20
21 This implies that the script which unpacks and builds the source needs
22 to be able to accept the unpacked form rather than the packed form as
23 at present. That doesn't sound terribly hard.
24
25 Some people might want to store packages in compressed form because
26 they're low on disk, and so might want to bzip them up again after
27 applying the patch. On the other hand, some people might want to
28 keep them uncompressed because their CPU is slow. On the third hand,
29 some people might want to *recompress* everything into bz2 even if it
30 was originally .gz. Any of these can be supported through some future
31 mechanism; they don't need to determine the download system.
32
33 Seemant Kulleen wrote:
34
35 > Now, the promised concern bit. Unfortunately, while the majority of the
36 > packages do come in a compressed tarball format, there are many (enough to
37 > make it a corner case of some concern) packages which do not. Off the top
38 > of my head, I can think of .Z (forget which package), .rpm
39 > (redhat-artwork), .bin (realplayer). And in some cases, we just get an
40 > uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
41 > though I'm not certain of it right this moment).
42
43 .Z files can be uncompressed and handled the same way as gzip files
44 (in fact, I think gzip handles them).
45
46 .zip, .rpm, or self-extracting .exe files can also be uncompressed and
47 diffed, at least in principle.
48
49 Uncompressed READMEs, patches or .c files are just too easy. :-)
50
51 If you don't recognize the format, you can try to do a delta on the
52 binary form. If the delta is too big, drop it.
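
As a rough sketch of "if the delta is too big, drop it": the function
name worth_shipping and the 50% cutoff below are assumptions picked for
illustration, not something the proposal specifies.

```python
def worth_shipping(delta_size: int, full_size: int, max_ratio: float = 0.5) -> bool:
    """Return True if the delta is small enough to offer instead of the full file.

    max_ratio is a hypothetical cutoff: a delta larger than this fraction
    of the full download is dropped.
    """
    if full_size <= 0:
        return False
    return delta_size <= max_ratio * full_size


# A 120 kB delta against a 4 MB file is clearly worth keeping.
print(worth_shipping(120_000, 4_000_000))    # True
# A 3.8 MB delta against the same file saves almost nothing.
print(worth_shipping(3_800_000, 4_000_000))  # False
```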
53
54 Experience on Debian has shown that compiled binaries in general do
55 not delta-compress very well, so I think not being able to uncompress
56 them is not a terrible thing.
57
58 The point:
59
60 Gentoo should distribute the md5sums for both the compressed and
61 uncompressed forms of packages. They are checked in that order;
62 either is sufficient.
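
A minimal sketch of that check, assuming the two digests are published
side by side. The names verify_distfile, open_uncompressed and
md5_of_stream are hypothetical, and only .gz and .bz2 are recognized
here. The file is hashed as downloaded first; only if that fails is it
decompressed in memory and hashed again.

```python
import bz2
import gzip
import hashlib


def md5_of_stream(fileobj, chunk_size=64 * 1024):
    """Hash a file-like object in chunks so large files never sit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def open_uncompressed(path):
    """Open path so that reads yield the uncompressed data."""
    if path.endswith((".gz", ".tgz")):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")  # already uncompressed, or an unknown format


def verify_distfile(path, compressed_md5, uncompressed_md5):
    """Check the compressed md5sum first, then the uncompressed one.

    Returns "compressed" or "uncompressed" to say which digest matched,
    or None if neither did. Either match is sufficient.
    """
    with open(path, "rb") as f:
        if md5_of_stream(f) == compressed_md5:
            return "compressed"
    with open_uncompressed(path) as f:
        if md5_of_stream(f) == uncompressed_md5:
            return "uncompressed"
    return None
```

A normal download matches on the first check and never pays for
decompression; a file that was recompressed locally falls through to
the second.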
63
64 Regular non-delta downloads will proceed as usual, and the md5sum can
65 be checked immediately after download. There is no added cost.
66
67 Patch downloads can be done as follows:
68
69 - download the xdelta
70 - uncompress the old file, pipe it into 'xdelta patch', and store the result
71 - check the result against the uncompressed md5sum
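
The steps above could be sketched roughly as below. Several things are
assumptions: the old distfile is taken to be a .gz, the command line
"xdelta patch DELTA OLD NEW" is my recollection of the xdelta 1.x
interface and should be checked against the installed version, and a
temporary file is used instead of a pipe. The function names are made
up for illustration.

```python
import gzip
import hashlib
import os
import shutil
import subprocess
import tempfile


def run_xdelta_patch(delta_path, old_path, new_path):
    # Assumption: xdelta 1.x style invocation; verify against your xdelta.
    subprocess.run(["xdelta", "patch", delta_path, old_path, new_path], check=True)


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the md5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def apply_delta(old_gz, delta_path, new_tar, expected_md5, patch=run_xdelta_patch):
    """Uncompress the old distfile, apply the delta, verify the result.

    expected_md5 is the published md5sum of the *uncompressed* new file.
    The patch argument is injectable so another delta tool can be swapped in.
    """
    with tempfile.TemporaryDirectory() as tmp:
        old_tar = os.path.join(tmp, "old.tar")
        # Step 2a: uncompress the old file.
        with gzip.open(old_gz, "rb") as src, open(old_tar, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Step 2b: apply the downloaded delta and store the result.
        patch(delta_path, old_tar, new_tar)
    # Step 3: check the result against the uncompressed md5sum.
    if md5_of_file(new_tar) != expected_md5:
        os.remove(new_tar)
        raise ValueError("md5sum mismatch after patching")
    return new_tar
```

Note that the result is left uncompressed; nothing here recompresses it.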
72
73 As far as I can see this removes any need for a special deltup file
74 format. Simply send xdeltas.
75
76 A great advantage is that xdeltas are useful to people other than
77 Gentoo, so people upstream or mirrors may be more willing to
78 distribute them alongside the original source.
79
80 Much as I love the idea of deltup, I think the current code is a bit
81 messy and making up a new format is unjustified.
82
83 > In terms of performance of the md5summing, it would still likely be i/o
84 > limited- decompression would be done in memory after all.
85
86 The approach above is much *more* efficient than deltup, which makes
87 an extra roundtrip to bz2 format.
88
89 What have I missed?
90
91 --
92 Martin
93
94 If you don't know how to code, then you don't know how to design the
95 software either. Period. You can only cause trouble.
96 -- Havoc Pennington, http://ometer.com/hacking.html
97
98 --
99 gentoo-dev@g.o mailing list
