On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:

> Hola all,
> Straight to the point, I propose instead of md5summing the compressed
> distfile, we md5sum the actual data, the tarball.

Speaking as somebody who has worked on rsync and librsync: I agree, I
think that would be a big improvement.

The uncompressed form is the natural and efficient place to do delta
compression.

This implies that the client, after applying a patch, ends up with an
uncompressed (e.g. .tar) file. Making the client recompress it is
wasteful, because compression is expensive and in any case it's just
going to be uncompressed and extracted.

Not only is it wasteful, but it's hard to do correctly. As other
people have noted, compression is not very reproducible: the same
input can yield different compressed bytes depending on the tool, its
version and its settings.
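
As a small illustration of the reproducibility problem, here is a Python
sketch (an example only, not anything Portage does). It compresses the
same bytes twice with gzip, changing only the timestamp stored in the
gzip header. The payload and the helper names are made up for the demo.

```python
import gzip
import hashlib
import io


def gzip_bytes(data: bytes, mtime: int) -> bytes:
    """Compress data into a gzip stream whose header records the given mtime."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for a tarball; any byte string works for the demonstration.
payload = b"pretend this is the content of foo-1.0.tar\n" * 100

# Same payload, compressed twice; only the header timestamp differs.
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=1)

# The compressed files hash differently ...
compressed_sums_match = md5_hex(first) == md5_hex(second)
# ... but the data inside them hashes identically.
payload_sums_match = (
    md5_hex(gzip.decompress(first)) == md5_hex(gzip.decompress(second))
)
```

Here compressed_sums_match is False while payload_sums_match is True,
which is why a checksum of the uncompressed data survives a
recompression and a checksum of the compressed file does not.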

This implies that the script which unpacks and builds the source needs
to be able to accept the unpacked form rather than the packed form as
at present. That doesn't sound terribly hard.

Some people might want to store packages in compressed form because
they're low on disk space, and so might want to bzip them up again
after applying the patch. On the other hand, some people might want to
keep them uncompressed because their CPU is slow. On the third hand,
some people might want to *recompress* everything into bz2 even if it
was originally .gz. Any of these can be supported through some future
mechanism; they don't need to determine the download system.

Seemant Kulleen wrote:

> Now, the promised concern bit. Unfortunately, while the majority of the
> packages do come in a compressed tarball format, there are many (enough to
> make it a corner case of some concern) packages which do not. Off the top
> of my head, I can think of .Z (forget which package), .rpm
> (redhat-artwork), .bin (realplayer). And in some cases, we just get an
> uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
> though I'm not certain of it right this moment).

.Z files can be uncompressed and handled as for gzip (I think gzip
handles them, in fact).

.zip, .rpm, or self-extracting .exe files can also be uncompressed and
diffed, at least in principle.

Uncompressed READMEs, patches or .c files are just too easy. :-)
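
One way to tell these cases apart is to look at the first few bytes of
the distfile instead of trusting its suffix. The sketch below is only
an illustration: the function name and the "raw" label are invented
here, and self-extracting .exe or .bin files are not detected at all,
so they simply fall through to "raw".

```python
# Leading "magic" bytes of the container formats mentioned above.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"\x1f\x9d", "compress"),       # .Z files
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"\xed\xab\xee\xdb", "rpm"),
]


def sniff_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    Returns "raw" for anything unrecognized (README, .c file, patch,
    or a binary we do not know how to unpack).
    """
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return "raw"
```

A caller would read the first four bytes of the file, pass them to
sniff_format(), and pick the matching decompressor before computing
the delta or the checksum.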

If you don't recognize the format, you can try to do a delta on the
binary form. If the delta is too big, drop it.
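
"Too big" needs some cutoff. A minimal sketch of that decision follows;
the 50% default is an arbitrary assumption for illustration, not a
number anyone in this thread has proposed.

```python
def keep_delta(delta_size: int, full_size: int, max_ratio: float = 0.5) -> bool:
    """Return True if the delta is small enough to be worth shipping.

    max_ratio is a hypothetical tuning knob: the delta is kept only if
    it is at most that fraction of the full download.
    """
    if full_size <= 0:
        return False
    return delta_size <= max_ratio * full_size
```

For example, keep_delta(120_000, 1_000_000) keeps the delta, while
keep_delta(900_000, 1_000_000) drops it.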

Experience on Debian has shown that compiled binaries in general do
not delta-compress very well, so I think not being able to uncompress
them is not a terrible thing.

The point:

Gentoo should distribute the md5sums for both the compressed and
uncompressed forms of packages. They are checked in that order;
either is sufficient.
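
A rough Python sketch of that check, under some assumptions: the
function names are invented, the decompressor is chosen by file suffix
for brevity, and only .gz and .bz2 are handled.

```python
import bz2
import gzip
import hashlib

CHUNK = 1 << 16  # read in 64 KiB pieces so large distfiles are not held in memory


def md5_of_stream(stream) -> str:
    """Hex MD5 of everything readable from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK), b""):
        digest.update(chunk)
    return digest.hexdigest()


def open_uncompressed(path):
    """Open path so that reads return the uncompressed bytes."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def distfile_ok(path, compressed_md5, uncompressed_md5) -> bool:
    """Accept the file if either checksum matches, compressed form first."""
    with open(path, "rb") as raw:
        if md5_of_stream(raw) == compressed_md5:
            return True
    with open_uncompressed(path) as data:
        return md5_of_stream(data) == uncompressed_md5
```

An untouched download passes on the first comparison without being
decompressed; a file that was rebuilt from a delta and recompressed
locally still passes on the second.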

Regular non-delta downloads will proceed as usual, and the md5sum can
be checked immediately after download. There is no added cost.

Patch downloads can be done by:

- download xdelta
- uncompress old file, pipe it into 'xdelta patch', store the result
- check result against uncompressed md5sum
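
The last two steps can be sketched in Python as below. Several things
here are assumptions: the delta is taken to be downloaded already, the
old distfile is assumed to be gzip-compressed, the old tarball goes
through a temporary file instead of a pipe, and run_xdelta() assumes
the xdelta 1.x command line ("xdelta patch PATCH FROM TO"), which
should be checked against the installed version. The patch step is a
parameter so that another tool can be plugged in.

```python
import gzip
import hashlib
import shutil
import subprocess


def md5_of_file(path) -> str:
    """Hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_xdelta(delta_path, old_path, new_path):
    # ASSUMPTION: xdelta 1.x syntax; other versions use different arguments.
    subprocess.run(["xdelta", "patch", delta_path, old_path, new_path], check=True)


def apply_patch(old_gz, delta_path, old_tar, new_tar, expected_md5,
                apply_delta=run_xdelta) -> bool:
    """Rebuild new_tar from old_gz plus a delta, then verify it."""
    # Uncompress the old distfile to a plain tarball.
    with gzip.open(old_gz, "rb") as src, open(old_tar, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Apply the delta to get the new, still uncompressed, tarball.
    apply_delta(delta_path, old_tar, new_tar)
    # Check the result against the md5sum of the uncompressed form.
    return md5_of_file(new_tar) == expected_md5
```

Nothing is recompressed at any point; the verified .tar can go
straight to the unpack step.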

As far as I can see this removes any need for a special deltup file
format. Simply send xdeltas.

A great advantage is that xdeltas are useful to people other than
Gentoo, so people upstream or mirrors may be more willing to
distribute them alongside the original source.

Much as I love the idea of deltup, I think the current code is a bit
messy and making up a new format is unjustified.

> In terms of performance of the md5summing, it would still likely be i/o
> limited- decompression would be done in memory after all.

The approach above is much *more* efficient than deltup, which makes
an extra roundtrip to bz2 format.

What have I missed?

--
Martin

If you don't know how to code, then you don't know how to design the
software either. Period. You can only cause trouble.
-- Havoc Pennington, http://ometer.com/hacking.html

--
gentoo-dev@g.o mailing list