Responses/cheer-leading littered liberally below...

On Sunday, June 22, 2003, at 09:41 PM, Martin Pool wrote:

> On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
>
>> Hola all,
>> Straight to the point, I propose instead of md5summing the compressed
>> distfile, we md5sum the actual data, the tarball.
>
> Speaking as somebody who has worked on rsync and librsync: I agree, I
> think that would be a big improvement.
Heh, small world. I'd actually read the original complaint about it in
Tridgell's thesis while researching delta compression for my own little
prog...

> The uncompressed form is the natural and efficient place to do delta
> compression.
Agreed, although I would posit that decompressing a large bzip2 for
md5summing in memory makes it a substantially longer affair than if you
just md5'd the compressed tarball. On my personal system, md5summing the
compressed file takes 3-5 s, while bzip2-decompressing it and piping the
output to md5sum takes 1-2 minutes. More below...
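For what it's worth, the extra cost is the bzip2 decompression itself, not memory: the tarball can be hashed as it streams out of the decompressor. A minimal Python sketch of the two checks, assuming a local `.tar.bz2` distfile; the helper names are hypothetical, not anything in Portage.

```python
import bz2
import hashlib

CHUNK = 64 * 1024  # read 64 KiB at a time so memory use stays flat


def md5_of_stream(f) -> str:
    """MD5 hex digest of everything readable from a binary file object."""
    h = hashlib.md5()
    for block in iter(lambda: f.read(CHUNK), b""):
        h.update(block)
    return h.hexdigest()


def md5_compressed(path: str) -> str:
    # Hash the .bz2 exactly as it sits on disk: cheap, mostly I/O bound.
    with open(path, "rb") as f:
        return md5_of_stream(f)


def md5_uncompressed(path: str) -> str:
    # Hash the tarball inside the .bz2. bz2.open decompresses on the fly,
    # so the extra cost is bzip2 CPU time, not holding the tarball in memory.
    with bz2.open(path, "rb") as f:
        return md5_of_stream(f)
```

Wrapping the two calls in `time.perf_counter()` is a quick way to reproduce the comparison on your own machine; the numbers will depend on the CPU and the distfile.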
> Seemant Kuleen wrote:
>
>> Now, the promised concern bit. Unfortunately, while the majority of the
>> packages do come in a compressed tarball format, there are many (enough to
>> make it a corner case of some concern) packages which do not. Off the top
>> of my head, I can think of .Z (forget which package), .rpm
>> (redhat-artwork), .bin (realplayer). And in some cases, we just get an
>> uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
>> though I'm not certain of it right this moment).
>
> .Z files can be uncompressed and handled as for gzip (I think gzip
> handles them in fact.)
>
> .zip, .rpm, or self-extracting .exe files can also be uncompressed and
> diffed, at least in principle.
Summing it up: if we can pull it apart and get the uncompressed data,
we md5 that data. If we can't, well, I've yet to see any diff prog
(aside from xdelta's lackluster gzip support) that even does
decompression of data, so it's a non-issue for the moment...
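The "pull it apart if we can, otherwise hash it as-is" rule could be sketched like this in Python. It is an illustration only: the extension table and `content_md5` are hypothetical, and it covers just the formats Python's standard library can stream-decompress. Note that Python's `gzip` module, unlike the gzip command-line tool, does not read `.Z` files, so they fall through to the raw-bytes case here.

```python
from __future__ import annotations

import bz2
import gzip
import hashlib
from typing import BinaryIO, Callable

# Hypothetical table: extension -> function opening a decompressed stream.
OPENERS: dict[str, Callable[[str], BinaryIO]] = {
    ".gz": lambda p: gzip.open(p, "rb"),
    ".tgz": lambda p: gzip.open(p, "rb"),
    ".bz2": lambda p: bz2.open(p, "rb"),
    ".tbz2": lambda p: bz2.open(p, "rb"),
}


def content_md5(path: str) -> tuple[str, bool]:
    """Return (md5 hex digest, True if the digest covers uncompressed data)."""
    opener = next((fn for ext, fn in OPENERS.items() if path.endswith(ext)), None)
    # Can't pull it apart (README, .bin, .rpm, .Z here): hash the bytes as-is.
    f = opener(path) if opener else open(path, "rb")
    h = hashlib.md5()
    with f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest(), opener is not None
```

The boolean return is there so a caller can tell which kind of digest it got, since only the uncompressed ones are useful for delta work.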
>
> Experience on Debian has shown that compiled binaries in general do
> not delta-compress very well, so I think not being able to uncompress
> them is not a terrible thing.
Horribly, actually. The problem, of course, is that if you change
offset x, everything after x is different... 'tis the reason I was
looking at md5ing the data, since to get any decent delta compression
you have to decompress... but you likely know that, so I'll shut up now.
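A quick way to see the "change offset x, everything after x is different" effect, using Python's `zlib` (deflate, the algorithm behind gzip) as a stand-in compressor. The word list, seed, and offset are arbitrary choices for this demo, not anything from the thread.

```python
import random
import zlib


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Byte positions where a and b differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Deterministic, text-like "old version" of a file.
rng = random.Random(2003)
words = [b"gentoo", b"distfile", b"md5", b"xdelta", b"rsync", b"tarball"]
old = b" ".join(rng.choice(words) for _ in range(20000))

# "New version": a single byte changed near the start (offset x).
x = 100
new = old[:x] + b"\x00" + old[x + 1:]

raw_diff = count_differing_bytes(old, new)
packed_diff = count_differing_bytes(zlib.compress(old), zlib.compress(new))
print(f"uncompressed: {raw_diff} byte(s) differ")
print(f"compressed:   {packed_diff} byte(s) differ")
```

The exact count depends on the input, but a one-byte change in the uncompressed data typically turns into widespread differences in the compressed stream, which leaves a byte-level delta tool little to match.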
>
> The point:
>
> Gentoo should distribute the md5sums for both the compressed and
> uncompressed forms of packages. They are checked in that order;
> either is sufficient.
That would solve the initial complaint about speed I mentioned
above. I like it, and it's a general solution that gives the user more
control over how their distfiles are stored (besides making delta
compression much easier to do).
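Martin's proposal (two digests, checked in that order, either one sufficient) might look roughly like this in Python. `verify_distfile` is a hypothetical helper, not Portage code, and only gzip and bzip2 are handled as a simplifying assumption.

```python
import bz2
import gzip
import hashlib
from typing import Optional


def _md5(f) -> str:
    h = hashlib.md5()
    for block in iter(lambda: f.read(65536), b""):
        h.update(block)
    return h.hexdigest()


def verify_distfile(path: str, md5_compressed: str,
                    md5_uncompressed: Optional[str] = None) -> bool:
    """Accept the file if EITHER recorded digest matches, compressed first."""
    # 1. Cheap check: the file exactly as the mirror ships it. This is the
    #    normal-download case, so it costs nothing extra.
    with open(path, "rb") as f:
        if _md5(f) == md5_compressed:
            return True
    # 2. Fallback: the decompressed payload, so a tarball that was rebuilt
    #    from a delta or recompressed locally still verifies.
    if md5_uncompressed is None:
        return False
    if path.endswith((".gz", ".tgz")):
        opener = gzip.open
    elif path.endswith((".bz2", ".tbz2")):
        opener = bz2.open
    else:
        return False  # no way to get at the uncompressed data
    with opener(path, "rb") as f:
        return _md5(f) == md5_uncompressed
```

Because the second digest covers the data rather than the container, the user is free to store distfiles at whatever compression level they like.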
>
> Regular non-delta downloads will proceed as usual, and the md5sum can
> be checked immediately after download. There is no added cost.
>
> Patch downloads can be done by
>
> - download xdelta
> - uncompress old file, pipe it into 'xdelta patch', store the result
> - check result against uncompressed MD5sum
>
> As far as I can see this removes any need for a special deltup file
> format. Just simply send xdeltas.
I'd agree. From what I've gathered trawling the forums, the reason for
the deltup format is that jjw is attempting to build his own
differencing/encoding setup, which, speaking from experience, is a fair
amount of work. A side note for doing Gentoo delta patching is that
(imo) it ought to provide in some form for standard diffs, since the
version patches distributed currently are typically diffs (look at the
kernel, for instance).
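Martin's three patch-download steps need very little code once an uncompressed md5sum exists. A Python sketch, assuming the delta has already been downloaded; `apply_delta` is a hypothetical hook standing in for xdelta (its actual command line is deliberately not assumed here), and the old tarball is read into memory for brevity where a real tool would stream it through a pipe.

```python
import bz2
import hashlib
from typing import Callable

# HYPOTHETICAL hook: (old uncompressed bytes, delta bytes) -> new bytes.
# In practice this would hand the data to xdelta.
ApplyDelta = Callable[[bytes, bytes], bytes]


def rebuild_from_delta(old_path: str, delta: bytes, new_path: str,
                       expected_md5: str, apply_delta: ApplyDelta) -> bool:
    """Rebuild a new distfile from an old one plus a delta, then verify it."""
    # Uncompress the old distfile we already have.
    with bz2.open(old_path, "rb") as f:
        old_data = f.read()
    # Apply the delta to the uncompressed data.
    new_data = apply_delta(old_data, delta)
    # Check against the UNCOMPRESSED md5sum; a compressed-file md5sum would
    # only match if we happened to recompress byte-for-byte identically.
    if hashlib.md5(new_data).hexdigest() != expected_md5:
        return False
    # Store the result; the compression level is now purely a local choice.
    with bz2.open(new_path, "wb") as f:
        f.write(new_data)
    return True
```

For testing, a toy `apply_delta` such as `lambda base, d: base + d` is enough to exercise the verify-then-store path without xdelta installed.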
Either way, back to adult swim...
~brian

--
gentoo-dev@g.o mailing list