Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] RFC: Glep25, Distfile Patching Support
Date: Sun, 11 Apr 2004 05:28:46
Message-Id: 1081661217.3772.48.camel@exodus
In Reply to: Re: [gentoo-dev] RFC: Glep25, Distfile Patching Support by Jason Stubbs
> I still don't like the UMD5 followed by a duplicate MD5.
The duplicate md5 is redundant and isn't needed, since the new file's
MD5/size can be pulled either from the normal digests or from the patch
list (e.g., when that new file is used as the base for another patch).
The UMD5 (uncompressed md5) info would still be required, though.
I should've caught that :)

I'll update the glep to remove that redundancy from the proposal.

> I don't particularly
> like the MD5 database either. How about adding UMD5 to the main tree's
> individual digests?
I'd thought about it originally, but the uncompressed md5 value/size is
only useful when patching is taking place. If the glep gets off the
ground and patches become commonplace, it definitely would make sense.

The original reason for keeping the UMD5 out of the current digest files
was backwards compatibility: from what I gathered from carpaski,
ancient versions of portage would have issues with the addition of a new
signature to the digests. I can dig up the specific version number for
those interested.

> If the MD5 doesn't match, uncompress the file and compare
> against the UMD5 if one is available.
If the md5 DB is completely and utterly shot down, that would be the
remaining option; unfortunately, it induces a fair amount of overhead.
With an alternate MD5 db, we do one md5 run over the data. Without it, we'd
first have to do an md5 run over the data and then, if no match is found,
decompress the file and take the md5 of the uncompressed data. For small
files (sub 1mb), the difference between the two options wouldn't be huge
for users.
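To make the overhead comparison concrete, here is a minimal Python sketch of the two verification strategies (Python because Portage is written in it). The function names are hypothetical, and modelling the alternate MD5 db as a set of extra acceptable digests for reconstructed files is an assumption for illustration; this is not Portage or GLEP 25 code.

```python
from __future__ import annotations

import bz2
import hashlib
from typing import Callable, Optional


def md5_hex(data: bytes) -> str:
    """Hex MD5 of an in-memory blob (a real tool would hash in chunks)."""
    return hashlib.md5(data).hexdigest()


def verify_with_alt_db(data: bytes, tree_md5: str, alt_md5s: set[str]) -> bool:
    """Hypothetical: a single MD5 pass, accepted if it matches the tree digest
    or any digest recorded (by assumption) for a reconstructed file."""
    digest = md5_hex(data)
    return digest == tree_md5 or digest in alt_md5s


def verify_with_umd5_fallback(
    data: bytes,
    tree_md5: str,
    umd5: Optional[str],
    decompress: Callable[[bytes], bytes],
) -> bool:
    """Hypothetical: MD5 the compressed file first; on a mismatch, fall back
    to decompressing the whole file and checking the uncompressed MD5."""
    if md5_hex(data) == tree_md5:
        return True
    if umd5 is None:
        return False
    # Second, far more expensive pass: full decompression plus another hash.
    return md5_hex(decompress(data)) == umd5


# Usage: the same payload compressed at two bzip2 levels yields different bytes.
payload = b"linux kernel source\n" * 1000
upstream = bz2.compress(payload, 9)  # stands in for the upstream tarball
rebuilt = bz2.compress(payload, 1)   # stands in for a reconstructed, recompressed file
tree_md5 = md5_hex(upstream)

print(verify_with_alt_db(rebuilt, tree_md5, {md5_hex(rebuilt)}))
print(verify_with_umd5_fallback(rebuilt, tree_md5, md5_hex(payload), bz2.decompress))
```

With the alternate db, the rebuilt file costs one hash pass; on the fallback path the same file costs a hash pass, a full decompression, and a second hash pass.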

For the majority of the larger files (20-50mb range), it would be quite
noticeable. That's also assuming that the user has a fast CPU; if they
don't, the extra cycles required to decompress and calculate would
become painful very quickly.

Quick stats from my xp1700 system:

time md5sum linux-2.6.4.tar.bz2
real 0m0.301s
user 0m0.215s
sys 0m0.082s

time bzip2 -dc linux-2.6.4.tar.bz2 | md5sum
real 0m35.726s
user 0m34.495s
sys 0m0.870s

ls -l linux-2.6.4.tar.bz2: ~34,386,912 bytes
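The quoted timings can be turned into rough ratios. The snippet below only does arithmetic on the numbers above (it times nothing itself); the variable names are illustrative.

```python
# Timings and size copied from the figures quoted above (xp1700 system).
SIZE_BYTES = 34_386_912
MD5_ONLY_S = 0.301             # real time of: md5sum linux-2.6.4.tar.bz2
DECOMPRESS_AND_MD5_S = 35.726  # real time of: bzip2 -dc ... | md5sum

# How much slower the decompress-then-hash path is than hashing the .bz2 alone.
slowdown = DECOMPRESS_AND_MD5_S / MD5_ONLY_S

# Throughput in MB/s (10**6 bytes), measured against the compressed file size.
md5_mb_s = SIZE_BYTES / MD5_ONLY_S / 1e6
decompress_mb_s = SIZE_BYTES / DECOMPRESS_AND_MD5_S / 1e6

# Without an alternate md5 db, a file whose md5 misses pays for both passes.
worst_case_s = MD5_ONLY_S + DECOMPRESS_AND_MD5_S

print(f"slowdown:      {slowdown:.1f}x")
print(f"md5 only:      {md5_mb_s:.1f} MB/s")
print(f"bunzip2 + md5: {decompress_mb_s:.2f} MB/s of compressed input")
print(f"both passes:   {worst_case_s:.3f} s")
```

On that machine the decompress-and-hash path is roughly 119 times slower than hashing the compressed file alone, and a file that misses the tree digest pays for both passes, about 36 seconds in total.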

Note that without the alternate md5 db, the user would incur the cost
of both operations. With bz2 files, I don't expect the md5 of the
reconstructed (recompressed) file to differ from the tree's digest value
all that often; for gzip files, I would expect it to happen quite often.
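One well-known reason gzip output is hard to reproduce byte for byte (not necessarily the only one meant here): the gzip header records a modification time and an extra-flags byte hinting at the compression level, and different levels or implementations emit different deflate streams. A small Python sketch, assuming Python 3.8+ for the `mtime` argument of `gzip.compress`; the payload and timestamps are arbitrary.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


payload = b"int main(void) { return 0; }\n" * 2000

# Same payload and level, different timestamps written into the gzip header.
a = gzip.compress(payload, compresslevel=9, mtime=1081661217)
b = gzip.compress(payload, compresslevel=9, mtime=1081661218)
# Same timestamp, different compression level.
c = gzip.compress(payload, compresslevel=1, mtime=1081661217)

for name, blob in (("a", a), ("b", b), ("c", c)):
    # Compressed md5 differs per variant; uncompressed md5 is identical.
    print(name, md5_hex(blob), md5_hex(gzip.decompress(blob)))
```

This is why an uncompressed md5 (or a digest of the exact reconstructed file) is the stable thing to compare against for gzip'd distfiles.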

~brian
