Gentoo Archives: gentoo-dev

From:	Brian Harring <ferringb@g.o>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] RFC: Glep25, Distfile Patching Support
Date:	Sun, 11 Apr 2004 05:28:46
Message-Id:	`1081661217.3772.48.camel@exodus`
In Reply to:	Re: [gentoo-dev] RFC: Glep25, Distfile Patching Support by Jason Stubbs

1	> I still don't like the UMD5 followed by a duplicate MD5.
2	The duplicate md5 is redundant, and isn't needed- the UMD5 (uncompressed
3	md5) info would still be required though.
4	The new file's MD5/size can be pulled either from the normal
5	digests or from the patch list (e.g., when that new file will be used as
6	the base for another patch). I should've caught that :)
7
8	I'll update the glep to remove that redundancy from the proposal.
9
10	> I don't particularly
11	> like the MD5 database either. How about adding UMD5 to the main tree's
12	> individual digests?
13	I'd thought about it originally, but the only time the uncompressed md5
14	value/size is useful is when patching is taking place- if the glep gets
15	off the ground, and patches are commonplace, it definitely would make
16	sense.
17
18	The original reason for keeping the UMD5 out of the current digest files
19	was backwards compatibility- from what I gathered from carpaski,
20	ancient versions of portage would have issues w/ the addition of a new
21	signature to the digests. The specific version number I can dig up for
22	those interested.
23
24	> If the MD5 doesn't match, uncompress the file and compare
25	> against the UMD5 if one is available.
26	If the md5 DB is completely and utterly shot down, that would be the
27	remaining option- unfortunately it induces a fair amount of overhead.
28	With an alternate MD5 db, we do one md5 run of the data. W/out it, we'd
29	have to first do an md5 run of the data, and then pull the md5 of the
30	uncompressed file if no match is found- for small files (sub 1mb), the two
31	options wouldn't be a huge difference for users.
32
33	For the majority of the larger files (20-50mb range), it would be quite
34	noticeable. That's also assuming that the user has a fast proc- if they
35	don't, the extra cycles required to decompress and calculate would
36	become painful very quickly.
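
A minimal sketch of the fallback check discussed above (compare the MD5 of the file as stored, and only on a mismatch decompress and compare against the UMD5). This is not portage code: `verify_distfile`, `md5_hex`, and `read_chunks` are hypothetical names, and the sketch assumes a bz2-compressed distfile and hex-encoded digest strings.

```python
import bz2
import hashlib


def md5_hex(chunks):
    """Return the hex MD5 of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(fileobj, size=1 << 16):
    """Yield successive blocks from a file-like object until EOF."""
    while True:
        block = fileobj.read(size)
        if not block:
            return
        yield block


def verify_distfile(path, md5, umd5=None):
    """Hypothetical helper: check MD5 first, fall back to UMD5 on mismatch.

    Returns "md5" or "umd5" depending on which check passed, else None.
    """
    # Pass 1: hash the compressed file exactly as stored on disk (cheap).
    with open(path, "rb") as f:
        if md5_hex(read_chunks(f)) == md5:
            return "md5"
    if umd5 is None:
        return None
    # Pass 2: decompress and hash the payload (the expensive step).
    with bz2.open(path, "rb") as f:
        if md5_hex(read_chunks(f)) == umd5:
            return "umd5"
    return None
```

In this sketch a file whose compressed MD5 matches never pays for decompression; only a mismatch triggers the second, slower pass, which is the extra cost the paragraphs above describe.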
37
38	Quick stats from my xp1700 system-
39
40	time md5sum linux-2.6.4.tar.bz2
41	real 0m0.301s
42	user 0m0.215s
43	sys 0m0.082s
44
45	time bzip2 -dc linux-2.6.4.tar.bz2 | md5sum
46	real 0m35.726s
47	user 0m34.495s
48	sys 0m0.870s
49
50	ls -l linux-2.6.4.tar.bz2 ~== 34,386,912
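
For anyone wanting to repeat the comparison on their own hardware, here is a rough Python equivalent of the two shell timings above. `timed_md5` is a hypothetical helper, and the absolute numbers will not match the `md5sum`/`bzip2` figures, since the overhead of Python's `hashlib` and `bz2` modules differs from that of the command-line tools.

```python
import bz2
import hashlib
import time


def timed_md5(opener, path, size=1 << 16):
    """Return (hex_md5, elapsed_seconds) for the stream opener(path, "rb") yields.

    Pass the builtin open to hash the file as stored, or bz2.open to hash
    the decompressed payload.
    """
    digest = hashlib.md5()
    start = time.perf_counter()
    with opener(path, "rb") as f:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: f.read(size), b""):
            digest.update(block)
    return digest.hexdigest(), time.perf_counter() - start
```

Calling `timed_md5(open, "linux-2.6.4.tar.bz2")` and then `timed_md5(bz2.open, "linux-2.6.4.tar.bz2")` gives the two cases measured above, provided the tarball is in the current directory.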
51
52	Note that without the alternate md5 db, the user would incur the cost
53	of both operations. With bz2 files I don't expect the
54	reconstructed/recompressed md5 to differ from the tree's digest value
55	all that often- for gzip files, I would expect it to happen quite often.
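
The message does not say why recompressed gzip files would differ more often. One well-known cause, offered here as an assumption and not as the author's stated reasoning, is that the gzip header records a modification time (and optionally the original file name), so recompressing identical data at a different time yields a different compressed MD5. The bzip2 stream header carries no such timestamp. A small sketch, with `gzip_bytes` as a hypothetical helper:

```python
import gzip
import hashlib
import io


def gzip_bytes(data, mtime):
    """Compress data to a gzip stream whose header records the given mtime."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"example distfile contents\n" * 1000
first = gzip_bytes(payload, mtime=1)
second = gzip_bytes(payload, mtime=2)

# The compressed streams differ because the header timestamp differs...
compressed_match = hashlib.md5(first).digest() == hashlib.md5(second).digest()
# ...while the decompressed payloads, and hence their MD5s, are identical.
payload_match = gzip.decompress(first) == gzip.decompress(second)
```

Here `compressed_match` is false while `payload_match` is true, which is the situation in which only an uncompressed MD5 can confirm that the reconstructed file is correct.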
56
57	~brian

Attachments

File name	MIME type
signature.asc	application/pgp-signature

Replies

Subject	Author
Re: [gentoo-dev] RFC: Glep25, Distfile Patching Support	Brian Harring <ferringb@g.o>
