> I still don't like the UMD5 followed by a duplicate MD5.

The duplicate MD5 is redundant and isn't needed, since the new file's
MD5/size can be pulled either from the normal digests or from the patch
list (e.g., when that new file will be used as the base for another
patch). The UMD5 (uncompressed MD5) info would still be required,
though. I should've caught that :)

I'll update the GLEP to remove that redundancy from the proposal.

> I don't particularly
> like the MD5 database either. How about adding UMD5 to the main tree's
> individual digests?

I'd thought about that originally, but the uncompressed MD5 value/size
is only useful when patching takes place. If the GLEP gets off the
ground and patches become commonplace, it would definitely make sense.

The original reason for keeping the UMD5 out of the current digest files
was backwards compatibility: from what I gathered from carpaski, ancient
versions of portage would have issues with the addition of a new
signature to the digests. I can dig up the specific version number for
anyone interested.

> If the MD5 doesn't match, uncompress the file and compare
> against the UMD5 if one is available.

If the MD5 DB is completely and utterly shot down, that would be the
remaining option; unfortunately, it incurs a fair amount of overhead.
With an alternate MD5 DB, we do one MD5 run over the data. Without it,
we'd first have to do an MD5 run over the compressed data and then, if
no match is found, decompress the file and compute the MD5 of the
uncompressed data. For small files (under 1 MB), the difference between
the two options wouldn't be huge for users.
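To make the two options concrete, here's a minimal Python sketch of both verification paths. It is not portage code: the function names, the `alt_md5s` set standing in for the alternate MD5 DB, and the use of a different bz2 compression level to produce a "reconstructed" file with different bytes are all hypothetical, chosen only for illustration. It also reads whole files into memory for brevity.

```python
import bz2
import hashlib
from typing import Callable, Optional, Set


def md5_of_bytes(data: bytes) -> str:
    """Hex MD5 of a byte string (one full pass over the data)."""
    return hashlib.md5(data).hexdigest()


def verify_with_alt_db(data: bytes, digest_md5: str, alt_md5s: Set[str]) -> bool:
    """Hypothetical: one MD5 pass, then accept if the hash matches the tree
    digest or any entry in the (hypothetical) alternate MD5 DB."""
    h = md5_of_bytes(data)
    return h == digest_md5 or h in alt_md5s


def verify_with_umd5_fallback(data: bytes, digest_md5: str, umd5: Optional[str],
                              decompress: Callable[[bytes], bytes]) -> bool:
    """Hypothetical: without the alternate DB, MD5 the compressed data first;
    on a mismatch, decompress and MD5 again against the UMD5 (second pass)."""
    if md5_of_bytes(data) == digest_md5:
        return True
    if umd5 is None:
        return False
    return md5_of_bytes(decompress(data)) == umd5


if __name__ == "__main__":
    payload = b"example tarball contents\n" * 1000
    upstream = bz2.compress(payload, 9)  # bytes the tree digest describes
    rebuilt = bz2.compress(payload, 1)   # same data, different compressed bytes
    digest_md5 = md5_of_bytes(upstream)
    umd5 = md5_of_bytes(payload)
    # Alternate DB path: a single hash of the rebuilt file is enough.
    print(verify_with_alt_db(rebuilt, digest_md5, {md5_of_bytes(rebuilt)}))
    # Fallback path: hash, mismatch, decompress, hash again.
    print(verify_with_umd5_fallback(rebuilt, digest_md5, umd5, bz2.decompress))
```

The design point is where the second pass happens: the alternate DB moves it to patch-generation time, while the UMD5 fallback pays for it on the user's machine every time the compressed MD5 misses.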

For most of the larger files (the 20-50 MB range), it would be quite
noticeable. That also assumes the user has a fast processor; if they
don't, the extra cycles required to decompress and checksum would
become painful very quickly.

Quick stats from my xp1700 system:

time md5sum linux-2.6.4.tar.bz2
real 0m0.301s
user 0m0.215s
sys 0m0.082s

time bzip2 -dc linux-2.6.4.tar.bz2 | md5sum
real 0m35.726s
user 0m34.495s
sys 0m0.870s

ls -l linux-2.6.4.tar.bz2: ~34,386,912 bytes
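Plugging those numbers into a rough cost model shows what the fallback costs when the compressed MD5 doesn't match. This is only a sketch: `verify_cost` is a hypothetical helper, and it treats the measured wall-clock ("real") times above as fixed, ignoring disk caching and CPU differences.

```python
# Wall-clock timings quoted above for linux-2.6.4.tar.bz2.
SIZE_BYTES = 34_386_912
MD5_COMPRESSED_S = 0.301        # time md5sum linux-2.6.4.tar.bz2
DECOMPRESS_AND_MD5_S = 35.726   # time bzip2 -dc ... | md5sum


def verify_cost(has_alt_md5_db: bool, compressed_md5_matches: bool) -> float:
    """Hypothetical estimate of seconds spent verifying one distfile.

    With an alternate MD5 DB, one MD5 pass over the compressed file is
    enough. Without it, a mismatch forces a second pass: decompress and
    MD5 the uncompressed stream, on top of the first pass.
    """
    if has_alt_md5_db or compressed_md5_matches:
        return MD5_COMPRESSED_S
    return MD5_COMPRESSED_S + DECOMPRESS_AND_MD5_S


if __name__ == "__main__":
    fast = verify_cost(has_alt_md5_db=True, compressed_md5_matches=False)
    slow = verify_cost(has_alt_md5_db=False, compressed_md5_matches=False)
    print(f"with alternate DB:    {fast:.3f} s")
    print(f"without, on mismatch: {slow:.3f} s ({slow / fast:.0f}x)")
    print(f"md5 throughput:       {SIZE_BYTES / MD5_COMPRESSED_S / 1e6:.1f} MB/s")
    print(f"bunzip2 + md5:        {SIZE_BYTES / DECOMPRESS_AND_MD5_S / 1e6:.2f} MB/s of input")
```

For this one tarball that's roughly 0.3 s versus 36 s on the mismatch path, about a 120x difference.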

Note that without the alternate MD5 DB, the user would incur the cost
of both operations. With bz2 files, I don't expect the MD5 of the
reconstructed/recompressed file to differ from the tree's digest value
all that often; for gzip files, I would expect it to happen quite often.
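One likely contributor to that gzip/bz2 difference: a gzip header (RFC 1952) records a modification time and can carry the original filename, so recompressing identical data at a later time yields different bytes, while a bzip2 stream carries no timestamp. Compressor version and options could matter for either format; this sketch only demonstrates the timestamp effect, using Python's `gzip` and `bz2` modules as stand-ins for the command-line tools (it assumes Python 3.8+ for `gzip.compress(..., mtime=...)`, and the payload is a made-up placeholder).

```python
import bz2
import gzip
import hashlib


def md5hex(data: bytes) -> str:
    """Hex MD5 of a byte string."""
    return hashlib.md5(data).hexdigest()


# Placeholder payload standing in for an uncompressed tarball.
payload = b"linux-2.6.4 tarball stand-in\n" * 5000

# Same data and level, different header mtime -> different gzip bytes.
gz_a = gzip.compress(payload, compresslevel=9, mtime=1_000_000_000)
gz_b = gzip.compress(payload, compresslevel=9, mtime=1_080_000_000)

# bzip2 has no timestamp: same data and level -> identical bytes.
bz_a = bz2.compress(payload, 9)
bz_b = bz2.compress(payload, 9)

print("gzip MD5s match:        ", md5hex(gz_a) == md5hex(gz_b))
print("bz2 MD5s match:         ", md5hex(bz_a) == md5hex(bz_b))
# The uncompressed MD5 (the UMD5) is identical in every case.
print("uncompressed MD5s match:", md5hex(gzip.decompress(gz_a)) == md5hex(bz2.decompress(bz_b)))
```

This is why the UMD5 stays useful even when the compressed digest can't be reproduced: it checks the content rather than one particular encoding of it.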

~brian