Gentoo Archives: gentoo-dev

From: Brian Harring <bdharring@××××.edu>
To: gentoo-dev <gentoo-dev@g.o>
Cc: Paul de Vrieze <pauldv@g.o>
Subject: Re: [gentoo-dev] proposed md5sum change
Date: Thu, 12 Jun 2003 16:32:09
Message-Id: 1055435493.23370.72.camel@tylendel.genetics.wisc.edu
In Reply to: Re: [gentoo-dev] proposed md5sum change by Paul de Vrieze
Replies are below...

On Thu, 2003-06-12 at 03:15, Paul de Vrieze wrote:
> On Thursday 12 June 2003 09:53, Evan Powers wrote:
> > On Thursday 12 June 2003 02:56 am, Seemant Kulleen wrote:
> > > Anyway, the current approach keeps it simple in that the md5sum is of
> > > the *item(s) that is/are downloaded*. The first reason I can see is what
> > > I stated above. There are other reasons I can see as well. You know
> > > immediately upon fetching the set of source items whether they are bad.
> > > So, no disk i/o or cpu cycles are spent in the unpacking; and no
> > > potentially nasty code is even untarred on the system, yet.
> >
> > Well, I can think of a way to address this part of the issue, anyway.
> >
> > * The portage tree still has MD5 digests for the item(s) which is/are
> > downloaded.
> >
> This would make things easy for most people, and would also save lots of cpu
> cycles.
Agreed. Ignoring the cpu cost of decompression, gzip decompressing and
md5ing runs fairly close to i/o speed, at least on my machine, which is
an xp1700 w/ about 40MB/s throughput on my hd. Bzip2 (as jjw pointed out)
is an entirely different beast; decompressing it is nowhere near i/o
speeds. At this stage of the game, recompressed/patched tarballs would
be the minority; down the line, assuming diffing/patching takes off, this
might be something to think about.

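For anyone who wants numbers from their own box rather than mine, here is
a minimal sketch of how the comparison could be timed. The function name
and the chunk size are my own choices for illustration, not anything that
exists in portage.

```python
import bz2
import gzip
import hashlib
import time


def md5_of_decompressed(path, opener, chunk_size=1 << 20):
    """Return (hex digest, elapsed seconds) for the decompressed data in path.

    opener is gzip.open or bz2.open; the file is streamed in chunks so a
    large tarball never has to fit in memory.
    """
    digest = hashlib.md5()
    start = time.perf_counter()
    with opener(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    elapsed = time.perf_counter() - start
    return digest.hexdigest(), elapsed
```

Calling it once with gzip.open on a .tar.gz and once with bz2.open on the
same tarball recompressed as .tar.bz2 should give identical digests, and
the two elapsed times show how far apart the formats are on that machine.
Dividing the uncompressed size by the elapsed time gives a figure that can
be set against raw disk throughput.
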
> > * After downloading, emerge executes a (user specified?) program on the
> > newly downloaded file. This program applies some transform to the file;
> > maybe it decompresses whatever format the file is in and re-compresses it
> > with bzip2, or maybe it only format-shifts files which are over a certain
> > size threshold, whatever.
> >
> Always fun, maybe we would also have a list of allowed/blacklisted extensions
> like RPM files which are already compressed. (Or openoffice files/jar-balls
> which lose validity if compressed in a different format, but could be
> recompressed in zip format)
>
> > * Next, emerge adds a new record to a database (text, one record per line,
> > for example) somewhere in /var. This database has the original name of the
> > downloaded file, the original MD5 digest, the new name, and the new MD5
> > digest.
> >
> Maybe extending the serverside digest to also include an unpacked digest
> (where applicable) would be smarter in validation of patch based and oddball
> cases. (This doesn't mean that client-side digesting isn't useful, it is for
> not having to unpack first).
How about this, and mind you this is just for dealing w/ md5sums:
instead of doing any db-style stuff, just create a file alongside (w/in
the distfile dir most likely) that contains the uncompressed data's
md5sum. If you go about creating a db type setup, you're going to run
into major issues in an environment where the distfiles dir is shared
out to other systems, since you're not going to be sharing the db.
Basically I could see this: a simple script that a user can use to
convert non-gz distfiles to bzip2 tarballs, which creates the file (think
linux-2.4.19.tar.bz2 and linux-2.4.19.tar.md5)... for distfile
diffing/patching, it uses the same method. If the reconstructed and
recompressed version's md5 matches what portage has, hoozah, no need to
create the file; if not (say upgrading openoffice), we create the file.
Also, it dawned on me that md5summing the data has an added bonus of
being indifferent to the patching/differencing method. In other words,
we could use the standard unified diffs that are provided for the
kernel versions, for instance.

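To make the side-file idea concrete, here is a rough sketch under my own
assumptions: the helper names are hypothetical, the side file holds nothing
but the hex digest of the uncompressed data, and only .gz and .bz2 are
handled. Treat it as an illustration of the scheme, not as how portage does
or should do it.

```python
import bz2
import gzip
import hashlib
import os

# Assumption: only these two compression formats matter for the sketch.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def stream_md5(path, opener=open, chunk_size=1 << 20):
    """MD5 of whatever opener yields: raw bytes by default."""
    digest = hashlib.md5()
    with opener(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def payload_md5(distfile):
    """MD5 of the uncompressed data inside a .gz or .bz2 distfile."""
    _base, ext = os.path.splitext(distfile)
    return stream_md5(distfile, OPENERS[ext])


def sidecar_path(distfile):
    """linux-2.4.19.tar.bz2 -> linux-2.4.19.tar.md5"""
    base, _ext = os.path.splitext(distfile)
    return base + ".md5"


def record_if_needed(distfile, portage_md5):
    """Write the side file only if the file no longer matches portage's digest."""
    if stream_md5(distfile) == portage_md5:
        return None  # compressed file is byte-identical to upstream
    path = sidecar_path(distfile)
    with open(path, "w") as fh:
        fh.write(payload_md5(distfile) + "\n")
    return path


def verify_payload(distfile):
    """Check the uncompressed data against the digest stored alongside it."""
    with open(sidecar_path(distfile)) as fh:
        expected = fh.read().strip()
    return payload_md5(distfile) == expected
```

Because the side file sits next to the tarball, a shared distfiles dir
carries it along to every system that mounts it, which is the point of
avoiding a db. One caveat the sketch glosses over: the payload digest
should be taken from data that has already been checked against portage's
digest, i.e. before the original download is recompressed or patched.
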
As for transforming, would it really be needed? I've spent a bit of
time rooting through the distfile dir and I don't recall seeing
non-versioned names, although as always, I could be wrong.


--
gentoo-dev@g.o mailing list

Replies

Subject Author
Re: [gentoo-dev] proposed md5sum change Paul de Vrieze <pauldv@g.o>