Gentoo Archives: gentoo-dev

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-dev@g.o
Subject: Re: [gentoo-dev] proposed md5sum change
Date: Thu, 12 Jun 2003 08:15:57
Message-Id: 200306121015.55713.pauldv@gentoo.org
In Reply to: Re: [gentoo-dev] proposed md5sum change by Evan Powers
1 On Thursday 12 June 2003 09:53, Evan Powers wrote:
2 > On Thursday 12 June 2003 02:56 am, Seemant Kulleen wrote:
3 > > Anyway, the current approach keeps it simple in that the md5sum is off
4 > > the *item(s) that is/are downloaded*. The first reason I can see is what
5 > > I stated above. There are other reasons I can see as well. You know,
6 > > immediately upon fetching the set of source items that they are bad. So,
7 > > no disk i/o or cpu cycles are spent in the unpacking; and no potentially
8 > > nasty code is even untarred on the system, yet.
9 >
10 > Well, I can think of a way to address this part of the issue, anyway.
11 >
12 > * The portage tree still has MD5 digests for the item(s) which is/are
13 > downloaded.
14 >
15 This would make things easy for most people, and would also save lots of CPU
16 cycles.
17
18 > * After downloading, emerge executes a (user specified?) program on the
19 > newly downloaded file. This program applies some transform to the file;
20 > maybe it decompresses whatever format the file is in and re-compresses it
21 > with bzip2, or maybe it only format-shifts files which are over a certain
22 > size threshold, whatever.
23 >
24 Always fun. Maybe we would also have a list of allowed/blacklisted extensions,
25 like RPM files, which are already compressed. (Or OpenOffice files/jar-balls,
26 which lose validity if compressed in a different format but could be
27 recompressed in zip format.)
28
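A minimal sketch of such an extension check, in Python since portage itself is
written in it. The function name and the contents of the blacklist are
illustrative assumptions, not an agreed list:

```python
import os

# Hypothetical blacklist: formats that are already compressed, or that stop
# being valid if repacked in another container. The entries are examples only.
NO_RECOMPRESS = {".rpm", ".jar", ".sxw", ".zip", ".bz2"}


def should_recompress(filename):
    """Return True if the downloaded file may be handed to the transform step."""
    # Only the last extension is inspected, so "foo.tar.gz" is judged by ".gz".
    ext = os.path.splitext(filename)[1].lower()
    return ext not in NO_RECOMPRESS
```

With this list, should_recompress("foo-1.0.tar.gz") is True and
should_recompress("bar.rpm") is False.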
29 > * Next, emerge adds a new record to a database (text, one record per line,
30 > for example) somewhere in /var. This database has the original name of the
31 > downloaded file, the original MD5 digest, the new name, and the new MD5
32 > digest.
33 >
34 Maybe extending the server-side digest to also include an unpacked digest
35 (where applicable) would be a smarter way to validate patch-based and oddball
36 cases. (This doesn't mean that client-side digesting isn't useful; it is, because
37 it avoids having to unpack first.)
38
39 > With infrastructure like this you could even add more interesting
40 > functionality to portage pretty easily. Like maybe the transform program
41 > uploads the file to the corporate internal FTP mirror, and the database
42 > maps the original name to the URI which locates it.
43 >
44 > Or, if emerge exported sufficient context to the transform program, you
45 > could fix the case where a particular braindead package is available only
46 > as package.tar.gz, not package-version.tar.gz. The transform program would
47 > add the version to the filename, and the database would be allowed to have
48 > multiple entries for each original file name (provided they had different
49 > MD5 digests). Then emerge would just pick the record with both the desired
50 > original name and the desired original MD5 digest.
51 >
52
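The record database described in the quoted text could be sketched roughly as
follows. The whitespace-separated field layout, the "#" comment lines, and the
names Record, parse_db and lookup are assumptions made for illustration; the
proposal only says "text, one record per line" with the four fields below.
File names are assumed to contain no spaces.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    orig_name: str  # name of the file as downloaded
    orig_md5: str   # MD5 digest of the file as downloaded
    new_name: str   # name after the transform program ran
    new_md5: str    # MD5 digest after the transform


def parse_db(text: str) -> List[Record]:
    """Parse one record per line; blank lines and '#' comments are skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError("malformed record: %r" % line)
        records.append(Record(*fields))
    return records


def lookup(records: List[Record], orig_name: str, orig_md5: str) -> Optional[Record]:
    """Pick the record matching both the original name and the original digest.

    Several records may share an original name (the unversioned package.tar.gz
    case), so the digest is what tells them apart.
    """
    for record in records:
        if record.orig_name == orig_name and record.orig_md5 == orig_md5:
            return record
    return None
```

Keying the lookup on the (name, digest) pair is what allows two different
upstream releases that were both shipped as package.tar.gz to coexist.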
53 I do not know whether package (name) transformation is the same thing, but I
54 support it fully. I believe that name transformation should also be specified
55 in the ebuild. That would lead to a variable like:
56
57 TRANSFORM="foo.tgz:foo-0.0.1.tar.gz
58 bar-0.1Beta1.2.tar.bz2:bar-0.1_beta102.tar.bz2"
59
60 Which would automatically transform filenames. This would also apply to the
61 mirrors, so portage needs to be changed to try to fetch the transformed name
62 from the mirrors while trying to fetch the original name from the source. I
63 do believe though that name transformation is a separate issue.
64
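As a rough sketch of how the TRANSFORM value above might be handled (written in
Python; the helper names, the "distfiles/" mirror layout, and the order in which
mirrors and source are tried are all assumptions, not existing portage
behaviour):

```python
def parse_transform(value):
    """Turn whitespace-separated 'original:renamed' pairs into a dict."""
    mapping = {}
    for entry in value.split():
        original, sep, renamed = entry.partition(":")
        if not sep or not original or not renamed:
            raise ValueError("bad TRANSFORM entry: %r" % entry)
        mapping[original] = renamed
    return mapping


def fetch_candidates(src_uri, mirrors, mapping):
    """Return (local file name, URLs to try).

    Mirrors are asked for the transformed name; the upstream source is asked
    for the original name, since upstream knows nothing about the renaming.
    """
    original = src_uri.rsplit("/", 1)[-1]
    local_name = mapping.get(original, original)
    urls = ["%s/distfiles/%s" % (m.rstrip("/"), local_name) for m in mirrors]
    urls.append(src_uri)
    return local_name, urls
```

For example, with the TRANSFORM value above, a source URI ending in foo.tgz
would be stored locally as foo-0.0.1.tar.gz, requested from each mirror under
that name, and requested from upstream as foo.tgz.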
65 Paul
66
67 --
68 Paul de Vrieze
69 Researcher
70 Mail: pauldv@××××××.nl
71 Homepage: http://www.cs.kun.nl/~pauldv

Replies

Subject Author
Re: [gentoo-dev] proposed md5sum change Brian Harring <bdharring@××××.edu>