Gentoo Archives: gentoo-dev

From: Marius Mauch <genone@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] digest reorganization and enhancements
Date: Fri, 08 Oct 2004 16:43:33
Message-Id: 20041008184325.316fc227@andy.genone.homeip.net
Hi,

This mail was first sent to dev-portage so it's written for that
audience, but it should be understandable for normal devs as well ;)

Short summary: current portage versions won't be able to handle any
modification to the digest format, so we have to find a different way if
we want support for SHA1 or other algorithms.


And now the more detailed mail:

As was discussed again on -dev recently, we need more digest algorithms
for file verification. One halfway compatible way would be to add
additional lines to the digests and Manifests, using the same syntax as
the current MD5 checksums. However, that means a lot of redundancy, as
the filename and filesize would be duplicated for each additional
algorithm. It's also not trivial to do, as there are several functions
dealing with digests and they all parse them a bit differently (I tried
to add SHA1 support for digests and Manifests and gave up after about an
hour). Also, as soon as we add non-MD5 lines to digests, all currently
released portage versions will blow up (they will treat the provided
hash as an MD5 value; call it a bug if you want).

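To illustrate the problem described above, here is a minimal Python sketch of a hypothetical, simplified legacy parser (`legacy_parse` is an invented name, not portage code). It assumes old digest lines have the form `MD5 <hash> <filename> <size>` and that the reader ignores the first field; the checksums shown are the real MD5 and SHA1 of an empty file.

```python
# Hypothetical, simplified model of a legacy digest reader: it assumes every
# line is "MD5 <hash> <filename> <size>" and never looks at the first field.
# This is NOT portage's actual code.

def legacy_parse(lines):
    """Map filename -> (md5, size), trusting that every hash is MD5."""
    entries = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines
        _algo, digest, filename, size = fields  # algorithm name is ignored
        entries[filename] = (digest, int(size))
    return entries

digest_lines = [
    "MD5 d41d8cd98f00b204e9800998ecf8427e foo-1.0.tar.gz 0",
    # A SHA1 line in the same syntax repeats the filename and size ...
    "SHA1 da39a3ee5e6b4b0d3255bfef95601890afd80709 foo-1.0.tar.gz 0",
]

parsed = legacy_parse(digest_lines)
# ... and the reader now holds the SHA1 value as if it were the MD5.
print(parsed["foo-1.0.tar.gz"][0])
```

Each extra algorithm repeats the filename and size, and a reader that ignores the algorithm name silently replaces the MD5 with the SHA1 value, so every later MD5 check of that file fails.
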
Instead, I suggest we completely reorganize the digest system from
scratch by unifying the digests and the Manifest files. As you all know,
our tree is getting bigger and bigger with no end in sight. That,
combined with the usual filesystem overhead, causes a lot of wasted space
on many systems. By unifying the digests with the Manifests we could
remove more than 15,000 very small files at once (only in the long run,
as this would require compatible portage versions for all users).

As for the new syntax, it should allow us to add new digest algorithms
to portage without changing the syntax. My current idea is that for
each file in the tree and in SRC_URI we have a line specifying:
- the filename
- the filesize
- n digests (each consisting of an algorithm name and the checksum)
To maintain compatibility and support future enhancements, each of these
lines has to be prefixed with a (set of) keyword(s) (FILE or DIGEST, or
SRC_URI, EBUILD, AUXFILE).
Example lines could be:

SRC_URI portage-2.0.51_rc7.tar.bz2 274572 MD5 1234 SHA1 abcd RMD160 9876
EBUILD portage-2.0.51_rc7.ebuild 11806 MD5 xyz SHA1 fifteen

(using fake checksums for readability).

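A minimal sketch of how entries in the proposed format could be written and read, assuming whitespace-separated fields exactly as in the example lines and Python's `hashlib` for the checksums. `format_entry`, `parse_entry`, `KEYWORDS` and `ALGORITHMS` are hypothetical names, and picking SRC_URI/EBUILD/AUXFILE as the keyword set is an assumption. RMD160 is left out of the table because `hashlib` only offers it when the underlying OpenSSL build does.

```python
import hashlib
import os

# Assumption: the type-specific keywords from the proposal; not final.
KEYWORDS = {"SRC_URI", "EBUILD", "AUXFILE"}

# Assumption: proposal algorithm names mapped onto hashlib constructors.
# Adding an algorithm means adding an entry; the line syntax is unchanged.
ALGORITHMS = {
    "MD5": hashlib.md5,
    "SHA1": hashlib.sha1,
}

def format_entry(keyword, path, algorithms=ALGORITHMS):
    """Build one line: KEYWORD filename size ALGO hash [ALGO hash ...]."""
    if keyword not in KEYWORDS:
        raise ValueError(f"unknown keyword {keyword!r}")
    hashers = {name: ctor() for name, ctor in algorithms.items()}
    with open(path, "rb") as f:
        # Read the file once and feed every hasher from the same chunks.
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    fields = [keyword, os.path.basename(path), str(os.path.getsize(path))]
    for name, h in hashers.items():
        fields += [name, h.hexdigest()]
    return " ".join(fields)

def parse_entry(line):
    """Split a line into (keyword, filename, size, {algo: hash})."""
    fields = line.split()
    if len(fields) < 3 or (len(fields) - 3) % 2:
        raise ValueError(f"malformed entry: {line!r}")
    keyword, filename, size = fields[:3]
    # The rest are (name, value) pairs; unknown names are kept so a reader
    # can skip algorithms it does not implement instead of failing.
    digests = dict(zip(fields[3::2], fields[4::2]))
    return keyword, filename, int(size), digests

entry = parse_entry(
    "SRC_URI portage-2.0.51_rc7.tar.bz2 274572 MD5 1234 SHA1 abcd RMD160 9876"
)
print(entry)
```

Supporting a new algorithm only means adding an entry to `ALGORITHMS`; a reader that does not know an algorithm name can still parse the line and ignore that pair. Splitting on whitespace assumes filenames contain no spaces.
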
Maybe the system can also be extended to incorporate GLEP 25 without
adding a ton of new files; I'd need some input from Brian on that issue.

The biggest problem for this proposal is of course compatibility. A rough
transition plan could be:
- keep the digests as they are now
- add the new format to Manifests (in addition to the current MD5 lines)
- support the new format in 2.0.52 (use it optionally for verification)
- use it for verification by default in 2.1 (and drop support for the
  old system)
- exclude the old digests from `emerge --sync` in 2.1

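During the middle stages of the plan, a client would see new-format Manifest lines for some files and only old digest entries for others. A minimal sketch of that fallback, assuming entries were already parsed into `{algorithm: hexdigest}` dicts; `verify` and `SUPPORTED` are hypothetical names, and which algorithms a real release would check is an assumption.

```python
import hashlib

# Assumption: the algorithms this hypothetical client knows how to check.
SUPPORTED = {"SHA1": hashlib.sha1, "MD5": hashlib.md5}

def verify(data, new_entry=None, legacy_md5=None):
    """Check `data` against a new-format entry, else the legacy MD5.

    new_entry:  {algorithm: hexdigest} from a new-format Manifest line,
                or None if the Manifest has no such line yet.
    legacy_md5: hexdigest from the old digest file, or None.
    """
    if new_entry:
        # Check every algorithm we know; skip names we don't (e.g. RMD160).
        known = [name for name in new_entry if name in SUPPORTED]
        if known:
            return all(
                SUPPORTED[name](data).hexdigest() == new_entry[name]
                for name in known
            )
    if legacy_md5 is not None:
        # Fall back to the old-style digest file during the transition.
        return hashlib.md5(data).hexdigest() == legacy_md5
    raise LookupError("no usable checksum for this file")

print(verify(b"", new_entry={"SHA1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
                             "RMD160": "9876"}))  # True
print(verify(b"", legacy_md5="d41d8cd98f00b204e9800998ecf8427e"))  # True
```

Requiring every known algorithm to match, rather than only the strongest one, is a design choice of this sketch; a line with only unknown algorithms falls through to the legacy MD5.
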
And finally, a summarizing list of reasons for the format:
- keeps all checksums of a package in one place
- removes one level of indirection for signing
- digest generation currently recreates the Manifest anyway
- removes files from the tree
- allows for easy addition of new digest algorithms
- any syntax modification to the current digest files brings
  compatibility problems with all currently existing portage versions,
  while Manifest changes do not
- makes file collisions easier to discover (currently you can have the
  same file in two digests with different checksums; not a real problem
  yet, though)
- removes redundancy for common files

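With one line per file carrying all of its checksums, spotting collisions becomes a simple grouping step. A minimal sketch, assuming entries from several Manifests have already been parsed into tuples; `find_collisions` and the package names used as sources are hypothetical.

```python
from collections import defaultdict

def find_collisions(entries):
    """Report filenames whose size or checksums disagree across sources.

    entries: iterable of (source, filename, size, {algo: hexdigest}),
    where `source` names the Manifest the line came from.
    Each record is compared with the first record seen for that filename.
    """
    seen = defaultdict(list)
    for source, filename, size, digests in entries:
        seen[filename].append((source, size, digests))

    collisions = {}
    for filename, records in seen.items():
        _, ref_size, ref_digests = records[0]
        for _source, size, digests in records[1:]:
            # Only algorithms present in both records can be compared.
            shared = ref_digests.keys() & digests.keys()
            if size != ref_size or any(ref_digests[a] != digests[a] for a in shared):
                collisions[filename] = [src for src, _, _ in records]
                break
    return collisions

entries = [
    ("app-misc/foo", "common-1.0.tar.gz", 100, {"MD5": "aaaa"}),
    ("app-misc/bar", "common-1.0.tar.gz", 100, {"MD5": "bbbb"}),
    ("app-misc/baz", "other.tar.gz", 50, {"MD5": "cccc"}),
]
print(find_collisions(entries))
```

Here `common-1.0.tar.gz` is reported with both sources, while `other.tar.gz`, listed only once, is not.
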
Let the discussions begin.

Marius
