Gentoo Archives: gentoo-dev

From: "Tiziano Müller" <dev-zero@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Sun, 08 Feb 2009 11:51:31
Message-Id: 1234093879.24784.2819.camel@localhost
In Reply to: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation by Zac Medico
1 On Sunday, 08.02.2009, at 00:59 -0800, Zac Medico wrote:
2 > -----BEGIN PGP SIGNED MESSAGE-----
3 > Hash: SHA1
4 >
5 > Tiziano Müller wrote:
6 > > On Saturday, 07.02.2009, at 15:23 -0800, Zac Medico wrote:
7 > >> -----BEGIN PGP SIGNED MESSAGE-----
8 > >> Hash: SHA1
9 > >>
10 > >> Tiziano Müller wrote:
11 > >>> On Monday, 02.02.2009, at 12:34 -0800, Zac Medico wrote:
12 > >>>> For the digest format, I suggest that we use the leftmost 10
13 > >>>> hexadecimal digits of the SHA-1 digest. The rationale for limiting
14 > >>>> it to 10 digits (out of 40) is to save space. Due to the avalanche
15 > >>>> effect [2], 10 digits should be sufficient to ensure that problems
16 > >>>> resulting from hash collisions are extremely unlikely.
17 > >>> I'd recommend to prefix the digest with a "{TYPE}" (like for hashed
18 > >>> passwords) to be able to change the digest algorithm as needed
19 > >>> (especially in regards to the current SHA successor competition).
20 > >>> This allows a future package manager which might use SHA-3 for hashing
21 > >>> (once it's released) to still check old digests. Furthermore it would
22 > >>> allow for easier transition and only needs a definition of allowed
23 > >>> hashes instead of a specific one.
24 > >> I like that idea. That way it's not necessary to bump the EAPI in
25 > >> order to change the hash function. So, a typical DIGESTS value might
26 > >> look like this:
27 You still have to bump the EAPI if you want to use a new hash that is
28 not available yet (like SHA-3). The advantage of recording the hash
29 type is that new PMs can still handle an old metadata cache.
30
31 > >>
32 > >> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
33 > >
34 > > Sleeping over it again I don't think that truncating a hash is a good
35 > > idea (truncating it from 40 to 10 digits makes the possibility of
36 > > collisions much much higher).
37 >
38 > The probability of collision is much higher, but it's still
39 > relatively small. Given the "avalanche effect" that is typical of
40 > cryptographic hash functions, it's extremely unlikely that collision
41 > will occur in such a way that it will cause a problem for cache
42 > validation.
43 The "avalanche effect", as I understand it, is a property a hash
44 function needs so that collisions cannot be computed easily (what
45 diffusion is for crypto algorithms): a small change in the input should
46 affect as many digits of the hash as possible. But when somebody patches
47 an eclass the changes are not necessarily small, so the only thing that
48 counts is the probability of a collision.
49
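To put rough numbers on that probability, here is a minimal sketch in Python. It assumes the truncated digest behaves like a uniformly distributed value, and the function names are purely illustrative (they are not part of any package manager).

```python
import math


def same_digest_probability(hex_digits: int) -> float:
    """Chance that one changed file still yields the same truncated digest.

    Assumes digests are uniformly distributed over 16**hex_digits values.
    """
    return 16.0 ** -hex_digits


def any_pair_collision_probability(n_items: int, hex_digits: int) -> float:
    """Approximate chance that any two of n_items distinct inputs collide.

    Uses the usual birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    space = 16.0 ** hex_digits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    for digits in (10, 40):
        print(digits, same_digest_probability(digits),
              any_pair_collision_probability(10_000, digits))
```

The first function models the cache validation case (old versus new version of a single eclass), the second the looser question of whether any two files in a set share a digest. Both only hold as long as the uniformity assumption does.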
50 >
51 > > But if you want to go this way, I'd say you should use something like
52 > > SHA1t (t for truncated) to make sure we can use full hashes once we feel
53 > > it's appropriate.
54 >
55 > We could, but I think SHA1 would also be fine since one can infer
56 > from the length of the string that it's been truncated.
57 No, guessing is a bad thing here because it could be truncated because
58 of faulty metadata. But the main motivation is that if you write SHA1
59 everyone reading it expects it to be a full SHA1 hash, which it isn't.
60
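A sketch of how an explicit type label could be checked instead of guessed from the string length, in Python. The "SHA1t" label, the table of allowed types, and both function names are assumptions made for illustration; the value layout (type followed by space-separated digests) follows the example quoted above.

```python
import hashlib

# Hypothetical table of allowed digest types and their length in hex digits.
# "SHA1t" is the truncated variant; 10 digits follows the original proposal.
DIGEST_LENGTHS = {"SHA1": 40, "SHA1t": 10}

HEX_CHARS = set("0123456789abcdef")


def make_digest(data: bytes, digest_type: str) -> str:
    """Return the SHA-1 hex digest of data, cut to the length of digest_type."""
    if digest_type not in DIGEST_LENGTHS:
        raise ValueError(f"unknown digest type: {digest_type!r}")
    return hashlib.sha1(data).hexdigest()[:DIGEST_LENGTHS[digest_type]]


def parse_digests(value: str):
    """Split a DIGESTS-style value into (type, [digests]) and validate it.

    A digest whose length does not match its declared type is rejected,
    so damaged metadata is not mistaken for a deliberately truncated hash.
    """
    fields = value.split()
    if not fields:
        raise ValueError("empty DIGESTS value")
    digest_type, digests = fields[0], fields[1:]
    if digest_type not in DIGEST_LENGTHS:
        raise ValueError(f"unknown digest type: {digest_type!r}")
    expected = DIGEST_LENGTHS[digest_type]
    for digest in digests:
        if len(digest) != expected or not set(digest) <= HEX_CHARS:
            raise ValueError(f"malformed {digest_type} digest: {digest!r}")
    return digest_type, digests
```

With this layout, "SHA1t 02021be38b a28b191904" parses cleanly, while the same digests labelled "SHA1" are rejected as too short.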
61 But if your target is to reduce the size of the metadata cache, why
62 store the hashes of the eclasses in the ebuild's metadata and not in a
63 separate dir? They have to be the same for every ebuild, don't they?
64 In case you have an average number of eclasses which is bigger than 4,
65 you can even store the full hash with less space used than with
66 truncated hashes for all eclasses.
67
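The break-even can be sketched as simple arithmetic. This is one possible reading, assuming the "4" refers to how many ebuilds on average inherit a given eclass, and ignoring separators and the eclass names an ebuild would still have to record. All names below are illustrative.

```python
from typing import Iterable

FULL_LEN = 40       # hex digits in a full SHA-1 digest
TRUNCATED_LEN = 10  # hex digits in the proposed truncated digest


def per_ebuild_storage(inherits_per_ebuild: Iterable[int]) -> int:
    """Characters used when each ebuild's cache entry carries one truncated
    digest per inherited eclass."""
    return TRUNCATED_LEN * sum(inherits_per_ebuild)


def shared_storage(distinct_eclasses: int) -> int:
    """Characters used when each eclass digest is stored once, in full,
    in a separate location."""
    return FULL_LEN * distinct_eclasses


if __name__ == "__main__":
    # Hypothetical tree: 5 ebuilds that all inherit the same single eclass.
    print(per_ebuild_storage([1, 1, 1, 1, 1]), shared_storage(1))
```

Under these assumptions the two layouts cost the same when an eclass is inherited by exactly 4 ebuilds, and the shared full-length digest is smaller beyond that.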
68 --
69 -------------------------------------------------------
70 Tiziano Müller
71 Gentoo Linux Developer, Council Member
72 Areas of responsibility:
73 Samba, PostgreSQL, CPP, Python, sysadmin
74 E-Mail : dev-zero@g.o
75 GnuPG FP : F327 283A E769 2E36 18D5 4DE2 1B05 6A63 AE9C 1E30

