-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tiziano Müller wrote:
> On Sunday, 08.02.2009, at 00:59 -0800, Zac Medico wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Tiziano Müller wrote:
>>> On Saturday, 07.02.2009, at 15:23 -0800, Zac Medico wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Tiziano Müller wrote:
>>>>> On Monday, 02.02.2009, at 12:34 -0800, Zac Medico wrote:
>>>> I like that idea. That way it's not necessary to bump the EAPI in
>>>> order to change the hash function. So, a typical DIGESTS value might
>>>> look like this:
> You still have to bump the EAPI in case you want to use a new hash not
> already available now (like SHA-3). The advantage of noting the used
> hash is that new PMs can handle old metadata cache.

That's true.

>>>> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
>>> Having slept on it, I don't think that truncating a hash is a good
>>> idea (truncating it from 40 to 10 digits makes the probability of
>>> collisions much, much higher).
>> The probability of collision is much higher, but it's still
>> relatively small. Given the "avalanche effect" that is typical of
>> cryptographic hash functions, it's extremely unlikely that collision
>> will occur in such a way that it will cause a problem for cache
>> validation.
> The "avalanche effect" as I understood it is required for a hash
> function to avoid simple calculations of collisions (what the diffusion
> is for crypto algorithms). So, small changes should affect as many
> numbers in the hash as possible. But you don't have only small changes
> here in case somebody patches an eclass, so, the only thing which counts
> is the probability of a collision.

Well, the avalanche effect helps in the sense that the leftmost 10
digits would serve approximately as well as any other 10 digits out
of all of them. But you're right about the probability of a
collision being what really matters. With 10 hex digits, we've got a
space of 16^10, or about 1.1e12, possible combinations. Given a space
that large, the probability of a collision is pretty small: a modified
eclass would have to land on the same 10 digits as the old one, which
for a well-behaved hash is roughly a 1 in 1.1e12 chance.
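
To put rough numbers on this, here is a minimal sketch of the
arithmetic. It assumes the truncated digests are uniformly distributed
over the 16^10 possible values. The function names and the sample
counts are made up for illustration.

```python
import math

HEX_DIGITS = 10            # truncated digest length discussed in this thread
SPACE = 16 ** HEX_DIGITS   # number of distinct truncated values


def undetected_change_probability(space: int = SPACE) -> float:
    """Chance that one modified file keeps the same truncated digest."""
    return 1.0 / space


def birthday_collision_probability(n: int, space: int = SPACE) -> float:
    """Approximate chance that any two of n distinct files share a digest.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2*space)).
    """
    return -math.expm1(-n * (n - 1) / (2.0 * space))


if __name__ == "__main__":
    print(f"space: {SPACE:.3e}")
    print(f"single change undetected: {undetected_change_probability():.3e}")
    for n in (100, 1_000, 10_000):  # hypothetical numbers of file revisions
        p = birthday_collision_probability(n)
        print(f"any collision among {n} files: {p:.3e}")
```

For cache validation the first figure is the relevant one, since a
stale entry only slips through when the changed file collides with its
own previous digest.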

>>> But if you want to go this way, I'd say you should use something like
>>> SHA1t (t for truncated) to make sure we can use full hashes once we feel
>>> it's appropriate.
>> We could, but I think SHA1 would also be fine since one can infer
>> from the length of the string that it's been truncated.
> No, guessing is a bad thing here because it could be truncated because
> of faulty metadata. But the main motivation is that if you write SHA1
> everyone reading it expects it to be a full SHA1 hash, which it isn't.

Well, if the metadata is faulty then the digests are unlikely to
match and the cache will be discarded anyway as invalid. However, I
think your point is still somewhat valid, so SHA1t is fine with me
if that makes more people happy. Does anyone else have a preference
here?
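
As a rough sketch of what an explicit label buys us, the following
treats SHA1 and SHA1t as distinct digest types and checks the length
instead of guessing from it. The DIGEST_TYPES table and the helper names
are hypothetical, and the layout (label followed by space-separated
digests) is only assumed from the example value quoted earlier.

```python
import hashlib
from typing import List, Tuple

TRUNCATED_LENGTH = 10  # hex digits kept, as in the example in this thread

# Hypothetical registry: label -> (hashlib algorithm, hex digits kept).
# None means the full digest is stored.
DIGEST_TYPES = {
    "SHA1": ("sha1", None),
    "SHA1t": ("sha1", TRUNCATED_LENGTH),
}


def compute_digest(label: str, data: bytes) -> str:
    """Hash data according to the labelled digest type."""
    algorithm, length = DIGEST_TYPES[label]
    hexdigest = hashlib.new(algorithm, data).hexdigest()
    return hexdigest if length is None else hexdigest[:length]


def format_digests(label: str, blobs: List[bytes]) -> str:
    """Build a value of the form '<label> <digest> <digest> ...'."""
    return " ".join([label] + [compute_digest(label, b) for b in blobs])


def parse_digests(value: str) -> Tuple[str, List[str]]:
    """Split a DIGESTS value, rejecting unknown labels and wrong lengths."""
    label, *digests = value.split()
    if label not in DIGEST_TYPES:
        raise ValueError(f"unknown digest type: {label}")
    algorithm, length = DIGEST_TYPES[label]
    expected = length or hashlib.new(algorithm).digest_size * 2
    for digest in digests:
        if len(digest) != expected:
            raise ValueError(
                f"{label} digest has {len(digest)} hex digits, "
                f"expected {expected}"
            )
    return label, digests
```

With this scheme a 10-digit value labelled SHA1 is rejected as
malformed, while the same value labelled SHA1t is accepted, so a
package manager that does not know a label can simply treat the entry
as invalid.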

> But if your target is to reduce the size of the metadata cache, why
> store the hashes of the eclasses in the ebuild's metadata and not in a
> separate dir? They have to be the same for every ebuild, don't they?
> In case you have an average number of eclasses which is bigger than 4,
> you can even store the full hash with less space used than with
> truncated hashes for all eclasses.

The problem with having eclass integrity data shared in a separate
file is that it creates a requirement for all cache entries which
reference the same eclasses to be consistent with one another. This
means that a single cache entry can no longer be updated atomically.
For example, before updating the shared eclass integrity data, you'd
want to make sure that you first discard all of the cache entries
which reference it. Although it can be done this way, I think it's
much more convenient to have all of the integrity data encapsulated
within each individual cache entry.
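
The per-entry approach can be sketched roughly as follows. This is
only an illustration: the JSON layout and the names short_digest,
write_entry and entry_is_valid are assumptions made for this example,
not the actual cache format or any existing API.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List


def short_digest(path: str, length: int = 10) -> str:
    """Truncated SHA-1 of a file, mirroring the 10-digit example."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()[:length]


def write_entry(entry_path: str, metadata: Dict[str, str],
                sources: List[str]) -> None:
    """Store metadata together with the digests of all of its sources."""
    entry = {
        "metadata": metadata,
        "digests": {src: short_digest(src) for src in sources},
    }
    directory = os.path.dirname(os.path.abspath(entry_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(entry, tmp)
    # Rename over the old entry; no other cache entry has to be touched.
    os.replace(tmp_path, entry_path)


def entry_is_valid(entry_path: str) -> bool:
    """Valid only if every recorded source still has the same digest."""
    try:
        with open(entry_path) as f:
            entry = json.load(f)
        return all(short_digest(src) == digest
                   for src, digest in entry["digests"].items())
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        return False
```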
- --
Thanks,
Zac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAkmPQjkACgkQ/ejvha5XGaNFUACfQvVYgNiZNK8PVReTZKN47wQU
9wkAniltb1ivZYGgmhn/eli2fpprkOlI
=2mbq
-----END PGP SIGNATURE-----