On Sunday, 8 Feb 2009, 12:36 -0800, Zac Medico wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Tiziano Müller wrote:
> > On Sunday, 8 Feb 2009, 00:59 -0800, Zac Medico wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> Tiziano Müller wrote:
> >>> On Saturday, 7 Feb 2009, 15:23 -0800, Zac Medico wrote:
> >>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>>> Hash: SHA1
> >>>>
> >>>> Tiziano Müller wrote:
> >>>>> On Monday, 2 Feb 2009, 12:34 -0800, Zac Medico wrote:
> >>>> I like that idea. That way it's not necessary to bump the EAPI in
> >>>> order to change the hash function. So, a typical DIGESTS value might
> >>>> look like this:
> > You still have to bump the EAPI in case you want to use a new hash not
> > already available now (like SHA-3). The advantage of noting the used
> > hash is that new PMs can handle old metadata cache.
>
> That's true.
>
> >>>> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
> >>> Sleeping over it again I don't think that truncating a hash is a good
> >>> idea (truncating it from 40 to 10 digits makes the possibility of
> >>> collisions much much higher).
> >> The probability of collision is much higher, but it's still
> >> relatively small. Given the "avalanche effect" that is typical of
> >> cryptographic hash functions, it's extremely unlikely that collision
> >> will occur in such a way that it will cause a problem for cache
> >> validation.
> > The "avalanche effect" as I understood it is required for a hash
> > function to avoid simple calculations of collisions (what the diffusion
> > is for crypto algorithms). So, small changes should affect as many
> > numbers in the hash as possible. But you don't have only small changes
> > here in case somebody patches an eclass, so, the only thing which counts
> > is the probability of a collision.
>
> Well, the avalanche effect helps in the sense that the leftmost 10
> digits would serve approximately as well as any other 10 digits out
> of all of them. But you're right about the probability of a
> collision being what really matters. With 10 hex digits, we've got a
> space of 16^10 = 1.1e12 possible combinations. Given a space that
> large, the probability of a collision is pretty small.
>
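To put rough numbers on the 16^10 figure quoted above, here is a minimal Python sketch. It is only an illustration and assumes the truncated digits are uniformly distributed; the function names are made up for this mail. It shows two different quantities: the chance that one changed eclass happens to keep the same truncated digest (which is what cache validation cares about), and the usual birthday estimate for any two out of n distinct files sharing a digest.

```python
import math


def digest_space(hex_digits):
    """Number of distinct values representable with this many hex digits."""
    return 16 ** hex_digits


def false_match_probability(hex_digits):
    """Chance that one changed file keeps the same truncated digest,
    assuming uniformly distributed digests (an idealisation)."""
    return 1.0 / digest_space(hex_digits)


def birthday_collision_probability(n_items, hex_digits):
    """Birthday approximation: chance that at least two of n_items
    distinct files share a digest, 1 - exp(-n(n-1)/(2N))."""
    space = digest_space(hex_digits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    for digits in (10, 40):
        print(
            digits,
            digest_space(digits),
            false_match_probability(digits),
            birthday_collision_probability(1000, digits),
        )
```

The 1000 in the loop is an arbitrary sample size, not a count of real eclasses.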
> >>> But if you want to go this way, I'd say you should use something like
> >>> SHA1t (t for truncated) to make sure we can use full hashes once we feel
> >>> it's appropriate.
> >> We could, but I think SHA1 would also be fine since one can infer
> >> from the length of the string that it's been truncated.
> > No, guessing is a bad thing here because it could be truncated because
> > of faulty metadata. But the main motivation is that if you write SHA1
> > everyone reading it expects it to be a full SHA1 hash, which it isn't.
>
> Well, if the metadata is faulty then the digests are unlikely to
> match and the cache will be discarded anyway as invalid. However, I
> think your point is still somewhat valid, so SHA1t is fine with me
> if that makes more people happy. Does anyone else have a preference
> here?
>
> > But if your target is to reduce the size of the metadata cache, why
> > store the hashes of the eclasses in the ebuild's metadata and not in a
> > separate dir? They have to be the same for every ebuild, don't they?
> > In case you have an average number of eclasses which is bigger than 4,
> > you can even store the full hash with less space used than with
> > truncated hashes for all eclasses.
>
> The problem with having eclass integrity data shared in a separate
> file is that it creates a requirement for all cache entries which
> reference the same eclasses to be consistent with one another. This
> means that a single cache entry can no longer be updated atomically.
> For example, before updating the shared eclass integrity data, you'd
> want to make sure that you first discard all of the cache entries
> which reference it. Although it can be done this way, I think it's
> much more convenient to have all of the integrity data encapsulated
> within each individual cache entry.
Ok, let me see if I get this: Since parts of the content of a
metadata entry (like the DEPEND/RDEPEND vars) depend on the contents of
the eclasses used at the time the cache entry was generated, you want to
store the eclass's hash in the ebuild entry to make sure the entry gets
invalidated once the eclass changes. Is that correct?
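
To make sure we mean the same thing, here is a rough Python sketch of how I picture such a self-contained entry. This is not how any package manager actually implements it; the helper names, the "SHA1t" label with 10 leftmost hex digits, and the ordering of the sources (ebuild first, then its eclasses) are all assumptions for illustration, loosely following the DIGESTS example quoted above.

```python
import hashlib
from typing import Sequence

TRUNCATED_DIGITS = 10  # assumption: SHA1t keeps the leftmost 10 hex digits


def sha1t(data: bytes, digits: int = TRUNCATED_DIGITS) -> str:
    """Truncated SHA-1 hex digest of the given file contents."""
    return hashlib.sha1(data).hexdigest()[:digits]


def make_digests(sources: Sequence[bytes]) -> str:
    """Build a hypothetical DIGESTS value: the hash name followed by one
    truncated digest per source file, in a fixed order."""
    return "SHA1t " + " ".join(sha1t(s) for s in sources)


def entry_is_valid(digests_value: str, sources: Sequence[bytes]) -> bool:
    """A cache entry is valid only if every stored digest still matches
    the current contents of the ebuild and its eclasses."""
    fields = digests_value.split()
    if not fields or fields[0] != "SHA1t":
        # Unknown or missing hash name: treat the entry as stale.
        return False
    return fields[1:] == [sha1t(s) for s in sources]


if __name__ == "__main__":
    ebuild = b"inherit example\nDEPEND=\"dev-lang/python\"\n"
    eclass = b"# example.eclass\nRDEPEND=\"app-misc/foo\"\n"
    stored = make_digests([ebuild, eclass])
    print(entry_is_valid(stored, [ebuild, eclass]))
    print(entry_is_valid(stored, [ebuild, eclass + b"# patched\n"]))
```

Since everything needed for the check lives in the one entry, it can be validated or regenerated without touching any other entry.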

--
-------------------------------------------------------
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
Samba, PostgreSQL, CPP, Python, sysadmin
E-Mail : dev-zero@g.o
GnuPG FP : F327 283A E769 2E36 18D5 4DE2 1B05 6A63 AE9C 1E30