Hey Robin,

Sorry for the delay in getting back to you. As mentioned on IRC, both of
your messages bounced earlier, and I was at a conference all last week.
Catching up with this thread now...

On Wed, Apr 06, 2022 at 05:23:25PM +0000, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> > 2) Comparability: other distros use SHA2-512, as well as various
> > upstreams, which means we can compare our hashes to theirs easily.
> Can we expand on this specific thread for a moment?
>
> I was the author of GLEP59 about changing the Manifest hashes, and I
> noted at the time, with references, that the effective strength of a set
> of hashes is only that of the strongest hash.
>
> One of my regrets from GLEP59 is that it's made it harder for use cases
> outside of the normal user distfile workflow.
>
> The use case that impacted me the most was being able to compare our
> distfiles were over time vs external sources, esp. if the file goes
> missing or was fetch-restricted and we can't produce a new hash of it.
> Maybe upstream only ever published SHA1/SHA256, and we only ever
> calculated SHA512/BLAKE2b on the file. Since we never had hashes from
> both sides at the same time, we cannot prove it was the same file.
>
> We need to be able to ship one or more hashes to users, for the specific
> use case of validating the distfiles they download.
>
> As a developer, I'd like to be able to track the other hashes for a
> file, without forcing ourselves to retain the file. This might be to
> compare with upstream published hashes, or to compare with other
> distros.
>
> In fact it would be really nice to have a semi-automated pipeline to
> plug in signed upstream hashes to our Manifests, and make it possibly to
> prove our new SHA512/BLAKE2B hash was taken over the correct input in
> the first place, and there wasn't any subtle supply-chain attack early
> in the packaging process.
>
> Where would those hashes go? They don't need to be in the Manifest, or
> at the very least they don't need to be distributed via rsync to users
> (it only costs a small amount of bytes to do so).
>
> Where else could they go?
> - Commit messages could work.
> - Git notes to a lesser degree.
> - alternate repos?

Interesting idea. This seems orthogonal to my proposal ("just use one
hash in the manifest and call it a day; make it the same as what gpg
uses for signing to minimize moving pieces"), and so I'm hesitant to
indulge too much in this thread, for fear of it being derailed with this
different thing you want.

With that said, I'm not quite sure I understood everything you're asking
for. You said that you want "to have a semi-automated pipeline to plug
in signed upstream hashes to our Manifests, and make it possibly to
prove our new SHA512/BLAKE2B hash was taken over the correct input", but
at the same time you also said that you want "to be able to track the
other hashes for a file, without forcing ourselves to retain the file."
What I'm wondering is: how do you propose that we calculate a SHA-512
hash of a file and "prove it correct" using, e.g., a signed SHA-256
hash, if we don't download the whole file?
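
To make the question concrete: the only way I can see to tie a new
SHA-512 to a signed SHA-256 is to read the full file once and feed the
same bytes to both hashers. Below is a minimal sketch using Python's
hashlib. The helper name `bridge_digest` and its arguments are made up
for illustration (nothing like this exists in Portage as far as I know),
and it assumes the upstream SHA-256 value has already had its signature
checked by some other means:

```python
import hashlib


def bridge_digest(path, signed_sha256_hex, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of `path`, but only if the file's
    SHA-256 matches the upstream value.

    Hypothetical helper: the name and signature are illustrative only.
    """
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    with open(path, "rb") as f:
        # Every chunk goes to both hashers, so the two digests are
        # computed over the same bytes from a single read of the file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    if sha256.hexdigest() != signed_sha256_hex.strip().lower():
        raise ValueError("SHA-256 mismatch; refusing to record a SHA-512")
    return sha512.hexdigest()
```

Note that this still needs the whole file at least once, which is the
part I don't see a way around.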

It sounds like the thing that would be interesting to you would be for
infra to manage some sort of master hash database collecting all the
hashes from all over the internet of every file that hits distfiles,
verifying and then generating a bunch more hash variants of all kinds,
and then cross-verifying those with the hashes extracted from every
other distro, making for a wild hash verification aggregator machine. I
think I can see the utility of it. It would also unburden manifest
files, as those could then just have a SHA-512 hash and nothing else,
making things a bit lighter.


> > A reason why some people might prefer BLAKE2b over SHA2-512 is a
> > performance improvement. However, seeing as right now we're opening
> > the file, reading it, computing BLAKE2b, closing the file, opening the
> > file again, reading it again, computing SHA2-512, closing the file, I
> > don't think performance is actually something people care about. Seen
> > differently, removing either one of them will already give us a
> > performance "boost" or sorts.
> Or just only verifying the "strongest" hash gives you that boost.
>
> I do want to check into the code that you pointed out, because I'm
> really sure much older versions of Portage did the CORRECT thing of only
> reading the file in a single pass.

Let me know if your findings are different from mine...
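
For reference, this is the shape I'd expect a single-pass version to
take: open the file once and hand each chunk to every hasher. It's a
rough sketch with Python's hashlib, not Portage's actual code, and
`hash_file_once` is a name I made up for the example:

```python
import hashlib


def hash_file_once(path, algorithms=("blake2b", "sha512"), chunk_size=1 << 20):
    """Compute several digests of `path` while reading it only once.

    Illustrative sketch only; this is not taken from Portage.
    """
    # One hasher object per requested algorithm name.
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the same chunk to every hasher before reading more.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The result is a dict keyed by algorithm name, so dropping one of the
two hashes would just mean passing a shorter `algorithms` tuple.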

Jason