Gentoo Archives: gentoo-dev

From: "Jason A. Donenfeld" <zx2c4@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] proposal: use only one hash function in manifest files
Date: Wed, 20 Apr 2022 13:55:30
Message-Id: YmAQvLnne92XX5YF@zx2c4.com
In Reply to: Re: [gentoo-dev] proposal: use only one hash function in manifest files by "Robin H. Johnson"
Hey Robin,

Sorry for the delay in getting back to you. As mentioned on IRC, both of
your messages bounced earlier, and I was at a conference all last week.
Catching up with this thread now...

On Wed, Apr 06, 2022 at 05:23:25PM +0000, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> > 2) Comparability: other distros use SHA2-512, as well as various
> > upstreams, which means we can compare our hashes to theirs easily.
> Can we expand on this specific thread for a moment?
>
> I was the author of GLEP59 about changing the Manifest hashes, and I
> noted at the time, with references, that the effective strength of a set
> of hashes is only that of the strongest hash.
>
> One of my regrets from GLEP59 is that it's made it harder for use cases
> outside of the normal user distfile workflow.
>
> The use case that impacted me the most was being able to compare what
> our distfiles were over time vs external sources, esp. if the file goes
> missing or was fetch-restricted and we can't produce a new hash of it.
> Maybe upstream only ever published SHA1/SHA256, and we only ever
> calculated SHA512/BLAKE2b on the file. Since we never had hashes from
> both sides at the same time, we cannot prove it was the same file.
>
> We need to be able to ship one or more hashes to users, for the specific
> use case of validating the distfiles they download.
>
> As a developer, I'd like to be able to track the other hashes for a
> file, without forcing ourselves to retain the file. This might be to
> compare with upstream published hashes, or to compare with other
> distros.
>
> In fact it would be really nice to have a semi-automated pipeline to
> plug in signed upstream hashes to our Manifests, and make it possible to
> prove our new SHA512/BLAKE2B hash was taken over the correct input in
> the first place, and there wasn't any subtle supply-chain attack early
> in the packaging process.
>
> Where would those hashes go? They don't need to be in the Manifest, or
> at the very least they don't need to be distributed via rsync to users
> (it only costs a small number of bytes to do so).
>
> Where else could they go?
> - Commit messages could work.
> - Git notes to a lesser degree.
> - Alternate repos?

Interesting idea. This seems orthogonal to my proposal ("just use one
hash in the manifest and call it a day; make it the same as what gpg
uses for signing to minimize moving pieces"), and so I'm hesitant to
indulge too much in this thread, for fear of it being derailed with this
different thing you want.

With that said, I'm not quite sure I understood everything you're asking
for. You said that you want "to have a semi-automated pipeline to plug
in signed upstream hashes to our Manifests, and make it possible to
prove our new SHA512/BLAKE2B hash was taken over the correct input", but
at the same time you also said that you want "to be able to track the
other hashes for a file, without forcing ourselves to retain the file."
What I'm wondering is: how do you propose that we calculate a SHA-512
hash of a file and "prove it correct" using, e.g., a signed SHA-256
hash, if we don't download the whole file?
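
To make the question concrete, here is a minimal Python sketch of tying a new
SHA-512 to a signed SHA-256. It is not Portage code: `derive_sha512` and its
parameters are hypothetical names, and `trusted_sha256` is assumed to come
from an upstream signature that was already verified elsewhere. Both digests
are computed over the full contents, so the whole file has to be on hand.

```python
import hashlib


def derive_sha512(path, trusted_sha256, chunk_size=1 << 20):
    """Return the SHA-512 of `path`, but only if its SHA-256 matches.

    `trusted_sha256` is a hex digest assumed to have been taken from an
    already-verified upstream signature (hypothetical input).
    """
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    with open(path, "rb") as f:
        # Feed the same bytes to both digests while reading the file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    if sha256.hexdigest() != trusted_sha256.lower():
        raise ValueError("SHA-256 mismatch; refusing to emit a SHA-512")
    return sha512.hexdigest()
```

The SHA-512 is only returned once the SHA-256 computed over the very same
bytes has matched, which is the link between the two hashes.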

It sounds like the thing that would be interesting to you would be for
infra to manage some sort of master hash database collecting all the
hashes from all over the internet of every file that hits distfiles,
verifying and then generating a bunch more hash variants of all kinds,
and then cross-verifying those with the hashes extracted from every
other distro, making for a wild hash verification aggregator machine. I
think I can see the utility of it. It would also unburden manifest
files, as those could then just have a SHA-512 hash and nothing else,
making things a bit lighter.
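
As a rough illustration of what cross-verifying would involve, the following
Python sketch compares two sets of hashes recorded for the same file name. The
`compare_records` name and the `{algorithm: hexdigest}` layout are assumptions
made up for this example, not an existing format. Two records can only be
compared when they share at least one algorithm.

```python
def compare_records(ours, theirs):
    """Compare two {algorithm: hexdigest} mappings for the same file name.

    Returns "match" or "mismatch" when at least one algorithm is shared,
    and "unknown" when there is no common algorithm to compare.
    """
    shared = set(ours) & set(theirs)
    if not shared:
        return "unknown"
    if all(ours[name].lower() == theirs[name].lower() for name in shared):
        return "match"
    return "mismatch"
```

With only SHA512/BLAKE2B on one side and only SHA256 on the other, the result
is "unknown". Recording a SHA256 alongside the other two while the file is
still available turns that into a definite "match" or "mismatch" later on.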

> > A reason why some people might prefer BLAKE2b over SHA2-512 is a
> > performance improvement. However, seeing as right now we're opening
> > the file, reading it, computing BLAKE2b, closing the file, opening the
> > file again, reading it again, computing SHA2-512, closing the file, I
> > don't think performance is actually something people care about. Seen
> > differently, removing either one of them will already give us a
> > performance "boost" of sorts.
> Or just only verifying the "strongest" hash gives you that boost.
>
> I do want to check into the code that you pointed out, because I'm
> really sure much older versions of Portage did the CORRECT thing of only
> reading the file in a single pass.

Let me know if your findings are different from mine...
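
For reference, single-pass hashing of the kind discussed above could look
roughly like the following Python sketch. This is not Portage's actual code;
`hash_file_once` and its parameters are names invented for illustration, and
the algorithm names are the ones Python's `hashlib` accepts.

```python
import hashlib


def hash_file_once(path, algorithms=("blake2b", "sha512"), chunk_size=1 << 20):
    """Compute several digests of `path` while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Every hasher sees each chunk, so no second read is needed.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Dropping one entry from `algorithms` saves that digest's CPU time; the file
is opened and read once either way.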

Jason