Gentoo Archives: gentoo-dev

From: "Jason A. Donenfeld" <zx2c4@g.o>
To: Matt Turner <mattst88@g.o>
Cc: gentoo development <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] proposal: use only one hash function in manifest files
Date: Tue, 05 Apr 2022 21:49:30
Message-Id: CAHmME9rD-WsQwL0W705+_32UWC=fgRezd-6skWYB9yRuHfRNDw@mail.gmail.com
In Reply to: Re: [gentoo-dev] proposal: use only one hash function in manifest files by Matt Turner
1 Hi Matt,
2
3 On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <mattst88@g.o> wrote:
4 >
5 > On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <zx2c4@g.o> wrote:
6 > > By the way, we're not currently _checking_ two hash functions during
7 > > src_prepare(), are we?
8 >
9 > I don't know, but the hash-checking is definitely checked before src_prepare().
10
11 Er, during the builtin fetch phase. Anyway, you know what I meant. :)
12
13 Anyway, looking at the portage source code, to answer my own question,
14 it looks like the file is actually being read twice and both hashes
15 computed. I would have at least expected an optimization like:
16
17 hash1_init(&hash1);
18 hash2_init(&hash2);
19 for chunk in file:
20     hash1_update(&hash1, chunk);
21     hash2_update(&hash2, chunk);
22 hash1_final(&hash1, out1);
23 hash2_final(&hash2, out2);
24
25 But actually what's happening is the even less efficient:
26
27 hash1_init(&hash1);
28 for chunk in file:
29     hash1_update(&hash1, chunk);
30 hash1_final(&hash1, out1);
31 hash2_init(&hash2);
32 for chunk in file:
33     hash2_update(&hash2, chunk);
34 hash2_final(&hash2, out2);
35
36 So the file winds up being opened and read twice. For huge tarballs like
37 chromium or libreoffice...
38
39 But either way you do it - the missed optimization above or the
40 unoptimized reality below - there's still twice as much work being
41 done. This is all unless I've misread the source code, which is
42 possible, so if somebody knows this code well and I'm wrong here,
43 please do speak up.
44
45 Jason
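
For illustration, the single-pass variant described above (one read of the
file, every hash state updated per chunk) might look roughly like this in
Python using the standard hashlib module. This is only a sketch and not
Portage's actual code: the function name hash_file_single_pass, the default
pair of algorithms ("blake2b" and "sha512"), and the 1 MiB chunk size are all
assumptions made for the example.

```python
import hashlib


def hash_file_single_pass(path, algorithms=("blake2b", "sha512"), chunk_size=1 << 20):
    """Compute several digests of one file while reading it only once.

    Hypothetical helper for illustration; the name, the default algorithms,
    and the chunk size are assumptions, not taken from Portage.
    """
    # One independent hash state per requested algorithm.
    hashers = {name: hashlib.new(name) for name in algorithms}

    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Feed the same chunk to every hash state before reading the next.
            for hasher in hashers.values():
                hasher.update(chunk)

    # Map each algorithm name to its hex digest.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

This saves the second open and read of the file, but each chunk is still
hashed once per algorithm, so the hashing work itself is not reduced.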
