1 |
On Mon, Apr 11, 2022 at 7:14 PM Joshua Kinard <kumba@g.o> wrote: |
2 |
> |
3 |
> On 4/5/2022 17:49, Jason A. Donenfeld wrote: |
4 |
> > Hi Matt, |
5 |
> > |
6 |
> > On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <mattst88@g.o> wrote: |
7 |
> >> |
8 |
> >> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <zx2c4@g.o> wrote: |
9 |
> >>> By the way, we're not currently _checking_ two hash functions during |
10 |
> >>> src_prepare(), are we? |
11 |
> >> |
12 |
> >> I don't know, but the hash-checking is definitely checked before src_prepare(). |
13 |
> > |
14 |
> > Er, during the builtin fetch phase. Anyway, you know what I meant. :) |
15 |
> > |
16 |
> > Anyway, looking at the portage source code, to answer my own question, |
17 |
> > it looks like the file is actually being read twice and both hashes |
18 |
> > computed. I would have at least expected an optimization like: |
19 |
> > |
20 |
> > hash1_init(&hash1); |
21 |
> > hash2_init(&hash2); |
22 |
> > for chunks in file: |
23 |
> > hash1_update(&hash1, chunk); |
24 |
> > hash2_update(&hash2, chunk); |
25 |
> > hash1_final(&hash1, out1); |
26 |
> > hash2_final(&hash2, out2); |
27 |
> > |
28 |
> > But actually what's happening is the even less efficient: |
29 |
> > |
30 |
> > hash1_init(&hash1); |
31 |
> > for chunks in file: |
32 |
> > hash1_update(&hash1, chunk); |
33 |
> > hash1_final(&hash1, out1); |
34 |
> > hash2_init(&hash2); |
35 |
> > for chunks in file: |
36 |
> > hash2_update(&hash2, chunk); |
37 |
> > hash1_final(&hash2, out2); |
38 |
> > |
39 |
> > So the file winds up being open and read twice. For huge tarballs like |
40 |
> > chromium or libreoffice... |
41 |
> > |
42 |
> > But either way you do it - the missed optimization above or the |
43 |
> > unoptimized reality below - there's still twice as much work being |
44 |
> > done. This is all unless I've misread the source code, which is |
45 |
> > possible, so if somebody knows this code well and I'm wrong here, |
46 |
> > please do speak up. |
47 |
> |
48 |
> Not to go off-topic, but where in Portage's source is this logic at? It |
49 |
> seems like an easy fix for a slightly more efficient Portage. |
50 |
|
51 |
I believe it's the portage.checksum.verify_all() function. |
52 |
|
53 |
https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/checksum.py?h=portage-3.0.30#n471 |