Gentoo Archives: gentoo-project

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-project@l.g.o
Subject: Re: [gentoo-project] Call for agenda items - Council meeting on 2022-02-13
Date: Sun, 13 Feb 2022 18:43:07
Message-Id: robbat2-20220213T181930-179354189Z@orbis-terrarum.net
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting on 2022-02-13 by "Michał Górny"
1 On Sun, Feb 13, 2022 at 11:03:36AM +0100, Michał Górny wrote:
2 > > I'll try and paraphrase what I feel the problem statement you're making
3 > > is:
4 > > - TL;DR: Golang-package ebuilds account for a disproportionate amount of
5 > > Manifest space in the tree.
6 > > - Golang-package ebuilds account for 55% of all DIST entries in the tree.
7 > Let's not forget that each ebuild is almost as big as the relevant
8 > Manifest entries...
9 >
10 > > - Even if de-duped, it would still be 24% of all DIST entries in the
11 > > tree.
12 > ...so even if you dedupe Manifests, ebuilds will still account for major
13 > space waste.
14 And this is why I asked that we start by forming a clear problem
15 statement first.
16
17
18 > > "just repackage it"
19 > > - Is that in line with the licenses of ALL of distfiles?
20 > I don't know how that "goproxy" thing works exactly but I suppose if
21 > they can provide downloads for these packages, then we should be able
22 > as well.
23 From what I've seen, goproxy contains 99.9% of packages. Only a tiny
24 number aren't on there.
25
26 > > - Is it easy enough for any overlay author to use?
27 > > - How are you going to trust every overlay's repacking of the distfiles:
28 > > that they didn't include malicious code in the distfile, that wouldn't
29 > > be caught in a review of the ebuild?
30 > How are you going to trust anything coming from overlays? It's not like
31 > people have to resort to super-complex methods of adding something
32 > malicious.
33 Yes, there are already many ways to get stuff in, but repackaging makes
34 the review much harder.
35
36 > > So save your 18MiB of disk space of Manifest lines, or 691MiB.
37 > Our duty is primarily towards Gentoo users, not mirror admins. Mirror
38 > admins have signed up for hosting potentially large distfiles.
39 >
40 > Then, there's a major difference between temporarily spending 1G
41 > on a relatively small number of mirrors, and wasting 18 MiB on *all*
42 > user systems. Not to mention git history that's going to be growing
43 > forever, permanently wasting space on old versions of Go software.
44 So you want any Gentoo user of these packages to have to download the
45 368MB each time, even on slow internet?
46
47 > > I feel the "re-packaging" is worse than the situation that they're in.
48 >
49 > No, it's not. The situation we're in is causing *permanent* damage,
50 > and the longer we stall, the worse it gets.
51 Only permanent to those who use a single git checkout with full history.
52 There is no permanent impact to users who still consume the tree via
53 rsync, snapshots, or other point-in-time views, including shallow Git
54 history.
55
56 I have a separate plan to mitigate long-term damage to Git history,
57 because it's something that WAS discussed back when the move to Git was
58 started.
59
60 > > So how do we find some reasonable solution between this?
61 > > - Not waste DIST lines space on every copy of the repo
62 > > - Not waste space on Gentoo mirrors
63 > > - Not waste download bandwidth when very little data has changed
64 > - Not waste space with humongous ebuilds
65 > - Not consume insane number of inodes on tiny files
66 So let's go BACK to the proposal of multiple fetch phases.
67
68 If we have that as an OPTION, does it solve these 5 concerns, and/or can
69 it be made close enough to do that?
70
71 1. A small number of distfiles, as distributed by the primary upstream,
72 no dependency distfiles included.
73 2. EGO_SUM removed from ebuilds
74 3. Manifest contains a small number of DIST entries per PV.
75 4. Downloading is still optimized because only new upstream dependencies
76 are fetched.
77 5. Mirrors still optimized because they don't inflate the number of
78 files.
79 6. Zero or small repackaging burden on developers or overlay authors.
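
Point 4 in the list above can be sketched as follows. This is only an
illustration, assuming the second fetch phase receives (filename, SRC_URI)
pairs and a local distfile directory; `plan_second_fetch` and its arguments
are hypothetical names, not an existing Portage interface.

```python
from pathlib import Path


def plan_second_fetch(entries, distdir):
    """Return the (filename, src_uri) pairs that still need downloading.

    entries: iterable of (filename, src_uri) pairs (hypothetical layout).
    distdir: directory holding distfiles that were fetched earlier.
    """
    distdir = Path(distdir)
    # Only dependencies that are not already present locally are fetched,
    # so a version bump that changes few dependencies downloads little.
    return [(name, uri) for name, uri in entries
            if not (distdir / name).is_file()]
```

A real implementation would also verify the checksums of the files that are
already present before skipping them.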
80
81 How can we get this AND ensure the security you were worried about
82 previously?
83
84
85 What if there was something to convert go.sum into Manifest-like files
86 during packaging, and then put THOSE files as DIST.
87
88 Manifest-like:
89 - include gosum checksum
90 - include BLAKE checksum
91 - include SRC_URI
92
93 You only have to download these Manifest-like files if you want to build the
94 package.
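
A minimal sketch of the go.sum conversion, under stated assumptions: the
proxy is taken to be https://proxy.golang.org, the distfile naming and the
output line layout are invented for illustration, and the hashes in the
usage example are placeholders. Only the go.sum line shape and the proxy's
case-encoding of uppercase letters follow the Go module conventions.

```python
import hashlib
from dataclasses import dataclass

# Assumption: the public Go module proxy; the thread only says "goproxy".
GOPROXY = "https://proxy.golang.org"


def escape_path(text: str) -> str:
    """Go proxy case-encoding: an uppercase letter becomes '!' + lowercase."""
    return "".join("!" + c.lower() if c.isupper() else c for c in text)


@dataclass
class Entry:
    filename: str       # hypothetical distfile name
    src_uri: str
    gosum: str          # h1: hash copied verbatim from go.sum
    blake2b: str = ""   # filled in by the packager after fetching
    size: int = 0


def parse_go_sum(text: str) -> list[Entry]:
    """Turn go.sum lines into entries carrying SRC_URI and the gosum hash."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        module, version, h1 = parts
        ext = "zip"
        if version.endswith("/go.mod"):
            version, ext = version[: -len("/go.mod")], "mod"
        mod, ver = escape_path(module), escape_path(version)
        uri = f"{GOPROXY}/{mod}/@v/{ver}.{ext}"
        filename = f"{mod}/@v/{ver}.{ext}".replace("/", "%2F")
        entries.append(Entry(filename, uri, h1))
    return entries


def add_blake2b(entry: Entry, data: bytes) -> None:
    """Record size and 512-bit BLAKE2B of the fetched distfile."""
    entry.blake2b = hashlib.blake2b(data).hexdigest()
    entry.size = len(data)


def format_entry(e: Entry) -> str:
    """Hypothetical Manifest-like line; the field order is an assumption."""
    return (f"DIST {e.filename} {e.size} BLAKE2B {e.blake2b} "
            f"GOSUM {e.gosum} SRC_URI {e.src_uri}")
```

For example, with placeholder hashes:

```python
sample = (
    "github.com/BurntSushi/toml v0.3.1 h1:PLACEHOLDER=\n"
    "github.com/BurntSushi/toml v0.3.1/go.mod h1:PLACEHOLDER=\n"
)
for entry in parse_go_sum(sample):
    print(entry.src_uri)
```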
95
96 The Manifest-like files grow linearly with the number of distfiles used,
97 but the Gentoo tree only has to reference the Manifest-like file.
98
99 The packager is responsible for creating the file, specifically the
100 BLAKE checksums, but their work can be cross-checked by anybody else,
101 using the gosum checksum (so it's harder for the packager to introduce
102 malicious behavior; not impossible, just harder, much like today).
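
The cross-check could look roughly like this. It is a sketch, assuming the
gosum checksum is Go's "h1:" directory hash (SHA-256 over a sorted list of
per-file SHA-256 lines, base64-encoded) and that the BLAKE checksum is
512-bit BLAKE2B as in current Manifests; `cross_check` and its parameters
are hypothetical names.

```python
import base64
import hashlib
import io
import zipfile


def h1_of_files(files: dict[str, bytes]) -> str:
    """h1 hash: SHA-256 of sorted '<sha256hex>  <name>' lines, in base64."""
    summary = hashlib.sha256()
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        summary.update(f"{digest}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


def h1_of_zip(data: bytes) -> str:
    """h1 of a module zip, computed over every member of the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        files = {info.filename: zf.read(info) for info in zf.infolist()}
    return h1_of_files(files)


def h1_of_gomod(data: bytes) -> str:
    """h1 of a standalone go.mod file (hashed under the name 'go.mod')."""
    return h1_of_files({"go.mod": data})


def cross_check(data: bytes, gosum: str, blake2b_hex: str,
                is_gomod: bool = False) -> bool:
    """True only if both the upstream h1 and the packager's BLAKE2B match."""
    h1 = h1_of_gomod(data) if is_gomod else h1_of_zip(data)
    blake = hashlib.blake2b(data).hexdigest()
    return h1 == gosum and blake == blake2b_hex
```

Anybody holding the same distfile can rerun this: the h1 value ties the file
back to upstream's go.sum, while the BLAKE2B value is what the packager
added.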
103
104 This solution would ALSO work for Rust & Texlive, since they provide
105 some level of integrity & authenticity of those distfiles (as I pointed
106 out, the node system doesn't seem to provide authenticity).
107
108 This probably has further applications where ebuilds themselves can
109 start to do additional verification of distfiles, like the sec-keys/
110 category, which also had the potential to greatly expand the tree size.
111
112 --
113 Robin Hugh Johnson
114 Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
115 E-Mail : robbat2@g.o
116 GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
117 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
