On Sun, Feb 13, 2022 at 11:03:36AM +0100, Michał Górny wrote:
> > I'll try and paraphrase what I feel the problem statement you're making
> > is:
> > - TL;DR: Golang-package ebuilds account for a disproportionate amount of
> > Manifest space in the tree.
> > - Golang-package ebuilds account for 55% of all DIST entries in the tree.
> Let's not forget that each ebuild is almost as big as the relevant
> Manifest entries...
>
> > - Even if de-duped, it would still be 24% of all DIST entries in the
> > tree.
> ...so even if you dedupe Manifests, ebuilds will still account for major
> space waste.
And this is why I asked earlier that we start by forming a clear problem
statement.

> > "just repackage it"
> > - Is that in line with the licenses of ALL of the distfiles?
> I don't know how that "goproxy" thing works exactly but I suppose if
> they can provide downloads for these packages, then we should be able
> as well.
From what I've seen, goproxy contains 99.9% of packages; only a tiny
number aren't on there.

> > - Is it easy enough for any overlay author to use?
> > - How are you going to trust every overlay's repacking of the distfiles:
> > that they didn't include malicious code in the distfile, that wouldn't
> > be caught in a review of the ebuild?
> How are you going to trust anything coming from overlays? It's not like
> people have to resort to super-complex methods of adding something
> malicious.
Yes, there are already many ways to get stuff in, but repackaging makes
the review much harder.

> > So save your 18MiB of disk space of Manifest lines, or 691MiB.
> Our duty is primarily towards Gentoo users, not mirror admins. Mirror
> admins have signed up for hosting potentially large distfiles.
>
> Then, there's a major difference between temporarily spending 1G
> on a relatively small number of mirrors, and wasting 18 MiB on *all*
> user systems. Not to mention git history that's going to be growing
> forever, permanently wasting space on old versions of Go software.
So you want every Gentoo user of these packages to have to download the
368MB each time, even on a slow connection?

> > I feel the "re-packaging" is worse than the situation that they're in.
>
> No, it's not. The situation we're in is causing *permanent* damage,
> and the longer we stall, the worse it gets.
It's only permanent for those who use a single Git checkout with full
history. There is no permanent impact on users who still consume the tree
via rsync, snapshots, or other point-in-time views, including shallow Git
history.

I have a separate plan to mitigate long-term damage to Git history,
because it's something that WAS discussed back when the move to Git was
started.

> > So how do we find some reasonable solution between this?
> > - Not waste DIST-line space on every copy of the repo
> > - Not waste space on Gentoo mirrors
> > - Not waste download bandwidth when very little data has changed
> - Not waste space with humongous ebuilds
> - Not consume an insane number of inodes on tiny files
So let's go BACK to the proposal of multiple fetch phases.

If we have that as an OPTION, does it solve these 5 concerns, and/or can
it be made close enough to do that?
|
1. A small number of distfiles, as distributed by the primary upstream,
no dependency distfiles included.
2. EGO_SUM removed from ebuilds.
3. Manifest contains a small number of DIST entries per PV.
4. Downloading is still optimized because only new upstream dependencies
are fetched.
5. Mirrors are still optimized because they don't inflate the number of
files.
6. Zero or small repackaging burden on developers or overlay authors.
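
Item 4 depends on distfiles from the previous version staying in DISTDIR across a version bump. A minimal sketch of computing the resulting download set, assuming only the go.sum text of two consecutive versions is available; `go_sum_pairs` and `new_downloads` are hypothetical helpers, not existing Portage or go-module.eclass functions:

```python
def go_sum_pairs(text: str) -> set[tuple[str, str]]:
    """Collect (module, version) pairs from go.sum text.

    '/go.mod' lines keep their suffix, so a module's zip and its go.mod
    file count as separate downloads.
    """
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # '<module> <version>[/go.mod] <hash>'
            pairs.add((fields[0], fields[1]))
    return pairs


def new_downloads(old_go_sum: str, new_go_sum: str) -> list[tuple[str, str]]:
    """Entries the new version adds, i.e. the only ones not already in DISTDIR."""
    return sorted(go_sum_pairs(new_go_sum) - go_sum_pairs(old_go_sum))
```

For a bump where only one dependency changed, `new_downloads` returns just that dependency's zip and go.mod entries; everything else is reused from the previous version's fetch.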
|
How can we get this AND ensure the security you were worried about
previously?

What if there were a tool to convert go.sum into Manifest-like files
during packaging, and we then listed THOSE files as DIST entries?

Manifest-like:
- include gosum checksum
- include BLAKE checksum
- include SRC_URI
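
A minimal sketch of what that conversion could look like. Everything beyond the existing `DIST <name> <size> BLAKE2B ... SHA512 ...` layout is an assumption: the `GOSUM` and `URI` keys, the flattened distfile name, and all function names are hypothetical. The URL follows the GOPROXY protocol (`<module>/@v/<version>.zip` or `.mod`, with uppercase letters case-encoded as `!` plus lowercase), using proxy.golang.org as the assumed proxy. Fetching is left out; `data` stands for bytes already downloaded from `src_uri(entry)`.

```python
import hashlib
from dataclasses import dataclass

GOPROXY = "https://proxy.golang.org"  # assumption: the public Go module proxy


def escape_path(path: str) -> str:
    """GOPROXY case-encoding: each uppercase letter becomes '!' plus its lowercase form."""
    return "".join("!" + c.lower() if c.isupper() else c for c in path)


@dataclass
class GoSumEntry:
    module: str
    version: str  # ends in "/go.mod" for go.mod-only entries
    h1: str       # the go.sum hash, e.g. "h1:..."


def parse_go_sum(text: str) -> list[GoSumEntry]:
    """Parse go.sum lines of the form '<module> <version>[/go.mod] <hash>'."""
    return [GoSumEntry(*f) for f in (line.split() for line in text.splitlines()) if len(f) == 3]


def src_uri(entry: GoSumEntry) -> str:
    """Proxy URL: '.mod' for '/go.mod' entries, '.zip' for full modules."""
    if entry.version.endswith("/go.mod"):
        version, ext = entry.version[: -len("/go.mod")], "mod"
    else:
        version, ext = entry.version, "zip"
    return f"{GOPROXY}/{escape_path(entry.module)}/@v/{escape_path(version)}.{ext}"


def manifest_like_line(entry: GoSumEntry, data: bytes) -> str:
    """Build one line of a hypothetical Manifest-like file for already-downloaded bytes."""
    uri = src_uri(entry)
    # hypothetical distfile name: the proxy path with '/' flattened to '%2F'
    name = uri[len(GOPROXY) + 1 :].replace("/", "%2F")
    return " ".join([
        "DIST", name, str(len(data)),
        "BLAKE2B", hashlib.blake2b(data).hexdigest(),
        "SHA512", hashlib.sha512(data).hexdigest(),
        "GOSUM", entry.h1,
        "URI", uri,
    ])
```

The go.sum `h1:` hash of a module is computed over the files inside the zip rather than over the zip bytes, so it cannot stand in for the BLAKE2B/SHA512 checksums; it is carried along so others can cross-check the packager.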
|
You only have to download these Manifest-like files if you want to build
the package.
|
The Manifest-like files grow linearly with the number of distfiles used,
but the Gentoo tree's Manifest only has to reference the Manifest-like
file itself.
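
A sketch of the resulting two-phase verification chain, assuming a hypothetical Manifest-like format of `DIST <name> <size>` followed by KEY VALUE pairs that include at least `BLAKE2B` and `URI`. The `fetch` callable, the in-memory `distdir`, and all function names are stand-ins for illustration; real Portage would do this inside its own fetch code.

```python
import hashlib
from typing import Callable

# hypothetical fetcher: returns the bytes stored at a URI (e.g. a mirror download)
Fetch = Callable[[str], bytes]


def blake2b_hex(data: bytes) -> str:
    return hashlib.blake2b(data).hexdigest()


def parse_manifest_like(text: str) -> list[dict[str, str]]:
    """Parse hypothetical 'DIST <name> <size> KEY VALUE ...' lines into dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "DIST":
            continue
        entry = {"name": fields[1], "size": fields[2]}
        pairs = iter(fields[3:])
        entry.update(zip(pairs, pairs))  # consume KEY VALUE pairs
        entries.append(entry)
    return entries


def two_phase_fetch(tree_blake2b: str, manifest_uri: str, fetch: Fetch,
                    distdir: dict[str, bytes]) -> dict[str, bytes]:
    """Phase 1 verifies the Manifest-like file; phase 2 verifies each dependency against it."""
    # Phase 1: the only checksum the tree carries is for the Manifest-like file.
    manifest = fetch(manifest_uri)
    if blake2b_hex(manifest) != tree_blake2b:
        raise ValueError("Manifest-like file does not match the in-tree DIST entry")
    # Phase 2: dependencies, fetched only when not already present.
    verified = {}
    for entry in parse_manifest_like(manifest.decode()):
        data = distdir.get(entry["name"])
        if data is None:
            data = fetch(entry["URI"])
        if len(data) != int(entry["size"]) or blake2b_hex(data) != entry["BLAKE2B"]:
            raise ValueError(f"checksum mismatch for {entry['name']}")
        verified[entry["name"]] = data
    return verified
```

The single in-tree DIST entry is the trust anchor; every dependency is then checked against the Manifest-like file, and dependencies already in `distdir` are not downloaded again.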
|
The packager is responsible for creating the file, specifically the
BLAKE checksums, but their work can be cross-checked by anybody else
using the gosum checksum (so it's harder for the packager to introduce
malicious behavior; not impossible, just harder, much like today).
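
To illustrate the cross-check: anyone can recompute the go.sum `h1:` hash from a module zip, compare it to upstream's go.sum, and then confirm that the packager's BLAKE2B matches the same bytes. The sketch below re-implements, as we understand it, the `h1` scheme of Go's `golang.org/x/mod/sumdb/dirhash` (`Hash1`/`HashZip`): SHA-256 over a summary of newline-terminated `<sha256 hex>  <name>` lines in sorted name order. It is not a reference implementation, and the function names are hypothetical.

```python
import base64
import hashlib
import io
import zipfile


def hash1(files: dict[str, bytes]) -> str:
    """Recompute a go.sum 'h1:' hash from file names and contents."""
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names containing newlines are not supported")
        summary.update(f"{hashlib.sha256(files[name]).hexdigest()}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


def gosum_of_go_mod(data: bytes) -> str:
    # '/go.mod' go.sum lines hash a single file named "go.mod"
    return hash1({"go.mod": data})


def gosum_of_module_zip(data: bytes) -> str:
    # module zips already name entries "<module>@<version>/<path>"; every entry is hashed
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return hash1({info.filename: zf.read(info) for info in zf.infolist()})


def cross_check(data: bytes, gosum_h1: str, packager_blake2b: str) -> bool:
    """Do these bytes match both upstream's go.sum hash and the packager's BLAKE2B?"""
    return (gosum_of_module_zip(data) == gosum_h1
            and hashlib.blake2b(data).hexdigest() == packager_blake2b)
```

A packager who swaps in different bytes would have to produce a zip whose contents still hash to upstream's go.sum value, which is what makes the BLAKE2B entries reviewable by anyone.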
|
This solution would ALSO work for Rust & Texlive, since they provide
some level of integrity & authenticity of those distfiles (as I pointed
out, the node system doesn't seem to provide authenticity).
|
This probably has further applications where ebuilds themselves can
start to do additional verification of distfiles, like the sec-keys/
category, which also had the potential to greatly expand the tree size.
|
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2@g.o
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 |