Gentoo Archives: gentoo-project

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-project@l.g.o
Subject: Re: [gentoo-project] Call for agenda items - Council meeting on 2022-02-13
Date: Sun, 13 Feb 2022 07:37:36
Message-Id: robbat2-20220213T051449-589247959Z@orbis-terrarum.net
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting on 2022-02-13 by "Michał Górny"
1 On Sat, Feb 12, 2022 at 10:58:27PM +0100, Michał Górny wrote:
2 > The "better solution" has already been provided -- repackage it.
3 I had asked people to clarify the problems, and you still didn't
4 actually provide a clear problem statement.
5
6 I'll try and paraphrase what I feel the problem statement you're making
7 is:
8 - TL;DR: Golang-package ebuilds account for a disproportionate amount of
9 Manifest space in the tree.
10 - Golang-package ebuilds account for 55% of all DIST entries in the tree.
11 - Even if de-duped, it would still be 24% of all DIST entries in the
12 tree.
13 - Rust is 8% baseline, 6% if de-duped.
14
15 > All of your solutions solve only part of the problem, and usually
16 > introduce more problems. The current distfile fetching solution works
17 > well for over 99% of the Gentoo packages today (this is not "the 99%"
18 > but actual numbers, more below). The problem is with Go, and Go needs
19 > to be fixed (or worked around), and don't extend it to "theoretically
20 > Rust or NodeJS" because they're nowhere close to how badly designed Go
21 > ecosystem is.
22
23 "just repackage it"
24 - Is that in line with the licenses of ALL of distfiles?
25 - If the combined gentoo-specific distfile is 100MB per $PV, you're now
26 forcing 100MB of new downloads for each version.
27 - Is it easy enough for any overlay author to use?
28 - How are you going to trust every overlay's repacking of the distfiles:
29 that they didn't include malicious code in the distfile, that wouldn't
30 be caught in a review of the ebuild?
31
32 I haven't done a full analysis of how much space it would end up taking.
33
34 But as a quick version, let's consider Vault 1.8.x series.
35
36 vault-1.8.6 -> vault-1.8.7 didn't change distfiles.
37 vault-1.8.7 -> vault-1.8.8
38
39 1.8.6 -> 368MiB worth of distfiles
40 of which 22-24MiB are different from 1.8.7 & 1.8.8
41
42 So naively, if you re-packaged those 3 versions, 368MiBx3 => 1104MiB on
43 the mirrors.
44
45 Or just 413MiB of unique distfiles between the versions.
46
47 So the choice is: save 18MiB of disk space in Manifest lines, or save 691MiB on the mirrors.
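
The mirror arithmetic above can be sketched in a few lines of Python. This is only an illustration: the figures are the ones quoted in this message, all sizes are treated as MiB (an assumption, since the message mixes MB and MiB), and the function names are made up for the example.

```python
# Mirror-space arithmetic for the three Vault versions discussed above.
# Sizes are in MiB and come from the figures quoted in this message.
PER_VERSION_MIB = 368   # distfiles needed by a single version
VERSIONS = 3            # 1.8.6, 1.8.7, 1.8.8
UNIQUE_MIB = 413        # unique distfiles across all three versions


def repackaged_total(per_version: int, versions: int) -> int:
    """Mirror space used if every version ships its own combined tarball."""
    return per_version * versions


def mirror_savings(per_version: int, versions: int, unique: int) -> int:
    """Space saved by keeping individual distfiles instead of repackaging."""
    return repackaged_total(per_version, versions) - unique


if __name__ == "__main__":
    print(repackaged_total(PER_VERSION_MIB, VERSIONS))            # 1104
    print(mirror_savings(PER_VERSION_MIB, VERSIONS, UNIQUE_MIB))  # 691
```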
48
49 > Now, for some numbers. According to my greps (please correct me if I've
50 > gotten this wrong), there 121 packages using go .mod files in ::gentoo.
51 > This is 0.6% of all packages. These 121 packages make up 44.7% of total
52 > Manifest lines, or 46.6% total Manifest bytes (probably thanks to long
53 > filenames). This is 18 MiB of DIST lines that unconditionally waste
54 > user's disk space and cause git history to grow larger with every
55 > version bump. And the worst part is, if I understand the process
56 > correctly, every version bump will bring more and more .mod files
57 > for old versions of packages that aren't even really used to build
58 > the package!
59 The takeaway here is that we need a much better way to handle DIST lines
60 that are duplicated between packages. If we could achieve perfect
61 de-duplication for DIST lines, the 66591 entries for Golang stuff,
62 taking 22795192 bytes, becomes 13887 entries, with only 4798540 bytes.
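
A minimal sketch of how such "perfect de-duplication" numbers could be measured, assuming the usual Manifest layout of one `DIST <filename> <size> <hash-name> <hash> ...` entry per line. The function name `dist_stats` and the sample lines are hypothetical, and counting bytes as the line length plus a newline is an assumption about how the figures above were obtained.

```python
def dist_stats(manifest_lines):
    """Count DIST entries before and after de-duplication.

    Returns (total_entries, total_bytes, unique_entries, unique_bytes).
    Bytes are the encoded line length plus one newline (an assumption).
    """
    total_entries = total_bytes = unique_bytes = 0
    seen = set()
    for raw in manifest_lines:
        line = raw.rstrip("\n")
        if not line.startswith("DIST "):
            continue  # skip EBUILD, MISC, AUX and blank lines
        size = len(line.encode("utf-8")) + 1
        total_entries += 1
        total_bytes += size
        if line not in seen:
            # An identical DIST line in another package is a duplicate.
            seen.add(line)
            unique_bytes += size
    return total_entries, total_bytes, len(seen), unique_bytes


if __name__ == "__main__":
    # Made-up entries: two packages that share one .mod file.
    sample = [
        "DIST example.com%2Ffoo%2F@v%2Fv1.0.0.mod 30 BLAKE2B aa SHA512 bb\n",
        "DIST example.com%2Ffoo%2F@v%2Fv1.0.0.mod 30 BLAKE2B aa SHA512 bb\n",
        "DIST example.com%2Fbar%2F@v%2Fv2.1.0.zip 900 BLAKE2B cc SHA512 dd\n",
    ]
    print(dist_stats(sample))
```

In practice the input would be the concatenated Manifest files of the packages of interest; how to select "Golang packages" is left out of the sketch.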
63
64 The .mod entries are 8402 of those, for 2894495 bytes; my original
65 proposal of the Golang stuff did propose a way to remove all .mod
66 entries: at the cost of always requiring the .zip file downloaded
67 instead, when the .mod file alone would suffice for some packages.
68
69 Non-golang DIST entries account for 54042 entries, 16800665 bytes; after
70 de-dupe that is 43792 entries, 13608721 bytes.
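
As a cross-check, the 55% and 24% shares in the problem statement follow from these entry counts. A small Python sketch, using only the counts quoted in this message (the constant and function names are illustrative, and the de-duped share assumes both the Golang and non-Golang entries are de-duplicated):

```python
# DIST entry counts quoted in this message.
GO_RAW = 66591       # Golang entries before de-duplication
GO_DEDUP = 13887     # Golang entries after de-duplication
OTHER_RAW = 54042    # non-Golang entries before de-duplication
OTHER_DEDUP = 43792  # non-Golang entries after de-duplication


def share(part: int, rest: int) -> float:
    """Fraction of the combined total that `part` represents."""
    return part / (part + rest)


if __name__ == "__main__":
    print(f"raw:      {share(GO_RAW, OTHER_RAW):.1%}")
    print(f"de-duped: {share(GO_DEDUP, OTHER_DEDUP):.1%}")
```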
71
72 Golang upstream packaging absolutely needs to clean up the proliferation
73 of go.sum entries bringing in every .mod file from the past, and make it
74 easier for every module author to keep it clean.
75
76 > Again, let me repeat. We are talking about inventing super complex
77 > solutions to make 121 bad packages suddenly look good. Why isn't
78 > anybody discussing fixing these packages instead? Why did people think
79 > it's a good idea to commit them as-is?
80 I feel the "re-packaging" is worse than the situation that they're in.
81
82 So how do we find some reasonable solution between these goals?
83 - Not waste DIST line space in every copy of the repo
84 - Not waste space on Gentoo mirrors
85 - Not waste download bandwidth when very little data has changed
86
87 --
88 Robin Hugh Johnson
89 Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
90 E-Mail : robbat2@g.o
91 GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
92 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136