On Sat, Feb 12, 2022 at 10:58:27PM +0100, Michał Górny wrote:
> The "better solution" has already been provided -- repackage it.
I had asked people to clarify the problems, and you still didn't
actually provide a clear problem statement.

I'll try to paraphrase what I think the problem statement you're making
is:
- TL;DR: Golang-package ebuilds account for a disproportionate amount of
  Manifest space in the tree.
- Golang-package ebuilds account for 55% of all DIST entries in the tree.
- Even if de-duped, it would still be 24% of all DIST entries in the
  tree.
- Rust is 8% baseline, 6% if de-duped.

> All of your solutions solve only part of the problem, and usually
> introduce more problems. The current distfile fetching solution works
> well for over 99% of the Gentoo packages today (this is not "the 99%"
> but actual numbers, more below). The problem is with Go, and Go needs
> to be fixed (or worked around), and don't extend it to "theoretically
> Rust or NodeJS" because they're nowhere close to how badly designed Go
> ecosystem is.

"just repackage it"
- Is that in line with the licenses of ALL of the distfiles?
- If the combined gentoo-specific distfile is 100MB per $PV, you're now
  forcing 100MB of new downloads for each version.
- Is it easy enough for any overlay author to use?
- How are you going to trust every overlay's repacking of the distfiles,
  i.e. that they didn't include malicious code in the distfile that
  wouldn't be caught in a review of the ebuild?

I haven't done a full analysis of how much space it would end up taking,
but as a quick version, let's consider the Vault 1.8.x series.

vault-1.8.6 -> vault-1.8.7 didn't change distfiles.
vault-1.8.7 -> vault-1.8.8

1.8.6 -> 368MiB worth of distfiles
of which 22-24MiB are different from 1.8.7 & 1.8.8

So naively, if you re-packaged those 3 versions, 368MiB x 3 => 1104MiB on
the mirrors.

Or just 413MiB of unique distfiles between the versions.

So: save your 18MiB of disk space in Manifest lines, or save 691MiB
(1104MiB - 413MiB) on the mirrors.
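
As a sanity check on the arithmetic above, here is a minimal sketch in
Python. The figures are the ones quoted in this message; the function and
variable names are made up for illustration.

```python
# Figures taken from the Vault 1.8.x example above, in MiB.
PER_VERSION_MIB = 368   # distfiles needed by a single version
VERSIONS = 3            # 1.8.6, 1.8.7, 1.8.8
UNIQUE_MIB = 413        # unique distfiles across all three versions


def mirror_cost(per_version, versions, unique):
    """Return (repackaged, shared, saved) mirror usage in MiB."""
    repackaged = per_version * versions  # one full bundle per version
    shared = unique                      # every distfile stored only once
    return repackaged, shared, repackaged - shared


repackaged, shared, saved = mirror_cost(PER_VERSION_MIB, VERSIONS, UNIQUE_MIB)
print(f"repackaged: {repackaged} MiB, shared: {shared} MiB, saved: {saved} MiB")
```

The gap between the two approaches grows with every additional version
that shares most of its distfiles with its predecessors.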
48 |
|
49 |
> Now, for some numbers. According to my greps (please correct me if I've |
50 |
> gotten this wrong), there 121 packages using go .mod files in ::gentoo. |
51 |
> This is 0.6% of all packages. These 121 packages make up 44.7% of total |
52 |
> Manifest lines, or 46.6% total Manifest bytes (probably thanks to long |
53 |
> filenames). This is 18 MiB of DIST lines that unconditionally waste |
54 |
> user's disk space and cause git history to grow larger with every |
55 |
> version bump. And the worst part is, if I understand the process |
56 |
> correctly, every version bump will bring more and more .mod files |
57 |
> for old versions of packages that aren't even really used to build |
58 |
> the package! |
59 |
The takeaway here is that we need a much better way to handle DIST lines
that are duplicated between packages. If we could achieve perfect
de-duplication for DIST lines, the 66591 entries for Golang stuff,
taking 22795192 bytes, would become 13887 entries, taking only 4798540
bytes.
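
For anyone wanting to reproduce de-dupe numbers like these, a rough
sketch of the counting follows. It assumes that two DIST lines count as
duplicates only when the whole line (filename, size and hashes) is
identical, and that the byte counts are the lengths of the Manifest lines
themselves. The method actually used for the figures above isn't stated,
so `dist_stats` and `scan_tree` are hypothetical helpers, not the original
script.

```python
from pathlib import Path


def dist_stats(manifest_texts):
    """Count DIST lines before and after de-duplication.

    Returns (total_entries, total_bytes, unique_entries, unique_bytes).
    """
    total_entries = 0
    total_bytes = 0
    unique = set()
    for text in manifest_texts:
        for line in text.splitlines():
            if not line.startswith("DIST "):
                continue  # skip EBUILD/AUX/MISC and any other entries
            total_entries += 1
            total_bytes += len(line.encode("utf-8")) + 1  # +1 for the newline
            unique.add(line)  # assumption: identical line == duplicate
    unique_bytes = sum(len(line.encode("utf-8")) + 1 for line in unique)
    return total_entries, total_bytes, len(unique), unique_bytes


def scan_tree(root):
    """Hypothetical helper: feed every Manifest under a checkout to dist_stats."""
    paths = Path(root).rglob("Manifest")
    return dist_stats(p.read_text(encoding="utf-8") for p in paths)
```

Calling `scan_tree` on a repository checkout would give the four totals
for the whole tree; splitting Golang from non-Golang packages would need
an extra filter that is not shown here.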

The .mod entries are 8402 of those, for 2894495 bytes; my original
proposal for the Golang stuff did include a way to remove all .mod
entries, at the cost of always requiring the .zip file to be downloaded
instead, even when the .mod file alone would suffice for some packages.

Non-golang DIST entries account for 54042 entries, 16800665 bytes; after
de-dupe that is 43792 entries, 13608721 bytes.
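
The entry counts above are where the 55% and 24% shares in the problem
statement come from. A small sketch of that calculation, using only the
numbers quoted in this message (the names are illustrative):

```python
# DIST entry counts quoted in this message.
golang = {"raw": 66591, "deduped": 13887}
other = {"raw": 54042, "deduped": 43792}


def share(part, rest):
    """Percentage of `part` within `part + rest`, rounded to one decimal."""
    return round(100 * part / (part + rest), 1)


for key in ("raw", "deduped"):
    print(f"Golang share of DIST entries ({key}): {share(golang[key], other[key])}%")
```

This gives roughly 55.2% before de-duplication and 24.1% after it.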

Golang upstream packaging absolutely needs to clean up the proliferation
of go.sum bringing in every .mod file from the past, and make it
easier for every module author to keep it clean.

> Again, let me repeat. We are talking about inventing super complex
> solutions to make 121 bad packages suddenly look good. Why isn't
> anybody discussing fixing these packages instead? Why did people think
> it's a good idea to commit them as-is?
I feel the "re-packaging" is worse than the situation that they're in
now.

So how do we find a reasonable solution that manages all of this?
- Not waste DIST line space in every copy of the repo
- Not waste space on Gentoo mirrors
- Not waste download bandwidth when very little data has changed

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2@g.o
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136