* "Robin H. Johnson" <robbat2-20220608T184338-394361540Z @orbis-terrarum.net> :
Wrote on Wed, 8 Jun 2022 20:42:48 +0000:
> EGO_SUM vs dependency tarballs: |
> - bloats ebuilds
> - bloats Manifests
> - bloats metadata/md5-cache/ (SRC_URI etc)
> - doesn't bloat mirrors with gentoo-unique distfiles
> - EGO_SUM is verifiable/reproducible from Upstream Go systems
> - less downloads on upgrades (only changed Go deps, not entire dep tarballs)
> |
> EGO_SUM data right now adds, to every user's system:
> - 2.6MB of text to ebuilds (340k after de-dupe)
> - 7MB of text to Manifests (2M after de-dupe)
> - 6.4MB+ of text to metadata/md5-cache (I don't have a easy way to
>   calc deduped amount here)
> On the server side:
> - The sum total of Go distfiles mirrored on Gentoo mirrors right now
>   is only 3.4GB.
> - less downloads
> |
> Dependency tarballs:
> - Right now ~15GiB on each mirror, plus storage of the primary copy
>   somewhere (dev.g.o right now, but not great)
> - Conservatively if the remaining EGO_SUM packages converted to Dep
>   tarballs, it would need another 8GB each of primary location and
>   mirrors.
> - larger downloads for users who DO want to upgrade a Go package (all
>   new deps tarball even if only one or two deps changed)
> - must be preserved much longer, unless we can introduce a guaranteed
>   way to regenerate them for any prior ebuild.
> |
> I was trying to introduce a third option, but I haven't had the time to
> write an entire GLEP.
> |
> The TL;DR is introducing a 2nd-level Manifest+metadata file, that tries
> to move just the metadata out of the tree, in a way that can be
> regenerated (specifically, a 1:1 reproducible creation from a given go.sum).
> It DOES need to contain slightly more data than the present Manifest,
> specifically a full SRC_URI entry for each file (upstream URI plus what
> to rename it to on Gentoo side)
> |
> The 2nd-level Manifest would be listed as SRC_URI, and be handled in
> src_fetch/src_unpack. Download & verify the extra distfiles, against the
> Manifest checksum data (and for Golang against go.sum checksums).
> |
> The Portage mirrordist code needs the most work in this case, as it
> would need to fetch the 2nd-level Manifests so it can populate Gentoo
> mastermirror with the distfiles mirrored from upstream.
> |
> The storage costs for the proposed idea:
> - same 1:1 base distfile storage as EGO_SUM (e.g. upstream distfiles are
>   mirrored 1:1 content, just different naming)
> - Probably 1 Metadata-Manifest file per ebuild $PVR (conceptually it
>   could be split more or shared between some ebuilds/packages)
> - Main tree Manifests: 1 DIST entry per Metadata-Manifest in a given package
> - Main tree ebuilds: 1 line for the Metadata-Manifest in the ebuild.
> - metadata/md5-cache: 1 src_uri line!
> - mirrors: add the Metadata-Manifest
|
Without claiming to have fully understood the proposal above: around
15 April 2022 I suggested to WilliamH on IRC that perhaps Portage
should implement the dirhash approach that Go adopted, when go.sum was
introduced, to solve the problem of verifying upstream sources.
|
From hash.go in the Go sources,
go/src/cmd/vendor/golang.org/x/mod/sumdb/dirhash/hash.go:
|
// Hash1 is "h1:" followed by the base64-encoded SHA-256 hash of a
// summary prepared as if by the Unix command:
//
//     sha256sum $(find . -type f | sort) | sha256sum
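
To make that comment concrete, here is a minimal Python sketch of the
same summary-then-hash scheme. It is an illustration only, not Portage
or Go code: the names hash1() and hash_dir() are mine, and the layout
(one "<hex sha256>  <name>\n" line per file, names sorted, outer
SHA-256 base64-encoded) is my reading of the dirhash source.

```python
import base64
import hashlib
import os


def hash1(files):
    """Return the "h1:" hash of a mapping {file name: bytes content}.

    Sketch of dirhash.Hash1: each file contributes one summary line,
    "<hex sha256 of content>  <name>\n", in sorted name order, and the
    summary itself is hashed with SHA-256 and base64-encoded.
    """
    outer = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names containing newlines are not allowed")
        inner = hashlib.sha256(files[name]).hexdigest()
        outer.update(f"{inner}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(outer.digest()).decode("ascii")


def hash_dir(root, prefix):
    """Hash every regular file under root, naming each as prefix/relative-path.

    hash_dir is a hypothetical helper for this sketch; prefix would be
    something like "module@version".
    """
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            key = f"{prefix}/{rel}" if prefix else rel
            with open(full, "rb") as handle:
                files[key] = handle.read()
    return hash1(files)
```

Only file names and file contents enter the hash; ownership,
permissions and timestamps never do.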
|
Loosely speaking, the "manifest" could publish this dirhash of the
contents of go-mod/cache (which would have been bundled in the
-deps.tar.xz).
|
The immediate motivation was to avoid the network when I already had the
sources locally: instead of downloading a -deps.tar.xz I could create it
locally and drop it in distdir. Portage would check the (hypothetically)
published dirhash and let it through. The timestamps and uid in my local
tarball, which differ from those in the upstream tarball, wouldn't
upset it.
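
A rough way to see why the timestamps and uid would not matter: the
Python sketch below builds two tarballs with identical file contents
but different mtime and uid. Their plain SHA-256 checksums differ, while
a dirhash over the unpacked contents is the same. Everything here is an
assumption for illustration: the file name is made up, hash1() is my
own restatement of the scheme in hash.go, and I use uncompressed tar in
memory rather than a real -deps.tar.xz.

```python
import base64
import hashlib
import io
import tarfile


def hash1(files):
    """dirhash-style "h1:" hash over {name: bytes}; metadata is ignored."""
    outer = hashlib.sha256()
    for name in sorted(files):
        inner = hashlib.sha256(files[name]).hexdigest()
        outer.update(f"{inner}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(outer.digest()).decode("ascii")


def make_tar(files, mtime, uid):
    """Build an uncompressed tar in memory with the given mtime and uid."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            info.uid = uid
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tar_contents(blob):
    """Read a tar from bytes back into {member name: bytes content}."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


# Hypothetical cache entry; the path and content are made up.
files = {"go-mod/cache/download/example.com/foo/@v/v1.0.0.mod": b"module example.com/foo\n"}

upstream = make_tar(files, mtime=1650000000, uid=0)
local = make_tar(files, mtime=1654700000, uid=1000)

# A whole-file checksum sees the metadata difference ...
whole_file_differs = hashlib.sha256(upstream).digest() != hashlib.sha256(local).digest()
# ... while the content-only dirhash does not.
dirhash_matches = hash1(tar_contents(upstream)) == hash1(tar_contents(local))
```

A Manifest DIST entry checks the whole file, so it would reject the
locally built tarball; a published dirhash would accept it.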
|
One unchecked assumption is that go-mod/cache can be recreated by
unpacking sources. If so, then with a notion of a "second-level manifest"
(the equivalent of go.sum) the contents can be assembled without having
to store or download the actual -deps tarball.
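
As a sketch of what such a second-level manifest could be derived from,
the Python below turns go.sum lines into (upstream URL, local distfile
name, h1 hash) triples. The URL layout follows the Go module proxy
protocol (<module>/@v/<version>.zip or .mod, with upper-case letters
escaped as "!" plus the lower-case letter). The local naming scheme
(slashes replaced by %2F), the function names, and the sample hashes
are assumptions made up for this example, not an existing Portage
format.

```python
def escape_path(text):
    """Escape upper-case ASCII letters the way the Go module proxy expects."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


def parse_go_sum(text):
    """Yield (module, version, digest) for each well-formed go.sum line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            yield tuple(fields)


def second_level_entries(go_sum_text, proxy="https://proxy.golang.org"):
    """Map go.sum lines to (upstream URL, local file name, h1 digest)."""
    entries = []
    for module, version, digest in parse_go_sum(go_sum_text):
        if version.endswith("/go.mod"):
            # "<version>/go.mod" lines cover only the module's go.mod file.
            version, ext = version[: -len("/go.mod")], "mod"
        else:
            # Plain "<version>" lines cover the full module zip.
            ext = "zip"
        path = f"{escape_path(module)}/@v/{escape_path(version)}.{ext}"
        upstream = f"{proxy}/{path}"
        local_name = path.replace("/", "%2F")  # assumed distfile naming
        entries.append((upstream, local_name, digest))
    return entries


# Placeholder hashes, not real go.sum values.
sample = (
    "github.com/BurntSushi/toml v0.3.1 h1:PLACEHOLDERzip=\n"
    "github.com/BurntSushi/toml v0.3.1/go.mod h1:PLACEHOLDERmod=\n"
)
entries = second_level_entries(sample)
```

Note that the h1 value on a zip line is a dirhash over the files inside
the zip, not a checksum of the zip file itself, so verifying a download
against it means opening the archive.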
|
I didn't get very far in convincing WilliamH of my need, so I dropped
the idea. (I'm not sure if I'm being any clearer; if I'm missing
something, do let me know.)