Gentoo Archives: gentoo-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-dev@l.g.o, Rich Freeman <rich0@g.o>
Cc: binhost@g.o
Subject: Re: [gentoo-dev] New project: binhost
Date: Sun, 14 Feb 2021 01:51:42
Message-Id: 818c2e65-2501-9429-a9e7-95868c2d1c96@gentoo.org
In Reply to: Re: [gentoo-dev] New project: binhost by Rich Freeman
1 On 2/10/21 11:11 AM, Rich Freeman wrote:
2 > On Wed, Feb 10, 2021 at 12:57 PM Andreas K. Hüttel <dilfridge@g.o> wrote:
3 >>
4 >> * what portage features are still needed or need improvements (e.g. binpkg
5 >> signing and verification)
6 >> * what hosting should look like
7 >
8 > Some ideas for portage enhancements:
9 >
10 > 1. Ability to fetch binary packages from some kind of repo.
11
12 The old PORTAGE_BINHOST functionality has been replaced with a
13 binrepos.conf file that's very similar to repos.conf:
14
15 https://bugs.gentoo.org/668334
16
17 It doesn't have explicit support for multiple local binary package
18 repositories yet, but somebody got it working with sync-uri set to a
19 file:/ URI, as described in the comments on this bug:
20
21 https://bugs.gentoo.org/768957
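
To illustrate the repos.conf-like layout, here is a minimal Python sketch that parses an INI-style snippet and orders the repositories by priority. The section names, the URIs, and the helper `load_binrepos` are made up for this example. It assumes `sync-uri` and `priority` are the relevant keys and that a higher priority wins. It is not how Portage itself reads the file.

```python
import configparser

# Hypothetical binrepos.conf-style content: one remote and one local repo.
EXAMPLE = """
[remote]
priority = 10
sync-uri = https://example.org/binpkgs

[local]
priority = 20
sync-uri = file:///var/cache/binpkgs
"""

def load_binrepos(text):
    """Return (name, uri) pairs, highest priority first (assumed ordering)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    repos = []
    for name in parser.sections():
        section = parser[name]
        priority = section.getint("priority", fallback=0)
        repos.append((priority, name, section["sync-uri"]))
    repos.sort(reverse=True)
    return [(name, uri) for _priority, name, uri in repos]

for name, uri in load_binrepos(EXAMPLE):
    print(name, uri)
```

A local repository is just another section whose URI happens to use the file: scheme, which matches the workaround described in the bug.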
22
23 > 2. Ability to have multiple binary packages co-exist in a repo (local
24 > or remote) with different build attributes (arch, USE, CFLAGS,
25 > DEPENDS, whatever).
26
27 We can enable FEATURES=binpkg-multi-instance by default now that
28 this bug is fixed:
29
30 https://bugs.gentoo.org/571126
31
32 > 3. Ability to pick the most appropriate binary packages to use based
33 > on user preferences (with a mix of hard and soft preferences).
34
35 Current package selection logic for binary packages is basically the
36 same as for ebuilds. These are the notable differences:
37
38 1) Binary packages are sorted in descending order by (version, mtime),
39 so that the most recent builds are preferred when the versions are identical.
40
41 2) The --binpkg-respect-use option rejects binary packages that would
42 need to be rebuilt in order to match local USE settings.
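
The two differences can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `BinPkg` and `select` are hypothetical names, versions are plain tuples rather than real Gentoo version strings, and the USE check is reduced to an exact comparison of enabled flags, which is cruder than what --binpkg-respect-use actually does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BinPkg:
    version: tuple    # simplified stand-in for a real version comparison
    mtime: int        # build timestamp
    use: frozenset    # USE flags the package was built with

def select(candidates, wanted_use, respect_use=True):
    """Pick the preferred binary package, or None if nothing qualifies."""
    if respect_use:
        # Reject packages that would need a rebuild to match local USE.
        candidates = [p for p in candidates if p.use == wanted_use]
    # Descending by (version, mtime): newest version first, then newest build.
    ranked = sorted(candidates, key=lambda p: (p.version, p.mtime), reverse=True)
    return ranked[0] if ranked else None
```

With two builds of the same version, the one with the larger mtime is returned; with the USE check disabled, a newer version built with different flags would win instead.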
43
44 > One idea I've had around how #2-3 might be implemented is:
45 > 1. Binary packages already contain data on how they were built (USE
46 > flags, dependencies, etc). Place this in a file using a deterministic
47 > sorting/etc order so that two builds with the same settings will have
48 > the same results.
49
50 This would only be needed for multi-profile binhosts that provide a
51 variety of configurations for the same package.
52
53 Features like this are not necessary if the binhost only intends to
54 provide packages for a single profile.
55
56 > 2. Generate a hash of the file contents - this can go in the filename
57 > so that the file can co-exist with other files, and be located
58 > assuming you have a full matching set of metadata.
59
60 For FEATURES=binpkg-multi-instance we currently use an integer BUILD_ID
61 to ensure that file names are unique.
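
A rough sketch of how an integer BUILD_ID keeps names unique, assuming file names of the form ${PF}-${BUILD_ID}.xpak inside a per-package directory. The helper `next_build_id` is hypothetical and is not Portage's actual allocation code.

```python
import re

def next_build_id(existing_names, pf):
    """Return the smallest unused BUILD_ID above all existing ones for pf."""
    # Match only "<pf>-<integer>.xpak"; other revisions of the package
    # (for example pf + "-r1") do not match and are ignored.
    pattern = re.compile(re.escape(pf) + r"-(\d+)\.xpak")
    ids = [int(m.group(1)) for name in existing_names
           if (m := pattern.fullmatch(name))]
    return max(ids, default=0) + 1

names = ["foo-1.0-1.xpak", "foo-1.0-2.xpak", "foo-1.0-r1-7.xpak"]
print(f"foo-1.0-{next_build_id(names, 'foo-1.0')}.xpak")
```

Unlike a content hash, the integer says nothing about how the package was built, so the metadata still has to be consulted to tell builds apart.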
62
63 > 3. Start dropping attributes from the file based on a list of
64 > priorities and generate additional hashes. Create symlinked files to
65 > the original file using these hashes (overwriting or not existing
66 > symlinks based on policy). This allows the binary package to be found
67 > using either an exact set of attributes or a subset of higher-priority
68 > attributes. This is analogous to shared object symlinking.
69 > 4. The package manager will look for a binary package first using the
70 > user's full config, and then by dropping optional elements of the
71 > config (so maybe it does the search without CFLAGs, then without USE
72 > flags). Eventually it aborts based on user prefs (maybe the user only
73 > wants an exact match, or is willing to accept alternate CFLAGs but not
74 > USE flags, or maybe anything for the arch is selected).
> 5. As always the final selected binary package still gets evaluated
75 > like any other binary package to ensure it is usable.
76 >
77 > Such a system can identify whether a potentially usable file exists
78 > using only filename, cutting down on fetching. In the interests of
79 > avoiding useless fetches we would only carry step 3 reasonably far -
80 > packages would have to match based on architecture and any dynamic
81 > linking requirements. So we wouldn't generate hashes that didn't
82 > include at least those minimums, and the package manager wouldn't
83 > search for them.
84 >
85 > Obviously you could do more (if you have 5 combinations of use flags,
86 > look for the set that matches most closely). That couldn't be done
87 > using hashes alone in an efficient way. You could have a small
88 > manifest file alongside the binary package that could be fetched
89 > separately if the package manager wants to narrow things down and
90 > fetch a few of those to narrow it down further.
91
92 All of the above is oriented toward multi-profile binhosts, so we'll
93 have to do a cost/benefit analysis to determine whether it's worth the
94 effort to introduce the complexity that multi-profile binhosts add.
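
For the cost/benefit discussion, the quoted hash scheme can be sketched roughly as follows. Everything here is an assumption made for illustration: the attribute names, the `DROP_ORDER` priority list, the use of canonical JSON plus a truncated SHA-256, and the helpers `attr_hash` and `lookup_keys`. Nothing like this exists in Portage today.

```python
import hashlib
import json

# Hypothetical priority list: attributes to drop, least important first.
DROP_ORDER = ["CFLAGS", "USE"]

def attr_hash(attrs):
    """Hash build attributes deterministically (sorted keys, fixed separators)."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def lookup_keys(attrs):
    """Exact-match hash first, then hashes with optional attributes dropped."""
    keys = [attr_hash(attrs)]
    remaining = dict(attrs)
    for name in DROP_ORDER:
        if name in remaining:
            del remaining[name]
            keys.append(attr_hash(remaining))
    return keys

build = {"ARCH": "amd64", "USE": ["ssl", "zlib"], "CFLAGS": "-O2"}
for key in lookup_keys(build):
    print(key)
```

A binhost would publish a package under every key in the list (as symlinks, in the quoted proposal), and a client would probe its own keys in order and stop where its preferences say to. ARCH is never dropped in this sketch, mirroring the minimum-match requirement in the quoted text.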
95
96 > Or you could skip the hash searching and just fetch all the manifests
97 > for a particular package/arch and just search all of those, but that
98 > is more data to transfer just to do a query. A metadata cache of some
99 > kind might be another solution. Content hashes would probably
100 > still be useful just to allow co-existence of alternate builds.
101
102 This also relates to the centralized Packages file that's currently used
103 to distribute the package metadata for all packages in a binhost. We can
104 make it scale better if we split out a separate index per package, not
105 unlike a pypi simple index:
106
107 https://pypi.org/simple/
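
As a sketch of what a per-package split could look like, the following Python groups the stanzas of a Packages-style index by category/package. It assumes the index is a header stanza followed by blank-line-separated stanzas of "KEY: value" lines, each carrying a CPV key. The helper `split_index` and the regular expression for stripping the version are simplifications made up for this example, not Portage code.

```python
import re
from collections import defaultdict

# Simplified: strip a trailing "-<version>[-rN]" to get category/package.
_CPV = re.compile(r"^(?P<cp>.+)-\d[\w.]*(?:-r\d+)?$")

def split_index(packages_text):
    """Group package stanzas by category/package name."""
    per_package = defaultdict(list)
    stanzas = packages_text.strip().split("\n\n")
    for stanza in stanzas[1:]:  # the first stanza is the index header
        fields = dict(line.split(": ", 1)
                      for line in stanza.splitlines() if ": " in line)
        match = _CPV.match(fields.get("CPV", ""))
        if match:
            per_package[match.group("cp")].append(fields)
    return dict(per_package)
```

Each resulting group could then be written to its own small index file, so a client resolving one package would only fetch the entries for that package.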
108 --
109 Thanks,
110 Zac