Gentoo Archives: gentoo-dev

From: Jaco Kroon <jaco@××××××.za>
To: gentoo-dev@l.g.o, Rich Freeman <rich0@g.o>
Cc: binhost@g.o
Subject: Re: [gentoo-dev] New project: binhost
Date: Thu, 11 Feb 2021 09:42:26
Message-Id: 1d923863-7a12-b614-2a02-56f1d9484d65@uls.co.za
In Reply to: Re: [gentoo-dev] New project: binhost by Rich Freeman
1 Hi,
2
3 +1 - love the idea, def joining.
4
5 However, I suspect the OP was aiming at a publicly hosted binpkg
6 server.  Given the scheme below, that is a single server/host: we care
7 only about the build parameters, and I suspect the profile then has less
8 of an effect on the built packages.
9
10 For publicly *available* infrastructure, however, I do not recommend that
11 arbitrary people be able to upload.  We can, however, try to determine
12 from the logs of attempted downloads which combinations of USE flags are
13 required (with hashes this becomes tricky: 16 USE flags already give
14 65536 potential hashes, and 20 clock in at over 1 million, still
15 perfectly reversible, but once we start getting to 30+ USE flags ...).
16 So perhaps we need a way to feed back "Hey, I looked for this
17 combo/hash" so we don't need to reverse hashes.
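
To illustrate the arithmetic above: n USE flags give 2**n possible on/off combinations, so reversing a hash from a download log means trying every subset. The following is only a rough sketch. The hashing convention (SHA-256 over the sorted, space-joined enabled flags) and the function names `use_hash` and `reverse_use_hash` are assumptions made for illustration, not anything portage does today.

```python
import hashlib
from itertools import combinations


def use_hash(enabled):
    """Hash a set of enabled USE flags in a deterministic (sorted) order."""
    canonical = " ".join(sorted(enabled))
    return hashlib.sha256(canonical.encode()).hexdigest()


def reverse_use_hash(target, known_flags):
    """Brute-force the subset of known_flags whose hash equals target.

    Tries all 2**n subsets in the worst case, so it is only practical
    for a small number of flags.
    """
    flags = sorted(known_flags)
    for size in range(len(flags) + 1):
        for subset in combinations(flags, size):
            if use_hash(subset) == target:
                return set(subset)
    return None


if __name__ == "__main__":
    # The search space doubles with every additional flag.
    for n in (16, 20, 30):
        print(n, "flags:", 2 ** n, "combinations")

    flags = ["X", "ipv6", "ssl", "threads"]
    requested = use_hash({"ssl", "ipv6"})
    print(reverse_use_hash(requested, flags))
```

A client that reports the combination it looked for alongside the hash would make the brute-force step unnecessary.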
18
19 I would definitely join such a project, and it would be greatly beneficial
20 if we could improve the infra to have a form of distributed build, i.e.
21 for private infrastructure, improve the mechanisms used to "check for
22 binpkg availability; if not available, build it and submit it back to the
23 binary host". Obviously all such nodes would need to be considered "trusted".
24
25 Kind Regards,
26 Jaco
27
28 On 2021/02/10 21:11, Rich Freeman wrote:
29 > On Wed, Feb 10, 2021 at 12:57 PM Andreas K. Hüttel <dilfridge@g.o> wrote:
30 >> * what portage features are still needed or need improvements (e.g. binpkg
31 >> signing and verification)
32 >> * what should hosting look like
33 > Some ideas for portage enhancements:
34 >
35 > 1. Ability to fetch binary packages from some kind of repo.
36 > 2. Ability to have multiple binary packages co-exist in a repo (local
37 > or remote) with different build attributes (arch, USE, CFLAGS,
38 > DEPENDS, whatever).
39 > 3. Ability to pick the most appropriate binary packages to use based
40 > on user preferences (with a mix of hard and soft preferences).
41 >
42 > One idea I've had around how #2-3 might be implemented is:
43 > 1. Binary packages already contain data on how they were built (USE
44 > flags, dependencies, etc). Place this in a file using a deterministic
45 > sorting/etc order so that two builds with the same settings will have
46 > the same results.
47 > 2. Generate a hash of the file contents - this can go in the filename
48 > so that the file can co-exist with other files, and be located
49 > assuming you have a full matching set of metadata.
50 > 3. Start dropping attributes from the file based on a list of
51 > priorities and generate additional hashes. Create symlinks to the
52 > original file using these hashes (whether existing symlinks are
53 > overwritten depends on policy). This allows the binary package to be found
54 > using either an exact set of attributes or a subset of higher-priority
55 > attributes. This is analogous to shared object symlinking.
56 > 4. The package manager will look for a binary package first using the
57 > user's full config, and then by dropping optional elements of the
58 > config (so maybe it does the search without CFLAGS, then without USE
59 > flags). Eventually it aborts based on user prefs (maybe the user only
60 > wants an exact match, or is willing to accept alternate CFLAGS but not
61 > USE flags, or maybe anything for the arch is selected).
62 > 5. As always the final selected binary package still gets evaluated
63 > like any other binary package to ensure it is usable.
64 >
65 > Such a system can identify whether a potentially usable file exists
66 > using only filename, cutting down on fetching. In the interests of
67 > avoiding useless fetches we would only carry step 3 reasonably far -
68 > packages would have to match based on architecture and any dynamic
69 > linking requirements. So we wouldn't generate hashes that didn't
70 > include at least those minimums, and the package manager wouldn't
71 > search for them.
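
The hashing and fallback scheme quoted above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not portage code: the attribute names, the `PRIORITY` and `REQUIRED` lists, the truncated SHA-256 over canonical JSON, and the in-memory `index` dict standing in for a directory of symlinks are all hypothetical.

```python
import hashlib
import json

# Hypothetical priority order: attributes at the end are dropped first.
PRIORITY = ["arch", "depends", "use", "cflags"]
# Assumed minimum that every generated hash must still include.
REQUIRED = {"arch", "depends"}


def normalise(attrs):
    """Sort collection values so equal settings serialise identically."""
    return {
        key: sorted(value) if isinstance(value, (set, list, tuple)) else value
        for key, value in attrs.items()
    }


def attr_hash(attrs):
    """Hash a deterministic serialisation of the build attributes."""
    canonical = json.dumps(normalise(attrs), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def candidate_hashes(attrs):
    """Yield the full hash, then hashes with low-priority attributes dropped."""
    current = dict(attrs)
    yield attr_hash(current)
    for key in reversed(PRIORITY):
        if key in REQUIRED or key not in current:
            continue
        del current[key]
        yield attr_hash(current)


def publish(index, attrs, filename, overwrite=True):
    """Register a build under every candidate hash (stand-in for symlinks)."""
    for digest in candidate_hashes(attrs):
        if overwrite or digest not in index:
            index[digest] = filename


def lookup(index, attrs, max_drops):
    """Search from exact match outwards, dropping at most max_drops attributes."""
    for drops, digest in enumerate(candidate_hashes(attrs)):
        if drops > max_drops:
            break
        if digest in index:
            return index[digest]
    return None


if __name__ == "__main__":
    index = {}
    build = {"arch": "amd64", "depends": ["glibc-2.32"],
             "use": {"ssl", "ipv6"}, "cflags": "-O2"}
    publish(index, build, "foo-1.0-1.xpak")

    user = dict(build, cflags="-O3 -march=native")
    print(lookup(index, user, max_drops=0))  # exact match only
    print(lookup(index, user, max_drops=1))  # alternate CFLAGS accepted
```

The `max_drops` argument plays the role of the user preference: 0 demands an exact match, 1 accepts different CFLAGS, and 2 accepts anything that matches the required minimum.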
72 >
73 > Obviously you could do more (if you have 5 combinations of USE flags,
74 > look for the set that matches most closely). That couldn't be done
75 > efficiently using hashes alone. You could have a small manifest file
76 > alongside the binary package that could be fetched separately: if the
77 > package manager wants to narrow things down, it fetches a few of those
78 > manifests and compares them.
79 >
80 > Or you could skip the hash searching and just fetch all the manifests
81 > for a particular package/arch and just search all of those, but that
82 > is more data to transfer just to do a query. A metadata cache of some
83 > kind might be another solution. Content hashes would probably
84 > still be useful just to allow co-existence of alternate builds.
85 >
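
The "closest match" idea from the quoted text, where the package manager fetches a few small manifests and picks the build whose USE flags differ least from what the user wants, could look something like the sketch below. The manifest representation (a mapping from a build identifier to its set of enabled USE flags) and the name `closest_build` are assumptions for illustration only.

```python
def closest_build(wanted_use, manifests):
    """Return the build id whose enabled USE flags differ least from wanted_use.

    manifests maps a build id to the set of USE flags that build enabled.
    Distance is the size of the symmetric difference; ties are broken by
    build id so the result is deterministic.
    """
    if not manifests:
        return None
    wanted = set(wanted_use)
    return min(
        sorted(manifests),
        key=lambda build_id: len(wanted ^ set(manifests[build_id])),
    )


if __name__ == "__main__":
    manifests = {
        "build-a": {"ssl"},
        "build-b": {"ssl", "ipv6"},
        "build-c": {"X"},
    }
    print(closest_build({"ssl", "ipv6", "threads"}, manifests))
```

As with any other binary package, the selected build would still have to be evaluated for usability before being installed.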