Gentoo Archives: gentoo-user

From: Caveman Al Toraboran <toraboracaveman@××××××××××.com>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time
Date: Wed, 06 Nov 2019 01:36:59
Message-Id: G5C58TkdeT2wjAEnsd9_9KTIrm848Sc8a57obHS_dMwnV33mdueYevkbdJNLXSg2kbPrfbDyotNHkFsdRIm2a5k4QyrAZ-n8gs6OuCNkAPk=@protonmail.com
In Reply to: Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time by "Mickaël Bucas"
1 On Tuesday, November 5, 2019 7:05 PM, Mickaël Bucas <mbucas@×××××.com> wrote:
2 > Hi Caveman
3 >
4 > The Portage tree contains a few binary packages prepared by Gentoo
5 > developers, like Firefox, Rust, LibreOffice...
6 > "ls -d /usr/portage/*/*-bin" shows about 90 packages prepared in this
7 > way, some of them because they are non-free like Oracle JDK
8 >
9 > This means that there is no necessary changes to Gentoo to accomplish
10 > what you describe : compile the packages, write the ebuilds for the
11 > binary packages, publish ebuilds in an overlay.
12
13 Some qt-related packages are really slow to compile, yet still not listed.
14 A problem with this approach is that IMO it's too manual and doesn't react
15 dynamically to user changes.
16
17 IMO we can consider this an automated community-driven bin-host that uses
18 statistics in order to tell which packages are reliable. In case of hardware
19 mismatches, I think we can find a binary that's compiled with the desired,
20 say, USE flags, but compiled on an older CPU model whose instruction set is
21 still supported by the newer, rarer CPU that one might be using.
22
23 > But the really short list above shows that it's a really complex task
24 > because of all dependencies and configurable elements in Gentoo. If
25 > you just have a look at the output of "emerge --info" you can imagine
26 > all the moving parts, like compiler versions and compile options,
27 > Bash, Perl, Python, Init system, USE flags (combinatorial), even human
28 > languages. And that is just the easily visible parts !
29
30 True, however a few points:
31
32 * If we look at that info from the perspective of individual packages, it
33 has far fewer degrees of variation in practice. E.g. if we look at the USE
34 flags dimension, dev-qt/qtwebengine has 12 of them, so worst case for this
35 aspect we get about:
36
37 nchoosek(12,1) + nchoosek(12,2) + ... + nchoosek(12,12) = 4095
38
39 possible combinations with those 12 flags. But,
40 most people are only interested in 2 sets of potential USE flag
41 configurations, one with ALSA, or another with PA. So in practice, that 4095
42 is probably reduced to just 2 or 3 clusters of configurations (not 4095).
43
44 * For hardware details, such as the exact CPU model and the kinds of features
45 actually enabled by the compiler when using `-march=native`: I don't know
46 the actual distribution of these in practice, but is it not possible
47 to give users the choice to simply pick a binary that was compiled on
48 an older CPU whose features their own CPU still supports?
49
50 E.g. the system could offer the user the nearest match to his query (e.g.
51 nearest in selection of USE flags), presenting a binary compiled on an
52 older x86-64 CPU model than his newer x86-64 CPU.
53
54 This way, this could become simply an automated bin-host that blurs as
55 necessary, and forks variations of specific configurations as demand rises,
56 all without needing dev time to package *-bin manually.
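
The worst-case count in the first point above can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the figure of 12 flags is the one quoted above for dev-qt/qtwebengine, and `n_flags` is just a name picked for this sketch.

```python
from math import comb

n_flags = 12  # number of USE flags quoted above for dev-qt/qtwebengine

# Sum of C(12, k) for k = 1..12: every non-empty set of enabled flags.
non_empty = sum(comb(n_flags, k) for k in range(1, n_flags + 1))

# Each flag is independently on or off, so all subsets number 2**12.
all_subsets = 2 ** n_flags

print(non_empty)    # 4095
print(all_subsets)  # 4096 (adds the case with every flag disabled)
```

So the sum is 2^12 - 1; counting the all-disabled configuration as well gives 4096.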
57
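
The "nearest binary" idea from the second point could be sketched roughly as below. Everything here is hypothetical: the `Binary` record, the `pick_binary` function, the candidate names and the CPU feature lists are made up for illustration, and no such interface exists in Portage as far as I know. The assumption is that a binary is safe to run when the host CPU supports every feature it was built for, and that "nearest" means the fewest differing USE flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    name: str
    use: frozenset           # USE flags the binary was built with
    cpu_features: frozenset  # CPU features the build is allowed to use


def pick_binary(wanted_use, host_cpu_features, candidates):
    """Return the runnable candidate whose USE flags differ least from wanted_use."""
    # Runnable only if the host CPU has every feature the binary was built for.
    runnable = [b for b in candidates if b.cpu_features <= host_cpu_features]
    if not runnable:
        return None
    # Distance = number of differing USE flags (symmetric difference).
    # Ties go to the build that exploits more of the host's CPU features.
    return min(runnable, key=lambda b: (len(b.use ^ wanted_use), -len(b.cpu_features)))


candidates = [
    Binary("old-cpu-alsa", frozenset({"alsa"}), frozenset({"sse2"})),
    Binary("new-cpu-alsa", frozenset({"alsa"}), frozenset({"sse2", "avx2", "avx512f"})),
    Binary("old-cpu-pa", frozenset({"pulseaudio"}), frozenset({"sse2"})),
]
host = frozenset({"sse2", "avx2"})  # made-up host without avx512f
choice = pick_binary(frozenset({"alsa"}), host, candidates)
print(choice.name)  # old-cpu-alsa
```

The build for the newer CPU is skipped because the host lacks one of its features, so the user gets the ALSA binary from the older CPU instead of compiling.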
58
59 > I remember reading an article about a man trying to reproduce binary
60 > packages of a binary distribution and failing to do so, because there
61 > are so many parts involved. I've read later that distributions have
62 > done some work to have reproducible builds, but I'm not sure how
63 > successful they are, even when all choices are predefined.
64 >
65 > Given that Gentoo has taken a whole different road by having more
66 > choices available to the user, I don't think the compilation results
67 > of one configuration would be easily used on another.
68
69 Is it possible to collect statistics of such configurations from Gentoo users?
70
71 I don't know what the outcome would be, but I think it's worth exploring. E.g.
72 what if it turned out that there is not much diversity in our
73 settings? E.g. we can find a few really popular clusters of USE, language,
74 and license flags? As for hardware, what would be the latest backwards compatible
75 CPU that has compiled a binary for me with enough statistical confidence in its
76 reliability?
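
If such statistics were collected, the diversity question could be answered with something as simple as the sketch below. The `reports` list is invented sample data (not real Gentoo statistics), and `coverage` is a hypothetical helper that asks: if the bin-host built only the N most popular configurations of a package, what fraction of users would be served?

```python
from collections import Counter

# Invented reports: the set of USE flags each user enabled for one package.
reports = [
    frozenset({"alsa"}),
    frozenset({"alsa"}),
    frozenset({"pulseaudio"}),
    frozenset({"alsa"}),
    frozenset({"pulseaudio"}),
    frozenset({"alsa", "widgets"}),
]


def coverage(reports, top_n):
    """Fraction of users served by building only the top_n most common configurations."""
    counts = Counter(reports)
    served = sum(n for _, n in counts.most_common(top_n))
    return served / len(reports)


print(coverage(reports, 2))  # 5 of the 6 users, about 0.83
```

If real data showed a curve like this, where a handful of configurations covers most users, then the combinatorial worst case would matter much less in practice.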
77
78
79 > To go even further, pushing your compiled packages to a public server
80 > may create a security risk by exposing many parts of your
81 > configuration that could be analyzed by malicious people.
82
83 Any example of such sensitive information that might be in the binaries? Just
84 curious, as I don't know much about this.
85
86 I could be wrong, but so far my thought is that I don't think we get many bits
87 of entropy for our security by hiding our package lists, because I think an
88 adversary can probably already use statistics to predict common clusters of
89 package lists that we might use.
90
91 So I personally doubt that attackers would face much difficulty by not knowing
92 our packages, because our packages are probably already predictable since our
93 distribution of packages is not that diverse.
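
To make the "bits of entropy" remark concrete, here is a small sketch that computes the Shannon entropy of a distribution of package-list clusters. The two sample populations are invented, and the cluster labels "A", "B", ... are placeholders; the point is only that a concentrated distribution leaves an attacker with little to guess.

```python
from collections import Counter
from math import log2


def entropy_bits(observations):
    """Shannon entropy, in bits, of the empirical distribution of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Invented populations of 8 users, each labeled by package-list cluster.
concentrated = ["A"] * 6 + ["B"] * 2  # most users share one cluster
diverse = list("ABCDEFGH")            # every user is different

print(entropy_bits(concentrated))  # about 0.81 bits
print(entropy_bits(diverse))       # 3.0 bits
```

With eight equally likely clusters the attacker faces 3 bits of uncertainty; when six of the eight users share one cluster, it drops below 1 bit.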
94
95
96 > So far I don't see a really big advantage in building this kind of
97 > infrastructure compared to either a binary distribution or Gentoo with
98 > home compilation.
99
100 IMO the real value is that it will be some kind of an automated community-driven
101 bin-host that uses statistics to quantify the reliability of its bins, and to
102 automatically create bins of special cases as the demand rises (e.g. common USE
103 flag combinations that become trendy), without needing to wait for a package
104 maintainer to bundle a *-bin package.
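
One possible way to "quantify the reliability" of a bin is sketched below. It assumes users report whether a binary worked for them, and it scores each binary by the lower bound of the Wilson score interval on its success rate. That statistic is my choice for the sketch, not something the idea depends on, and the report counts are made up.

```python
from math import sqrt


def wilson_lower_bound(successes, total, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 is roughly 95% confidence)."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# Made-up report counts: (successful installs, total reports).
well_tested = wilson_lower_bound(95, 100)
barely_tested = wilson_lower_bound(1, 1)

print(round(well_tested, 3))    # 0.888
print(round(barely_tested, 3))  # 0.207
```

A binary with one perfect report scores far lower than one with 95 successes out of 100, so a bin-host ranking by this bound would prefer bins that many users have actually exercised.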
105
106 I think, if this works, it may make Gentoo even better at binary packages than
107 the mainly binary distros.
108
109
110 rgrds,
111 cm