On Tue, 5 Nov 2019 at 01:02, Caveman Al Toraboran
<toraboracaveman@××××××××××.com> wrote:
>
>
> DISCLAIMER: I am not claiming that this idea is new. It is probably not new.
> ----------- Even though some of its details might be new for a Linux
> distribution, it's all based on boring well-established bits of
> known science. But regardless of its newness, I think it's worth
> sharing with the hope that it may re-kindle the fire in a nerd's
> heart (or a group of nerds) so that they develop this for me (or
> us).
>
>
>
> GOAL:
> -----
> Reduce compile time while still allowing rice (e.g. fancy USE flags,
> make.conf tweaks, etc.), without increasing dev overhead.
>
>
> CURRENT SITUATION:
> ------------------
> If you use *-bin packages, you cannot rice; to rice, you must compile on
> your own.
>
>
> THE APPROACH:
> -------------
> 1. Some nerd (or a group of nerds) makes a package, maybe called
> `almostfreelunch.ebuild`.
>
> 2. Say you want to compile qtwebengine. You do: `almostfreelunch -aqvDuNt
> --backbrack=1000 qtwebengine`.
>
> 3. The app, `almostfreelunch`, will look up your build setup (e.g. USE flags,
> make.conf settings, etc.) for all the packages that you are about to build
> on your system in order to install qtwebengine.
>
> 4. The app will upload that info to a central server, which looks up the
> popularity of each configuration, e.g. the distribution of compile-time
> configurations for a given package. The central server will then figure
> out things like: qtwebengine is commonly compiled for x86-64 with certain
> USE flags and other settings in make.conf.
>
> 5. If the server figures out that the package that `almostfreelunch` is about
> to compile is popular enough with the specific build settings about to be
> used, the server will reply to the app and tell it "hi, upload to me
> your bins when cooked, plz". But if the build setting is not popular
> enough, it will reply "nothx". This way, the central server will not end up
> with too many unwanted binaries with uncommon build-time settings.
>
> 6. The central server will also collect multiple binary packages from multiple
> people who use `almostfreelunch` for the same packages and the same
> build-time options, i.e. multiple qtwebengine binaries with identical
> build-time settings (e.g. same USE flags, make.conf, etc.).
>
> 7. The central server will perform statistical analysis on all of the
> uploaded binaries of the same package with the same claimed build-time
> settings, cross-checking them to obtain a statistical confidence in
> identifying which of the binaries are good and which are outliers.
> Outliers might exist because of users with buggy compilers, or malicious
> users who intentionally try to inject malware/bugs into their binaries.
>
> 8. Thanks to information theory, we will be able to figure out how much
> redundancy is needed in order to numerically calculate a confidence value
> that shows how trusty a given binary is. E.g. if a package, with specific
> build-time options, has a very large number of binary submissions that are
> also extremely similar (i.e. only differ in trivial aspects due to certain
> randomness in how compilers work), then the central server can calculate a
> high confidence value for it. Else, the confidence value drops.
>
> 9. If a user invokes `almostfreelunch -aqvDuNt --backbrack=1000 qtwebengine`
> and the central server finds an already compiled package with the same
> settings, then the server simply tells the user, and shows him the
> confidence associated with the fitness of the binary (based on the
> calculations in steps (6) to (8)). By default, bins with too-low
> confidence values will be masked, and proper colours will be used to
> adequately scare users away from low-confidence packages.
>
> 10. If at step (9) the user likes the confidence of the pre-compiled binary
> package, the user can simply download the binary package, blazing fast, with
> all the nice USE and make.conf flags that he has. Else, the user is free to
> compile his own version, and upload his own binary, to help the server
> enhance its confidence as calculated in steps (6) to (8).
>
>
> NOTES:
> ------
> * The statistical analysis in step (5) can also consider the compile time of
> packages, so that the minimum popularity required for a specific package
> build is weighted by its total build time. This way, slow-to-build
> packages will end up with a lower minimum popularity than small
> packages. Choosing the sweet-spot trade-off is a matter of optimizing
> resources of the central server.
>
> * The statistical analysis in steps (6) to (8) could also be further enhanced
> by ranking individual users who upload the binaries. Users who upload bins
> could optionally also sign their packages, and thereby be identified by
> the central server. Eventually, statistics can also be used to calculate a
> confidence measure of how trusty a user is. This can help the server
> calculate the confidence of the uploaded bins more accurately, by also
> incorporating the past history of those users.
>
> Sub-note 1: The reason signing is optional is that, thanks to
> information theory, we don't really need signed packages in order to know
> that a package is not an outlier. I.e. even unsigned packages can help us
> figure out the probability of error by simply looking at the redundancy
> counts.
>
> Sub-note 2: But, of course, signing would help, as it will allow the central
> server's statistical analysis to also take into account which bin is coming
> from which user. E.g. not all users are equally trusty, and this can help
> the system be more accurate in its prediction of the error on the package.
>
> Sub-note 3: I said it already, but just to repeat: when the error becomes
> low enough, this distributed system can potentially end up producing binaries
> that match or exceed those of trusty Gentoo devs. Adding common heuristic
> checks is optional, but can make the bins even more likely to beat manual devs.
>
> * Eventually, this principled statistical approach could also replace the
> need for manually electing binary package maintainers. Thanks to the way
> stuff works in nature, this system has the potential of being even more
> trusty than the trustiest bin-packaging developer.
>
> * In the future, this could be extended to source-code ebuilds, too,
> ultimately reaching a quality equal to, or exceeding that of, the current
> manual system. This may pave the path to a much more efficient operating
> system where less manual labour is needed from the devs, so that more devs
> can do more fun things than packaging boring stuff.
>
> * This system will get better the more people use it, and the better it gets
> the more people will like it and hence even more will use it! It works
> like turbo-charging. Hence, if this succeeds, we may market Gentoo as the
> first "turbo-charged OS"!
>
> * Based on step (5), the server can set frequency thresholds in order to keep
> its resources utilized only by highly demanded packages.
>
>
> rgrds,
> cm

Hi Caveman

The Portage tree contains a few binary packages prepared by Gentoo
developers, like Firefox, Rust, LibreOffice...
"ls -d /usr/portage/*/*-bin" shows about 90 packages prepared in this
way, some of them because they are non-free, like Oracle JDK.

This means that no changes to Gentoo are necessary to accomplish
what you describe: compile the packages, write the ebuilds for the
binary packages, publish the ebuilds in an overlay.

But the really short list above shows that it's a really complex task
because of all the dependencies and configurable elements in Gentoo. If
you just have a look at the output of "emerge --info" you can imagine
all the moving parts, like compiler versions and compile options,
Bash, Perl, Python, init system, USE flags (combinatorial), even human
languages. And those are just the easily visible parts!
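
To put a number on "combinatorial", here is a rough Python sketch. It assumes every USE flag is an independent on/off switch, which overstates things a little, since real ebuilds can rule out some combinations. The flag names and the helper functions are made up for illustration.

```python
from itertools import product


def count_use_combinations(flags):
    """Upper bound on distinct builds: each flag is either on or off."""
    return 2 ** len(flags)


def enumerate_use_combinations(flags):
    """Yield every on/off assignment as a tuple of the enabled flags."""
    for states in product((False, True), repeat=len(flags)):
        yield tuple(flag for flag, enabled in zip(flags, states) if enabled)


# Hypothetical flag list, for illustration only.
flags = ["X", "alsa", "pulseaudio", "jumbo-build"]

print(count_use_combinations(flags))      # 16
print(count_use_combinations(range(20)))  # 1048576
```

So a single package with 20 flags already allows about a million variants, before compiler versions, CFLAGS and dependency versions are counted.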

I remember reading an article about a man trying to reproduce the binary
packages of a binary distribution and failing to do so, because there
are so many parts involved. I read later that distributions have
done some work towards reproducible builds, but I'm not sure how
successful they are, even when all choices are predefined.
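
Reproducibility matters for the cross-check in steps (6) to (8) of the proposal. The sketch below is only one simple way to do such a check, not the method the proposal specifies: it assumes uploads are compared by SHA-256 digest and that "confidence" is the share of uploads matching the most common digest. The function names and the byte strings are invented for the example.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one uploaded binary."""
    return hashlib.sha256(data).hexdigest()


def agreement(uploads):
    """Return (most common digest, fraction of uploads sharing it)."""
    counts = Counter(digest(u) for u in uploads)
    top_digest, top_count = counts.most_common(1)[0]
    return top_digest, top_count / len(uploads)


# Toy data: three byte-identical builds and one that differs.
reproducible = [b"same-binary"] * 3 + [b"tampered-binary"]
print(agreement(reproducible)[1])      # 0.75

# Toy data: every build embeds something different (e.g. a timestamp).
non_reproducible = [b"build-a", b"build-b", b"build-c", b"build-d"]
print(agreement(non_reproducible)[1])  # 0.25
```

With non-reproducible builds every digest is unique, so the share is always 1/n and says nothing about which upload to trust. A real system would need either reproducible builds or a much fuzzier comparison.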

Given that Gentoo has taken a whole different road by making more
choices available to the user, I don't think the compilation results
of one configuration could easily be used on another.
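
As an illustration, suppose a server looked up prebuilt packages by a key derived from the build settings. This is an assumption, since the proposal does not say how configurations would be matched, and `config_key` and the settings shown are hypothetical. Under that assumption, any difference in any setting produces a different key:

```python
import hashlib
import json


def config_key(package, settings):
    """Hash a package name plus its build settings into a lookup key."""
    # Sort list-like values so that flag order does not change the key.
    normalized = {
        name: sorted(value) if isinstance(value, (list, set, tuple)) else value
        for name, value in settings.items()
    }
    canonical = json.dumps(
        {"package": package, "settings": normalized}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical settings, for illustration only.
base = {"CHOST": "x86_64-pc-linux-gnu", "CFLAGS": "-O2 -pipe", "USE": ["X", "alsa"]}
reordered = {"USE": ["alsa", "X"], "CFLAGS": "-O2 -pipe", "CHOST": "x86_64-pc-linux-gnu"}
tuned = dict(base, CFLAGS="-O2 -pipe -march=native")

print(config_key("dev-qt/qtwebengine", base) == config_key("dev-qt/qtwebengine", reordered))  # True
print(config_key("dev-qt/qtwebengine", base) == config_key("dev-qt/qtwebengine", tuned))      # False
```

The same flags in a different order still match, but one extra compiler option is enough for two otherwise identical users to stop sharing a binary.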

To go even further, pushing your compiled packages to a public server
may create a security risk by exposing many parts of your
configuration that could be analyzed by malicious people.

So far I don't see a really big advantage in building this kind of
infrastructure compared to either a binary distribution or Gentoo with
home compilation.

Best regards

Mickaël Bucas