DISCLAIMER:
-----------
I am not claiming that this idea is new. It is probably not new. Even though
some of its details might be new for a Linux distribution, it's all based on
boring well-established bits of known science. But regardless of its newness,
I think it's worth sharing with the hope that it may re-kindle the fire in a
nerd's heart (or a group of nerds) so that they develop this for me (or us).


GOAL:
-----
Reduce compile time while keeping the ability to rice (e.g. fancy USE flags,
make.conf tweaks, etc.), and yet not increase dev overhead.


CURRENT SITUATION:
------------------
If you use *-bin packages, you cannot rice. If you want to rice, you must
compile on your own.


THE APPROACH:
-------------
1. Some nerd (or a group of nerds) makes a package, maybe call it
`almostfreelunch.ebuild`.

2. Say you want to compile qtwebengine. You do: `almostfreelunch -aqvDuNt
--backbrack=1000 qtwebengine`.

3. The app, `almostfreelunch`, will look up your build setup (e.g. USE flags,
make.conf settings, etc.) for all packages that you are about to build on
your system as part of installing qtwebengine.

4. The app will upload that info to a central server, which looks up the
popularity of each configuration, i.e. the distribution of compile-time
configurations for a given package. The central server will then figure out
things like: qtwebengine is commonly compiled for x86-64 with certain USE
flags and other settings in make.conf.

5. If the server figures out that the package that `almostfreelunch` is about
to compile is popular enough with the specific build settings that are about
to be used, the server will reply to the app and tell it "hi, upload to me
your bins when cooked, plz". But if the build setting is not popular
enough, it will reply "nothx". This way, the central server will not end up
with too many undesired binaries with uncommon build-time settings.

6. The central server will also collect multiple binary packages from multiple
people who use `almostfreelunch` for the same packages and the same
build-time options, i.e. multiple qtwebengine binaries with identical
build-time settings (e.g. same USE flags, make.conf, etc.).

7. The central server will perform statistical analysis on all of the
uploaded binaries of the same packages and the same claimed build-time
settings, cross-checking those binaries to obtain a statistical confidence
in identifying which of the binaries are the good ones, and which ones are
outliers. Outliers might exist because of users with buggy compilers, or
malicious users who intentionally try to inject malware/bugs into their
binaries.

8. Thanks to information theory, we will be able to figure out how much
redundancy is needed in order to numerically calculate a confidence value
that shows how trusty a given binary is. E.g. if a package, with specific
build-time options, has a very large number of binary submissions that are
also extremely similar (i.e. only differ in trivial aspects due to certain
randomness in how compilers work), then the central server can calculate a
high confidence value for it. Else, the confidence value drops.

9. If a user invokes `almostfreelunch -aqvDuNt --backbrack=1000 qtwebengine`
and the central server finds an already compiled package with the same
settings, then the server simply tells the user, and shows him the
confidence associated with the fitness of the binary (based on the
calculations in steps (6) to (8)). By default, bins with too-low confidence
values will be masked, and proper colours will be used to adequately scare
users away from low-confidence packages.

10. If at step (9) the user likes the confidence of the pre-compiled binary
package, the user can simply download the binary package, blazing fast, with
all the nice USE and make.conf flags that he has. Else, the user is free to
compile his own version, and upload his own binary, to help the server
enhance its confidence as calculated in steps (6) to (8).

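To make steps (3) to (5) a bit more concrete, here is a minimal Python sketch
of how the app could turn a build setup into a fingerprint, and how the server
could count requests per fingerprint before answering "upload plz" or "nothx".
Everything named here (`config_fingerprint`, `PopularityServer`,
`min_requests`, the choice of SHA-256, the example flags) is an assumption
made for illustration, not part of any existing tool.

```python
import hashlib
import json
from collections import Counter


def config_fingerprint(package, use_flags, make_conf):
    """Return a stable hash of one package's build configuration.

    Hypothetical helper: USE flags are sorted and make.conf keys are
    serialized in sorted order, so ordering differences do not matter.
    """
    canonical = json.dumps(
        {"package": package, "use": sorted(use_flags), "make_conf": make_conf},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PopularityServer:
    """Toy stand-in for the central server's popularity check in step (5)."""

    def __init__(self, min_requests):
        self.min_requests = min_requests  # hypothetical popularity threshold
        self.requests = Counter()

    def wants_upload(self, fingerprint):
        """Record one request; True means "upload plz", False means "nothx"."""
        self.requests[fingerprint] += 1
        return self.requests[fingerprint] >= self.min_requests
```

With this, two users who list the same USE flags in a different order get the
same fingerprint, so their requests count towards the same configuration.
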
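Steps (6) to (8) could be sketched like this, under two loud assumptions that
the text above does not guarantee: (a) builds are bit-identical, so binaries
can be compared by hash (real builds that "differ in trivial aspects" would
need some normalization first), and (b) uploaders are independent, each
sending a bad binary with probability `bad_rate`. Under (b), the chance that
`k` agreeing uploads are all bad is at most `bad_rate ** k`. The names
`consensus`, `confidence` and `bad_rate` are hypothetical.

```python
import hashlib
from collections import Counter


def consensus(uploads):
    """Return (majority_hash, share, outliers) for one build configuration.

    `uploads` maps an uploader id to the raw bytes of the binary they sent.
    """
    if not uploads:
        raise ValueError("need at least one upload")
    digests = {
        user: hashlib.sha256(blob).hexdigest() for user, blob in uploads.items()
    }
    counts = Counter(digests.values())
    majority_hash, votes = counts.most_common(1)[0]
    share = votes / len(digests)
    # Everyone whose binary differs from the majority is flagged as an outlier.
    outliers = sorted(u for u, d in digests.items() if d != majority_hash)
    return majority_hash, share, outliers


def confidence(agreeing, bad_rate):
    """Lower bound on trust, assuming independent uploaders (see lead-in)."""
    return 1.0 - bad_rate ** agreeing
```

The independence assumption is the weak spot: a single malicious user with
many accounts breaks it, which is one reason the per-user ranking in the notes
would matter.
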
NOTES:
------
* The statistical analysis in step (5) can also consider the compile time of
packages. So the minimum popularity required for a specific package build is
weighted by its total build time. This way, slow-to-build packages will end
up getting a lower minimum popularity than small packages. Choosing the
sweet-spot trade-off is a matter of optimizing the resources of the central
server.

* The statistical analysis in steps (6) to (8) could also be further enhanced
by ranking the individual users who upload the binaries. Users who upload
bins could optionally also sign their packages, and hence be identified by
the central server. Eventually, statistics can also be used to calculate a
confidence measure of how trusty a user is. This can eventually help the
server calculate the confidence of the uploaded bins more accurately, by also
incorporating the past history of those users.

Sub-note 1: The reason signing is optional is that, thanks to information
theory, we don't really need signed packages in order to know that a package
is not an outlier. I.e. even unsigned packages can help us figure out the
probability of error by simply looking at the redundancy counts.

Sub-note 2: But, of course, signing would help, as it will allow the central
server's statistical analysis to also take into account which bin is coming
from which user. E.g. not all users are equally trusty, and this can help
the system be more accurate in its prediction of the error on the package.

Sub-note 3: I said it already, but just to repeat: when the error becomes
low enough, this distributed system can potentially end up producing binaries
that match or exceed those of trusty Gentoo devs. Adding common heuristic
checks is optional, but can make the bins even more likely to beat manual
devs.

* Eventually, this principled statistical approach could also replace the
need for manually electing binary package maintainers. Thanks to the way
stuff works in nature, this system has the potential of being even more
trusty than the trustiest bin-packaging developer.

* In the future, this could be extended to source-code ebuilds, too,
ultimately reaching a quality equal to, or exceeding that of, the current
manual system. This may pave the path to a much more efficient operating
system where less manual labour is needed from the devs, so that more devs
can actually do more fun things than packaging boring stuff.

* This system will get better the more people use it, and the better it gets,
the more people will like it, and hence even more will use it! It works like
turbo-charging. Hence, if this succeeds, we may market Gentoo as the first
"turbo-charged OS"!

* Based on step (5), the server can set frequency thresholds in order to keep
its resources utilized only by highly demanded packages.

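The first note (minimum popularity weighted by build time) could look roughly
like the sketch below. The inverse scaling rule and every number in it
(`base_requests`, `reference_minutes`, `floor`) are made-up assumptions, since
the right trade-off depends on the central server's resources.

```python
import math


def min_popularity(build_minutes, base_requests=100, reference_minutes=10.0, floor=2):
    """Requests needed before the server accepts uploads for a configuration.

    Hypothetical rule: builds up to `reference_minutes` need `base_requests`
    requests; longer builds need proportionally fewer, never below `floor`
    (so that there is still some redundancy to cross-check).
    """
    scaled = base_requests * reference_minutes / max(build_minutes, reference_minutes)
    return max(floor, math.ceil(scaled))
```

For example, with these defaults a 5-minute build needs 100 requests, a
100-minute build needs 10, and a 10-hour build needs only 2.
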
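The user-ranking note and its sub-notes could be sketched as a weighted vote:
signed uploads count with their signer's reputation, unsigned ones with a
small default weight. The function name `weighted_shares`, the reputation
values and `unsigned_weight` are all hypothetical; how reputations would be
learned from past history is left open here.

```python
from collections import defaultdict


def weighted_shares(uploads, reputation, unsigned_weight=0.1):
    """Return each binary digest's share of the total trust weight.

    `uploads` is a list of (signer_or_None, digest) pairs; `reputation`
    maps a known signer to a weight. Unsigned uploads and unknown signers
    fall back to `unsigned_weight`.
    """
    totals = defaultdict(float)
    for signer, digest in uploads:
        if signer is None:
            weight = unsigned_weight
        else:
            weight = reputation.get(signer, unsigned_weight)
        totals[digest] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {digest: weight / grand_total for digest, weight in totals.items()}
```

With two well-ranked signers agreeing on one digest and three unsigned uploads
agreeing on another, the signed digest still wins, which is the point of
Sub-note 2.
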
rgrds,
cm