Gentoo Archives: gentoo-user

From: Caveman Al Toraboran <toraboracaveman@××××××××××.com>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time
Date: Tue, 05 Nov 2019 00:01:54
Message-Id: rF3fmNvZLZbm4NQsoKiNyPqxBYY6Tvs1pkGnjGb1NYmvHcIv_nwCD9GKk8oXaF75-8TTnKp9x2AVaFEJ5QRRBZUJ8GaTDWm6bfokeXrXpIw=@protonmail.com
DISCLAIMER:  I am not claiming that this idea is new.  It is probably not new.
-----------  Even though some of its details might be new for a Linux
             distribution, it's all based on boring well-established bits of
             known science.  But regardless of its newness, I think it's worth
             sharing with the hope that it may re-kindle the fire in a nerd's
             heart (or a group of nerds) so that they develop this for me (or
             us).



GOAL:
-----
Reduce compile time, rice (e.g. fancy USE, make.conf, etc), and yet not
increase dev overhead.


CURRENT SITUATION:
------------------
If you use *-bin packages, you cannot rice; to rice, you must compile
everything on your own.


THE APPROACH: 
-------------
1. Some nerd (or a group of nerds) makes (or make) a package, maybe call it
   `almostfreelunch.ebuild`.

2. Say you want to compile qtwebengine.  You do:   `almostfreelunch -aqvDuNt
   --backbrack=1000 qtwebengine`.

3. The app, `almostfreelunch`, will look up your build setup (e.g. USE flags,
   make.conf settings, etc) for every package that is about to be built on
   your system in order to install qtwebengine.

4. The app will upload that info to a central server, which looks up the
   popularity of each configuration, i.e. the distribution of compile-time
   configurations for a given package.  The central server will then figure
   out things like: qtwebengine is commonly compiled for x86-64 with certain
   USE flags and other settings in make.conf.

5. If the server figures out that the package that `almostfreelunch` is about
   to compile is popular enough with the specific build settings that are
   about to be used, the server will reply to the app and tell it "hi, upload
   to me your bins when cooked, plz".  But if the build setting is not popular
   enough, it will reply "nothx".  This way, the central server will not end up
   with too many unwanted binaries built with uncommon build-time settings.

6. The central server will also collect multiple binary packages from multiple
   people who use `almostfreelunch` for the same packages and the same
   build-time options.  I.e. multiple qtwebengine with identical build-time
   settings (e.g.  same USE flags, make.conf, etc).

7. The central server will perform statistical analysis on all of the
   uploaded binaries of the same package with the same claimed build-time
   settings, cross-checking them to obtain a statistical confidence in
   identifying which of the binaries are good and which ones are outliers.
   Outliers might exist because of users with buggy compilers, or malicious
   users who intentionally try to inject malware/bugs into their binaries.

8. Thanks to information theory, we will be able to figure out how much
   redundancy is needed in order to numerically calculate a confidence value
   that shows how trusty a given binary is.  E.g. if a package, with specific
   build-time options, has a very large number of binary submissions that are
   also extremely similar (i.e. only differ in trivial aspects due to certain
   randomness in how compilers work), then the central server can calculate a
   high confidence value for it.  Else, the confidence value drops.

9. If a user invokes `almostfreelunch -aqvDuNt --backbrack=1000 qtwebengine`
   and the central server tells the user that there is an already compiled
   package with the same settings, then the server simply tells the user, and
   shows him the confidence associated with the fitness of the binary (based on
   calculations in steps (6) to (8)).  By default, bins with too-low
   confidence values will be masked and proper colours will be used to
   adequately scare users away from low-confidence packages.

10. If at step (9) the user likes the confidence of the pre-compiled binary
   package, the user can simply download the binary package, blazing fast, with
   all the nice USE and make.conf flags that he has.  Else, the user is free to
   compile his own version, and upload his own binary, to help the server
   enhance its confidence as calculated in steps (6) to (8).
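
A rough sketch of steps (3) and (6) to (8), in Python.  Everything named here
is hypothetical: `config_fingerprint`, `consensus`, the JSON layout and the
example USE flags are made up for illustration.  It also assumes that builds
with identical settings come out byte-identical (reproducible builds), so
that binaries can be compared by hash.  The "confidence" is just the share of
submissions that agree with the most common binary, a simple stand-in and not
the information-theoretic measure from step (8).

```python
import hashlib
import json
from collections import Counter


def config_fingerprint(package, use_flags, make_conf):
    """Hash a build setup into a stable key (hypothetical scheme).

    USE flags are sorted so that their order does not change the key.
    """
    canonical = json.dumps(
        {"package": package, "use": sorted(use_flags), "make_conf": make_conf},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consensus(binary_hashes):
    """Return (most common binary hash, share of submissions matching it)."""
    if not binary_hashes:
        return None, 0.0
    digest, votes = Counter(binary_hashes).most_common(1)[0]
    return digest, votes / len(binary_hashes)


# Two users with the same settings, listed in a different order.
fp_a = config_fingerprint(
    "dev-qt/qtwebengine", ["widgets", "alsa"], {"CFLAGS": "-O2 -pipe"}
)
fp_b = config_fingerprint(
    "dev-qt/qtwebengine", ["alsa", "widgets"], {"CFLAGS": "-O2 -pipe"}
)

# Hashes of four uploaded binaries for that fingerprint; one is an outlier.
submissions = ["aaa", "aaa", "aaa", "bbb"]
best, agreement = consensus(submissions)
print(fp_a == fp_b, best, agreement)
```

Here both users land on the same fingerprint, and three of the four uploads
agree, so the "bbb" upload is the outlier.  A real server would need
something smarter than exact hash matching if binaries differ in trivial
ways, as step (8) expects.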


NOTES:
------
* The statistical analysis in step (5) can also consider the compile time of
  packages.  So the minimum popularity required for a specific package build is
  weighted by its total build time.  This way, slow-to-build packages will end
  up getting a lower minimum popularity than small, quick-to-build packages.
  Choosing the sweet-spot trade-off is a matter of optimizing the resources of
  the central server.

* The statistical analysis in steps (6) to (8) could also be further enhanced
  by ranking individual users who upload the binaries.  Users, who upload bins,
  could optionally also sign their packages, and henceforth be identified by
  the central server.  Eventually, statistics can be used to also calculate a
  confidence measure on how trusty a user is.  This can eventually help the
  server more accurately calculate the confidence of the uploaded bins, by also
  incorporating the past history of those users.

  Sub-note 1:  The reason signing is optional, is because ---thanks to
  information theory--- we don't really need signed packages in order to know
  that a package is not an outlier.  I.e. even unsigned packages can help us
  figure out the probability of error by simply looking at the redundancy
  counts.

  Sub-note 2:  But, of course, signing would help as it will allow the central
  server's statistical analysis to also take into account which bin is coming
  from which user.  E.g. not all users are equally trusty, and this can help
  the system be more accurate in its prediction of the error on the package.

  Sub-note 3:  I said it already, but just to repeat, when the error becomes
  low enough, this distributed system can potentially end up producing binaries
  that match or exceed those of trusty Gentoo devs.  Adding common heuristic
  checks is optional, but can make the bins even more likely to beat those
  made manually by devs.

* Eventually, this principled statistical approach could also replace the need
  for manually electing binary package maintainers.  Thanks to the way stuff
  works in nature, this system has the potential of being even more trusty
  than the trustiest bin-packager developer.

* In the future, this could be extended to source-code ebuilds, too.
  Ultimately, reaching a quality equal to, or exceeding that of, the current
  manual system.  This may pave the path to a much more efficient operating
  system where less manual labour is needed by the devs, so that more devs can
  do actually more fun things than packaging boring stuff.

* This system will get better the more people use it, and the better it gets
  the more the people would like it and hence even more will use it!  It works
  like turbo-charging.  Hence, if this succeeds, we may market Gentoo as the
  first "turbo-charged OS"!

* Based on step (5), the server can set frequency thresholds in order to keep
  its resources only utilized by highly demanded packages.
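
The build-time-weighted threshold from step (5) and the first note could look
roughly like this in Python.  The function names, the inverse-proportional
rule and all the numbers (`base_requests`, `reference_minutes`, `floor`) are
assumptions picked for illustration; the real trade-off would have to be
tuned against the server's resources.

```python
def min_popularity(build_minutes, base_requests=50.0,
                   reference_minutes=10.0, floor=2.0):
    """Hypothetical rule: the slower the build, the fewer requests it needs.

    A build taking `reference_minutes` needs `base_requests` requests; the
    requirement shrinks in proportion to build time, but never below `floor`.
    """
    scaled = base_requests * reference_minutes / max(build_minutes, 1.0)
    return max(floor, scaled)


def server_reply(requests_seen, build_minutes):
    """Decide whether the server asks the client to upload its binary."""
    if requests_seen >= min_popularity(build_minutes):
        return "hi, upload to me your bins when cooked, plz"
    return "nothx"


# Five requests are enough for a 200-minute build, not for a 10-minute one.
print(server_reply(5, 200))
print(server_reply(5, 10))
```

With these made-up numbers, a 10-minute build needs 50 requests before the
server wants it, while a 200-minute build needs only 2.5.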


rgrds,
cm
