[gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time - gentoo-user

From:	Caveman Al Toraboran <toraboracaveman@××××××××××.com>
To:	"gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject:	[gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time
Date:	Tue, 05 Nov 2019 00:01:54
Message-Id:	`rF3fmNvZLZbm4NQsoKiNyPqxBYY6Tvs1pkGnjGb1NYmvHcIv_nwCD9GKk8oXaF75-8TTnKp9x2AVaFEJ5QRRBZUJ8GaTDWm6bfokeXrXpIw=@protonmail.com`

DISCLAIMER:  I am not claiming that this idea is new.  It is probably not new.
-----------  Even though some of its details might be new for a Linux
             distribution, it is all based on boring, well-established bits
             of known science.  Regardless of its newness, I think it is
             worth sharing, in the hope that it rekindles the fire in a
             nerd's heart (or in a group of nerds) so that they develop this
             for me (or us).

GOAL:
-----
Reduce compile time and still rice (e.g. fancy USE flags, make.conf tweaks,
etc), without increasing dev overhead.

CURRENT SITUATION:
------------------
If you use *-bin packages, you cannot rice.  If you want to rice, you must
compile on your own.

THE APPROACH:
-------------
1. Some nerd (or a group of nerds) makes a package, maybe called
   `almostfreelunch.ebuild`.

2. Say you want to compile qtwebengine.  You run:
   `almostfreelunch -aqvDuNt --backtrack=1000 qtwebengine`.

3. The app, `almostfreelunch`, looks up your build setup (e.g. USE flags,
   make.conf settings, etc) for every package that is about to be built on
   your system as part of installing qtwebengine.

4. The app uploads that info to a central server, which looks up how popular
   each configuration is, i.e. the distribution of compile-time
   configurations for a given package.  The server can then figure out
   things like: qtwebengine is commonly compiled for x86-64 with certain USE
   flags and other make.conf settings.

5. If the server finds that the package `almostfreelunch` is about to compile
   is popular enough with the specific build settings in use, it replies to
   the app: "hi, upload to me your bins when cooked, plz".  If the build
   settings are not popular enough, it replies "nothx".  This way, the
   central server does not end up with too many unwanted binaries built with
   uncommon build-time settings.

6. The central server also collects multiple binary packages from multiple
   people who use `almostfreelunch` for the same packages with the same
   build-time options, i.e. multiple qtwebengine binaries with identical
   build-time settings (e.g. same USE flags, make.conf, etc).

7. The central server performs statistical analysis on all uploaded binaries
   of the same package with the same claimed build-time settings.  It
   cross-checks them to obtain a statistical confidence in which binaries
   are good and which are outliers.  Outliers might exist because of users
   with buggy compilers, or malicious users who intentionally try to inject
   malware/bugs into their binaries.

8. Thanks to information theory, we can figure out how much redundancy is
   needed to numerically calculate a confidence value that shows how trusty
   a given binary is.  E.g. if a package, with specific build-time options,
   has a very large number of binary submissions that are also extremely
   similar (i.e. differ only in trivial aspects due to certain randomness in
   how compilers work), then the central server can calculate a high
   confidence value for it.  Otherwise, the confidence value drops.

9. If a user invokes `almostfreelunch -aqvDuNt --backtrack=1000 qtwebengine`
   and the central server already has a compiled package with the same
   settings, the server tells the user and shows the confidence associated
   with the fitness of that binary (based on the calculations in steps (6)
   to (8)).  By default, bins with too-low confidence values are masked, and
   proper colours are used to adequately scare users away from
   low-confidence packages.

10. If at step (9) the user likes the confidence of the pre-compiled binary
    package, the user can simply download it, blazing fast, with all the
    nice USE and make.conf flags already applied.  Otherwise, the user is
    free to compile their own version and upload the resulting binary, to
    help the server improve the confidence calculated in steps (6) to (8).
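
To make steps (3) to (8) a bit more concrete, here is a rough Python sketch
of two pieces: turning a build setup into a stable lookup key, and measuring
how much the uploaded binaries agree.  Everything in it is my assumption and
not an existing tool: the names `config_fingerprint` and `consensus`, the
JSON canonicalisation, and the use of SHA-256.  It also assumes that
identical settings give bit-identical binaries, which only holds for
reproducible builds.  The "trivial differences" mentioned in step (8) would
need a smarter comparison than an exact digest match.

```python
import hashlib
import json
from collections import Counter


def config_fingerprint(package, use_flags, make_conf):
    """Hash a build setup into a stable key (hypothetical scheme).

    USE flags are sorted and dict keys are serialised in sorted order, so
    two users with the same settings get the same key regardless of the
    order in which the settings were written.
    """
    canonical = json.dumps(
        {"package": package, "use": sorted(use_flags), "make_conf": make_conf},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consensus(binary_digests):
    """Return the most common binary digest and the share of uploads matching it.

    The share is only a crude stand-in for the confidence value of step (8);
    a real server would want a proper statistical model.
    """
    if not binary_digests:
        return None, 0.0
    digest, count = Counter(binary_digests).most_common(1)[0]
    return digest, count / len(binary_digests)
```

With this, the server could count requests per fingerprint for step (5), and
show the agreement share next to a binary in step (9).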

NOTES:
------

* The popularity check in step (5) can also consider the compile time of
  packages, so that the minimum popularity required for a specific package
  build is weighted by its total build time.  This way, slow-to-build
  packages end up with a lower minimum popularity than small packages.
  Choosing the sweet-spot trade-off is a matter of optimizing the resources
  of the central server.

* The statistical analysis in steps (6) to (8) could be further enhanced by
  ranking the individual users who upload binaries.  Users who upload bins
  could optionally sign their packages, and thereby be identified by the
  central server.  Eventually, statistics can also be used to calculate a
  confidence measure of how trusty a user is.  This can help the server
  calculate the confidence of uploaded bins more accurately, by also
  incorporating the past history of those users.

  Sub-note 1:  Signing is optional because, thanks to information theory, we
  don't really need signed packages in order to know that a package is not
  an outlier.  I.e. even unsigned packages can help us figure out the
  probability of error by simply looking at the redundancy counts.

  Sub-note 2:  But, of course, signing would help, as it allows the central
  server's statistical analysis to also take into account which bin comes
  from which user.  E.g. not all users are equally trusty, and this can help
  the system predict the error on a package more accurately.

  Sub-note 3:  I said it already, but just to repeat: when the error becomes
  low enough, this distributed system can potentially end up producing
  binaries that match or exceed those made by trusty Gentoo devs.  Adding
  common heuristic checks is optional, but can make the bins even more
  likely to beat manually made ones.

* Eventually, this could also replace the need for manually electing binary
  package maintainers with a principled statistical approach.  Thanks to the
  way stuff works in nature, this system has the potential to be even more
  trusty than the trustiest bin-packager developer.

* In the future, this could be extended to source-code ebuilds, too,
  ultimately reaching a quality equal to, or exceeding that of, the current
  manual system.  This may pave the path to a much more efficient operating
  system where less manual labour is needed from the devs, so that more devs
  can actually do more fun things than packaging boring stuff.

* This system will get better the more people use it, and the better it
  gets, the more people will like it and hence even more will use it!  It
  works like turbo-charging.  Hence, if this succeeds, we may market Gentoo
  as the first "turbo-charged OS"!

* Based on step (5), the server can set frequency thresholds so that its
  resources are used only by highly demanded packages.
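
One possible way to combine the frequency threshold with the build-time
weighting from the first note is sketched below.  The rule is: accept a
configuration once caching it would save enough total compile time.  The
function name `should_request_upload` and the default of ten hours are
made-up assumptions, chosen only to illustrate the trade-off.

```python
def should_request_upload(request_count, build_seconds, min_saved_seconds=36000):
    """Decide between "upload plz" (True) and "nothx" (False).

    Hypothetical rule: the time saved by caching is roughly
    request_count * build_seconds, so a slow-to-build package needs
    fewer requests to cross the threshold than a small one.
    """
    if build_seconds <= 0:
        return False
    return request_count * build_seconds >= min_saved_seconds
```

For example, with the default threshold a package that takes an hour to
build qualifies after 10 requests, while one that builds in a minute needs
600.  Tuning `min_saved_seconds` is how the server would trade storage
against saved compile time.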

rgrds,
cm
