Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time - gentoo-user

From:	"Mickaël Bucas" <mbucas@×××××.com>
To:	Gentoo <gentoo-user@l.g.o>
Subject:	Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time
Date:	Tue, 05 Nov 2019 15:05:22
Message-Id:	`CAG1=SYT-JX24FKeM0b+udW94SJ4_YZEa9_nOKfn9_Uw9g8ZK8Q@mail.gmail.com`
In Reply to:	[gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time by Caveman Al Toraboran

On Tue, 5 Nov 2019 at 01:02, Caveman Al Toraboran
<toraboracaveman@××××××××××.com> wrote:
>
>
> DISCLAIMER:  I am not claiming that this idea is new.  It is probably not new.
> -----------  Even though some of its details might be new for a Linux
>              distribution, it's all based on boring, well-established bits of
>              known science.  But regardless of its newness, I think it's worth
>              sharing in the hope that it may rekindle the fire in a nerd's
>              heart (or a group of nerds' hearts) so that they develop this for
>              me (or us).
>
>
>
> GOAL:
> -----
> Reduce compile time while still allowing ricing (e.g. fancy USE flags,
> make.conf settings, etc.), without increasing dev overhead.
>
>
> CURRENT SITUATION:
> ------------------
> If you use *-bin packages, you cannot rice; to rice, you must compile on
> your own.
>
>
> THE APPROACH:
> -------------
> 1. Some nerd (or a group of nerds) makes (or make) a package, maybe call it
>    `almostfreelunch.ebuild`.
>
> 2. Say you want to compile qtwebengine.  You do:
>    `almostfreelunch -aqvDuNt --backtrack=1000 qtwebengine`.
>
> 3. The app, `almostfreelunch`, will look up your build setup (e.g. USE flags,
>    make.conf settings, etc.) for all packages that you are about to build on
>    your system as you are about to install that qtwebengine.
>
> 4. The app will upload that info to a central server, which looks up the
>    popularity of certain configurations, e.g. the distribution of
>    compile-time configurations for a given package.  The central server will
>    then figure out things like: qtwebengine is commonly compiled for x86-64
>    with certain USE flags and other settings in make.conf.
>
> 5. If the server figures out that the package that `almostfreelunch` is about
>    to compile is popular enough with the specific build settings about to be
>    used, the server will reply to the app and tell it "hi, upload to me your
>    bins when cooked, plz".  But if the build setting is not popular enough,
>    it will reply "nothx".  This way, the central server will not end up with
>    too many unwanted binaries with uncommon build-time settings.
>
> 6. The central server will also collect multiple binary packages from multiple
>    people who use `almostfreelunch` for the same packages and the same
>    build-time options, i.e. multiple qtwebengine binaries with identical
>    build-time settings (e.g. same USE flags, make.conf, etc.).
>
> 7. The central server will perform statistical analysis on all of the
>    uploaded binaries of the same packages and the same claimed build-time
>    settings, cross-checking them to obtain a statistical confidence in
>    identifying which of the binaries is the good one and which ones are
>    outliers.  Outliers might exist because of users with buggy compilers, or
>    malicious users who intentionally try to inject malware/bugs into their
>    binaries.
>
> 8. Thanks to information theory, we will be able to figure out how much
>    redundancy is needed in order to numerically calculate a confidence value
>    that shows how trustworthy a given binary is.  E.g. if a package, with
>    specific build-time options, has a very large number of binary submissions
>    that are also extremely similar (i.e. only differ in trivial aspects due
>    to certain randomness in how compilers work), then the central server can
>    calculate a high confidence value for it.  Otherwise, the confidence value
>    drops.
>
> 9. If a user invokes `almostfreelunch -aqvDuNt --backtrack=1000 qtwebengine`
>    and there is an already compiled package with the same settings, the
>    server simply tells the user so, and shows him the confidence associated
>    with the fitness of the binary (based on the calculations in steps (6) to
>    (8)).  By default, bins with too-low confidence values will be masked, and
>    appropriate colours will be used to adequately scare users away from
>    low-confidence packages.
>
> 10. If at step (9) the user likes the confidence of the pre-compiled binary
>    package, the user can simply download the binary package, blazing fast,
>    with all the nice USE and make.conf flags that he has.  Otherwise, the user
>    is free to compile his own version, and upload his own binary, to help the
>    server enhance its confidence as calculated in steps (6) to (8).
>
>
> NOTES:
> ------
> * The statistical analysis in step (5) can also consider the compile time of
>   packages, so that the minimum popularity required for a specific package
>   build is weighted by the total build time.  This way, very slow-to-build
>   packages will get a lower minimum popularity than small packages.
>   Choosing the sweet-spot trade-off is a matter of optimizing the resources
>   of the central server.
>
> * The statistical analysis in steps (6) to (8) could also be further enhanced
>   by ranking the individual users who upload the binaries.  Users who upload
>   bins could optionally also sign their packages, and thereby be identified by
>   the central server.  Eventually, statistics can also be used to calculate a
>   confidence measure of how trustworthy a user is.  This can help the server
>   calculate the confidence of the uploaded bins more accurately, by also
>   incorporating the past history of those users.
>
>   Sub-note 1:  Signing is optional because, thanks to information theory, we
>   don't really need signed packages in order to know that a package is not
>   an outlier.  I.e. even unsigned packages can help us figure out the
>   probability of error simply by looking at the redundancy counts.
>
>   Sub-note 2:  But, of course, signing would help, as it would allow the
>   central server's statistical analysis to also take into account which bin
>   comes from which user.  E.g. not all users are equally trustworthy, and
>   this can help the system be more accurate in its prediction of the error
>   on the package.
>
>   Sub-note 3:  I said it already, but just to repeat: when the error becomes
>   low enough, this distributed system can potentially end up producing
>   binaries that match or exceed those of trusted Gentoo devs.  Adding common
>   heuristic checks is optional, but can make the bins even more likely to
>   beat those of manual devs.
>
> * Eventually, this statistical approach could also replace the manual
>   election of binary package maintainers with a principled statistical
>   approach.  Thanks to the way things work in nature, this system has the
>   potential of being even more trustworthy than the most trustworthy
>   bin-packager developer.
>
> * In the future, this could be extended to source-code ebuilds, too,
>   ultimately reaching a quality equal to, or exceeding that of, the current
>   manual system.  This may pave the way to a much more efficient operating
>   system where less manual labour is needed from the devs, so that devs can
>   actually do more fun things than packaging boring stuff.
>
> * This system will get better the more people use it, and the better it gets,
>   the more people will like it, and hence even more will use it!  It works
>   like turbo-charging.  Hence, if this succeeds, we may market Gentoo as the
>   first "turbo-charged OS"!
>
> * Based on step (5), the server can set frequency thresholds in order to keep
>   its resources used only by highly demanded packages.
>
>
> rgrds,
> cm

Hi Caveman

The Portage tree contains a few binary packages prepared by Gentoo
developers, like Firefox, Rust, LibreOffice...
"ls -d /usr/portage/*/*-bin" shows about 90 packages prepared in this
way, some of them because they are non-free, like Oracle JDK.

This means that no changes to Gentoo are necessary to accomplish
what you describe: compile the packages, write the ebuilds for the
binary packages, and publish the ebuilds in an overlay.

But the really short list above shows that it's a really complex task,
because of all the dependencies and configurable elements in Gentoo. If
you just look at the output of "emerge --info", you can imagine all
the moving parts: compiler versions and compile options, Bash, Perl,
Python, the init system, USE flags (combinatorial), even human
languages. And those are just the easily visible parts!
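
To make the combinatorial problem concrete, here is a minimal Python sketch of steps (3) to (5) of the proposal, including the build-time weighting from its first note. All names (`config_key`, `PopularityServer`), the settings fields and the threshold rule are hypothetical illustrations, not part of Portage or any existing tool. The client reduces its whole configuration to one SHA-256 key, and the server counts requests per key.

```python
import hashlib
import json
from collections import Counter


def config_key(package, settings):
    """Reduce a package plus its build settings to one lookup key (step 3).

    `settings` is a hypothetical dict of moving parts (USE flags, CFLAGS,
    compiler version, ...); the field names are illustrative only.
    """
    # Flag order is irrelevant, so sort list-like values before hashing.
    canonical = {
        name: sorted(value) if isinstance(value, (list, set, tuple)) else value
        for name, value in settings.items()
    }
    blob = json.dumps({"package": package, "settings": canonical}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class PopularityServer:
    """Toy stand-in for the proposed central server (steps 4 and 5)."""

    def __init__(self, base_threshold=100):
        self.base_threshold = base_threshold
        self.requests = Counter()  # config key -> number of requests seen

    def threshold(self, build_minutes):
        # Assumed weighting rule: divide the base threshold by the build time
        # in hours (builds under an hour count as one hour), with a floor of 2.
        hours = max(build_minutes / 60, 1)
        return max(2, round(self.base_threshold / hours))

    def request(self, key, build_minutes):
        # Count this request and answer as in step (5).
        self.requests[key] += 1
        if self.requests[key] >= self.threshold(build_minutes):
            return "upload"
        return "nothx"


if __name__ == "__main__":
    base = {"USE": ["X", "pulseaudio"], "CFLAGS": "-O2 -pipe", "gcc": "9.2.0"}
    reordered = {"USE": ["pulseaudio", "X"], "CFLAGS": "-O2 -pipe", "gcc": "9.2.0"}
    extra = dict(base, python="3.6")
    pkg = "dev-qt/qtwebengine"
    print(config_key(pkg, base) == config_key(pkg, reordered))  # True
    print(config_key(pkg, base) == config_key(pkg, extra))      # False

    server = PopularityServer(base_threshold=4)
    key = config_key(pkg, base)
    print([server.request(key, build_minutes=240) for _ in range(3)])
    # ['nothx', 'upload', 'upload']
```

Since every field feeds the hash, a single extra difference (a Python version, a locale, an init system) yields a new key with its own count, so the popularity of a package is split across every combination of settings that users actually run.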

I remember reading an article about a man who tried to reproduce the
binary packages of a binary distribution and failed, because there
are so many parts involved. I later read that distributions have
done some work toward reproducible builds, but I'm not sure how
successful they are, even when all choices are predefined.
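
Steps (6) to (8) of the proposal depend on exactly this. Below is a minimal Python sketch that swaps the proposal's information-theoretic confidence for a plain majority vote over SHA-256 digests of the uploaded binaries; `assess`, `min_agree` and `min_uploads` are hypothetical names, and the default values are arbitrary. It assumes bit-for-bit reproducible builds: binaries that "only differ in trivial aspects" (embedded timestamps, build paths) have different digests and would all be counted as outliers.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def assess(
    uploads: Dict[str, bytes], min_agree: float = 0.9, min_uploads: int = 3
) -> Tuple[Optional[str], float, List[str], bool]:
    """Cross-check binaries claiming the same package and build settings.

    `uploads` maps a (hypothetical) uploader id to the uploaded package bytes.
    Returns (majority digest, agreement fraction, outlier uploaders, masked).
    """
    if not uploads:
        return None, 0.0, [], True
    digests = {who: hashlib.sha256(blob).hexdigest() for who, blob in uploads.items()}
    # The most common digest is taken to be the "good" binary (step 7).
    winner, votes = Counter(digests.values()).most_common(1)[0]
    # Plain agreement fraction, standing in for the proposal's confidence (step 8).
    confidence = votes / len(digests)
    outliers = sorted(who for who, d in digests.items() if d != winner)
    # Step (9): mask binaries with too few uploads or too little agreement.
    masked = votes < min_uploads or confidence < min_agree
    return winner, confidence, outliers, masked


if __name__ == "__main__":
    good = b"qtwebengine image"
    tampered = b"qtwebengine image + payload"
    uploads = {"alice": good, "bob": good, "carol": good, "mallory": tampered}
    winner, confidence, outliers, masked = assess(uploads)
    print(confidence, outliers, masked)  # 0.75 ['mallory'] True
```

A tampered upload is only caught when it is outvoted; enough colluding uploaders could still win the vote, which is where the proposal's optional signing and per-user history would have to come in.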

Given that Gentoo has taken a whole different road by offering more
choices to the user, I don't think the compilation results of one
configuration could easily be used on another.

To go even further, pushing your compiled packages to a public server
may create a security risk by exposing many parts of your
configuration that could be analyzed by malicious people.

So far I don't see a really big advantage in building this kind of
infrastructure compared to either a binary distribution or Gentoo with
home compilation.

Best regards

Mickaël Bucas

Replies

Subject	Author
Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time	Caveman Al Toraboran <toraboracaveman@××××××××××.com>
Re: [gentoo-user] almost free launch: an idea to lower build time, and rice, at the same time	Wols Lists <antlists@××××××××××××.uk>