Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) - gentoo-dev

From:	"Michał Górny" <mgorny@g.o>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE)
Date:	Thu, 01 Jun 2017 21:31:40
Message-Id:	`1496352685.30502.4.camel@gentoo.org`
In Reply to:	Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) by Alexis Ballier

1

On czw, 2017-06-01 at 10:55 +0200, Alexis Ballier wrote:

2

> On Wed, 31 May 2017 21:02:24 +0200

3

> Michał Górny <mgorny@g.o> wrote:

4

>

5

> > On śro, 2017-05-31 at 19:39 +0200, Alexis Ballier wrote:

6

> > > > >  Again, you *need* to process the constraints in order. '!a?

7

> > > > > ( b ) !b? ( a )' is not deterministic when none of a and b are

8

> > > > > enabled otherwise.

9

> > > >

10

> > > > You can't rely on any particular order of constraints, especially

11

> > > > when eclass stacking comes into play. You could try defining some

12

> > > > constraint- sorting algorithm but it's going to be complex and

13

> > > > it's usefulness will be limited anyway due to various kinds of

14

> > > > grouping.

15

> > >

16

> > >

17

> > > Better have some order: If half of the above constraint comes from

18

> > > ebuild and the other half comes from eclass, then PM might toggle

19

> > > 'a' or 'b' depending on the phase of the moon which is specifically

20

> > > what we're trying to avoid.

21

> >

22

> > No, it can't. That's the whole point. The algorithm must be defined so

23

> > that it is always predictable independently of order (maybe except

24

> > the ordering inside ^^, ||, ??) and independently of how it's nested

25

> > (i.e. 'a? ( b? ( c ) )' must give the same result as 'b? ( a? ( c )

26

> > )').

27

>

28

> This is a lot of handwaving without real description of how it would

29

> work. This starts to look a lot like confluence in rewriting systems

30

> and you want developers to write only confluent rules and repoman to

31

> decide that. Good luck with that.

32

33

I'm sorry, what I said was quite stupid. I did some thinking and testing

34

today, and the algorithm can be actually quite trivial. The real issue

35

is getting a correct input, that is REQUIRED_USE constraints that have

36

only one solution.

37

38

That is the whole problem I was signaling, and which I seem to have lost

39

sight of: solving is trivial, ensuring that the constraint can have only

40

one solution isn't. Hopefully, most of the simple constraints in use

41

will 'just work'.

42

43

The biggest issue for me right now is to find a way to determine whether

44

a particular REQUIRED_USE constraint can have only one solution,

45

independently of the order of non-n-ary constraints. However, some of it

46

can be easily eyeball-checked using a graph.

47

48

> Let's look at real world examples:

49

> gtk+:

50

>

51

> REQUIRED_USE="

52

>     || ( aqua wayland X )

53

>     xinerama? ( X )

54

> "

55

56

$ ./solve.py '|| ( aqua wayland X ) xinerama? ( X )'

57

[!wayland? => [!X? => [aqua]], xinerama? => [X]]

58

59

       x         x

60

     w i       w i

61

     a n       a n

62

     y e       y e

63

   a l r     a l r

64

   q a a     q a a

65

   u n m     u n m

66

 X a d a | X a d a

67

 0 0 0 0 | 0 1 0 0

68

 0 0 0 1 | 1 1 0 1

69

 0 0 1 0 | 0 0 1 0 (==)

70

 0 0 1 1 | 1 0 1 1

71

 0 1 0 0 | 0 1 0 0 (==)

72

 0 1 0 1 | 1 1 0 1

73

 0 1 1 0 | 0 1 1 0 (==)

74

 0 1 1 1 | 1 1 1 1

75

 1 0 0 0 | 1 0 0 0 (==)

76

 1 0 0 1 | 1 0 0 1 (==)

77

 1 0 1 0 | 1 0 1 0 (==)

78

 1 0 1 1 | 1 0 1 1 (==)

79

 1 1 0 0 | 1 1 0 0 (==)

80

 1 1 0 1 | 1 1 0 1 (==)

81

 1 1 1 0 | 1 1 1 0 (==)

82

 1 1 1 1 | 1 1 1 1 (==)

83

84

>

85

> Let's assume aqua is masked, which reduces to:

86

> REQUIRED_USE="

87

>     || ( wayland X )

88

>     xinerama? ( X )

89

> "

90

91

$ ./solve.py '|| ( aqua wayland X ) xinerama? ( X )' '!aqua'

92

[!X? => [!aqua? => [wayland]], xinerama? => [X]]

93

94

       x         x

95

     w i       w i

96

     a n       a n

97

     y e       y e

98

   a l r     a l r

99

   q a a     q a a

100

   u n m     u n m

101

 X a d a | X a d a

102

 0 0 0 0 | 0 0 1 0

103

 0 0 0 1 | 1 0 1 1

104

 0 0 1 0 | 0 0 1 0 (==)

105

 0 0 1 1 | 1 0 1 1

106

 1 0 0 0 | 1 0 0 0 (==)

107

 1 0 0 1 | 1 0 0 1 (==)

108

 1 0 1 0 | 1 0 1 0 (==)

109

 1 0 1 1 | 1 0 1 1 (==)

110

111

(to avoid having to explicitly play with empty clauses and such, it

112

doesn't remove masked flags, just shifts them to the end)

113

114

>

115

> With USE='-* xinerama' this one is invalid: applying the first

116

> constraint first enables wayland, then the 2nd enables X, giving a

117

> result of USE="xinerama wayland X". Applying the 2nd first enables X,

118

> then the 1st does nothing, giving a result of USE="xinerama X".

119

120

Indeed. To regain order-indepedence, you could do:

121

122

  xinerama? ( X ) !xinerama? ( || ( aqua wayland X ) )

123

124

While it's non-obvious, it's the only reasonable way to ensure that

125

things don't start falling apart randomly.

126

127

> Now the funny one:

128

> $ portageq metadata / ebuild dev-lang/php-7.0.18 REQUIRED_USE

129

> cli? ( ^^ ( readline libedit ) ) truetype? ( gd ) webp? ( gd ) cjk?

130

> ( gd ) exif? ( gd ) xpm? ( gd ) gd? ( zlib ) simplexml? ( xml ) soap?

131

> ( xml ) wddx? ( xml ) xmlrpc? ( || ( xml iconv ) ) xmlreader? ( xml )

132

> xslt? ( xml ) ldap-sasl? ( ldap ) mhash? ( hash ) phar? ( hash ) qdbm?

133

> ( !gdbm ) readline? ( !libedit ) recode? ( !imap !mysqli ) sharedmem?

134

> ( !threads ) mysql? ( || ( mysqli pdo ) ) || ( cli cgi fpm apache2

135

> embed phpdbg )

136

137

It isn't as scary as it looks. If you graph it, then you can notice it's

138

just got a few separate sub-graphs:

139

140

1. [!cgi, !fpm, !phpdbg, !apache2, !embed] -> cli -> (readline <=> !libedit)

141

142

(the last bit having duplicate 'readline? ( !libedit )' which is wrong)

143

144

2. [truetype, webp, cjk, exif, xpm] -> gd -> zlib

145

146

3. [simplexml, soap, wddx, xmlrpc, !iconv, xmlreader, xslt] -> xml

147

148

4. ldap-sasl -> ldap

149

150

5. [mhash, phar] -> hash

151

152

6. qdbm -> !gdbm

153

154

7. recode -> [!imap, !mysqli]

155

156

   [mysql, !pdo] -> mysqli

157

158

(note that there are both implications for mysqli and !mysqli -- for 4

159

cases they cause convergence error)

160

161

8. sharedmem -> !threads

162

163

All but 1 and 7 are trivial.

164

165

> There are probably dozens of ways to make that non deterministic.

166

> Here's one: USE='-*'. Apply '|| ( cli cgi fpm apache2 embed phpdbg )'

167

> last; this enables 'cli'. Since it's the last one, REQUIRED_USE is not

168

> satisfied because of 'cli? ( ^^ ( readline libedit ) )'.

169

> If you process it first, then you enable cli and readline and are good.

170

171

You just take a second iteration, and things fall into place. That's not

172

a problem.

173

174

> > > Also, what happens if we applied all the constraints and obtained

175

> > > some useflags setting that still fails REQUIRED_USE check ?

176

> >

177

> > It can't happen. If you can apply all the constraints, then implicitly

178

> > REQUIRED_USE is satisfied. If you can't apply all the constraints,

179

> > then it just fails. Of course, we want to ultimately avoid that case.

180

>

181

> See the php example above. How do you ensure this does not happen?

182

>

183

> Note that your assertion 'If you can apply all the constraints, then

184

> implicitly REQUIRED_USE is satisfied.' is false: You're changing the

185

> values of useflags, thus might violate some previously checked

186

> constraint.

187

188

So you reiterate. Either you reach a solution (preferably there's only

189

a single valid solution you can reach) or you hit a previous state which

190

indicates a loop and you fail.

191

192

> > > > The point would be: by definition, the solver should be able to

193

> > > > touch *any* USE flag. If it can't, it mostly implies that you

194

> > > > can't use the automatic solver, and so the case is irrelevant for

195

> > > > checking. Attempting to go beyond that is going to give a lot of

196

> > > > complexity and most likely kill the whole idea.

197

> > > >

198

> > > > One thing we need to worry about are masks. We need to think about

199

> > > > them.

200

> > >

201

> > > It makes zero difference for any solver if you replace variables

202

> > > (useflags) by 'true' or 'false' constants (masked/forced/user-forced

203

> > > useflags). It's even simpler actually. Whether the formula is not

204

> > > constantly 'true' (ie REQUIRED_USE is useless) or constantly

205

> > > 'false' (ie there's no way to solve it) is equivalent to solving

206

> > > SAT. We likely don't want that for repoman running on php.

207

> > >

208

> >

209

> > Well, probably yes. We just need to make sure to apply them correctly

210

> > in different contexts, to avoid accidentally skipping some

211

> > constraints. I think it would be reasonably to assume that:

212

>

213

> [...]

214

>

215

> Yes, you can eliminate constants. It is probably even confluent, i.e.

216

> the order in which you eliminate them does not matter and it always

217

> converges to the same result.

218

>

219

220

As mentioned above, I've chosen an even simpler path -- I just push true

221

parts up front, and false to back. This handles all the cases without

222

extra checking logic -- if immutables satisfy the constraint, it's

223

satisfied early; if they make it impossible to satisfy it, they just

224

make it fail at trying to touch an immutable flag.

225

226

$ ./solve.py "^^ ( c b a )" 'a b'

227

[!a? => [!c? => [b]], b? => [!a, !c], a? => [!c]]

228

229

 a b c | a b c

230

 1 1 0 | [unsolvable: immutable a mismatched]

231

 1 1 1 | [unsolvable: immutable a mismatched]

232

233

My current code is on github [1]. It's ugly, slow and incomplete. It's

234

merely a proof-of-concept and testing toy but still could give some

235

clues.

236

237

[1]:https://github.com/mgorny/required-use

238

239

--

240

Best regards,

241

Michał Górny

Gentoo Archives: gentoo-dev

Attachments

Replies

1	On czw, 2017-06-01 at 10:55 +0200, Alexis Ballier wrote:
2	> On Wed, 31 May 2017 21:02:24 +0200
3	> Michał Górny <mgorny@g.o> wrote:
4	>
5	> > On śro, 2017-05-31 at 19:39 +0200, Alexis Ballier wrote:
6	> > > > > Again, you need to process the constraints in order. '!a?
7	> > > > > ( b ) !b? ( a )' is not deterministic when none of a and b are
8	> > > > > enabled otherwise.
9	> > > >
10	> > > > You can't rely on any particular order of constraints, especially
11	> > > > when eclass stacking comes into play. You could try defining some
12	> > > > constraint- sorting algorithm but it's going to be complex and
13	> > > > it's usefulness will be limited anyway due to various kinds of
14	> > > > grouping.
15	> > >
16	> > >
17	> > > Better have some order: If half of the above constraint comes from
18	> > > ebuild and the other half comes from eclass, then PM might toggle
19	> > > 'a' or 'b' depending on the phase of the moon which is specifically
20	> > > what we're trying to avoid.
21	> >
22	> > No, it can't. That's the whole point. The algorithm must be defined so
23	> > that it is always predictable independently of order (maybe except
24	> > the ordering inside ^^, \|\|, ??) and independently of how it's nested
25	> > (i.e. 'a? ( b? ( c ) )' must give the same result as 'b? ( a? ( c )
26	> > )').
27	>
28	> This is a lot of handwaving without real description of how it would
29	> work. This starts to look a lot like confluence in rewriting systems
30	> and you want developers to write only confluent rules and repoman to
31	> decide that. Good luck with that.
32
33	I'm sorry, what I said was quite stupid. I did some thinking and testing
34	today, and the algorithm can be actually quite trivial. The real issue
35	is getting a correct input, that is REQUIRED_USE constraints that have
36	only one solution.
37
38	That is the whole problem I was signaling, and which I seem to have lost
39	sight of: solving is trivial, ensuring that the constraint can have only
40	one solution isn't. Hopefully, most of the simple constraints in use
41	will 'just work'.
42
43	The biggest issue for me right now is to find a way to determine whether
44	a particular REQUIRED_USE constraint can have only one solution,
45	independently of the order of non-n-ary constraints. However, some of it
46	can be easily eyeball-checked using a graph.
47
48	> Let's look at real world examples:
49	> gtk+:
50	>
51	> REQUIRED_USE="
52	> \|\| ( aqua wayland X )
53	> xinerama? ( X )
54	> "
55
56	$ ./solve.py '\|\| ( aqua wayland X ) xinerama? ( X )'
57	[!wayland? => [!X? => [aqua]], xinerama? => [X]]
58
59	x x
60	w i w i
61	a n a n
62	y e y e
63	a l r a l r
64	q a a q a a
65	u n m u n m
66	X a d a \| X a d a
67	0 0 0 0 \| 0 1 0 0
68	0 0 0 1 \| 1 1 0 1
69	0 0 1 0 \| 0 0 1 0 (==)
70	0 0 1 1 \| 1 0 1 1
71	0 1 0 0 \| 0 1 0 0 (==)
72	0 1 0 1 \| 1 1 0 1
73	0 1 1 0 \| 0 1 1 0 (==)
74	0 1 1 1 \| 1 1 1 1
75	1 0 0 0 \| 1 0 0 0 (==)
76	1 0 0 1 \| 1 0 0 1 (==)
77	1 0 1 0 \| 1 0 1 0 (==)
78	1 0 1 1 \| 1 0 1 1 (==)
79	1 1 0 0 \| 1 1 0 0 (==)
80	1 1 0 1 \| 1 1 0 1 (==)
81	1 1 1 0 \| 1 1 1 0 (==)
82	1 1 1 1 \| 1 1 1 1 (==)
83
84	>
85	> Let's assume aqua is masked, which reduces to:
86	> REQUIRED_USE="
87	> \|\| ( wayland X )
88	> xinerama? ( X )
89	> "
90
91	$ ./solve.py '\|\| ( aqua wayland X ) xinerama? ( X )' '!aqua'
92	[!X? => [!aqua? => [wayland]], xinerama? => [X]]
93
94	x x
95	w i w i
96	a n a n
97	y e y e
98	a l r a l r
99	q a a q a a
100	u n m u n m
101	X a d a \| X a d a
102	0 0 0 0 \| 0 0 1 0
103	0 0 0 1 \| 1 0 1 1
104	0 0 1 0 \| 0 0 1 0 (==)
105	0 0 1 1 \| 1 0 1 1
106	1 0 0 0 \| 1 0 0 0 (==)
107	1 0 0 1 \| 1 0 0 1 (==)
108	1 0 1 0 \| 1 0 1 0 (==)
109	1 0 1 1 \| 1 0 1 1 (==)
110
111	(to avoid having to explicitly play with empty clauses and such, it
112	doesn't remove masked flags, just shifts them to the end)
113
114	>
115	> With USE='-* xinerama' this one is invalid: applying the first
116	> constraint first enables wayland, then the 2nd enables X, giving a
117	> result of USE="xinerama wayland X". Applying the 2nd first enables X,
118	> then the 1st does nothing, giving a result of USE="xinerama X".
119
120	Indeed. To regain order-indepedence, you could do:
121
122	xinerama? ( X ) !xinerama? ( \|\| ( aqua wayland X ) )
123
124	While it's non-obvious, it's the only reasonable way to ensure that
125	things don't start falling apart randomly.
126
127	> Now the funny one:
128	> $ portageq metadata / ebuild dev-lang/php-7.0.18 REQUIRED_USE
129	> cli? ( ^^ ( readline libedit ) ) truetype? ( gd ) webp? ( gd ) cjk?
130	> ( gd ) exif? ( gd ) xpm? ( gd ) gd? ( zlib ) simplexml? ( xml ) soap?
131	> ( xml ) wddx? ( xml ) xmlrpc? ( \|\| ( xml iconv ) ) xmlreader? ( xml )
132	> xslt? ( xml ) ldap-sasl? ( ldap ) mhash? ( hash ) phar? ( hash ) qdbm?
133	> ( !gdbm ) readline? ( !libedit ) recode? ( !imap !mysqli ) sharedmem?
134	> ( !threads ) mysql? ( \|\| ( mysqli pdo ) ) \|\| ( cli cgi fpm apache2
135	> embed phpdbg )
136
137	It isn't as scary as it looks. If you graph it, then you can notice it's
138	just got a few separate sub-graphs:
139
140	1. [!cgi, !fpm, !phpdbg, !apache2, !embed] -> cli -> (readline <=> !libedit)
141
142	(the last bit having duplicate 'readline? ( !libedit )' which is wrong)
143
144	2. [truetype, webp, cjk, exif, xpm] -> gd -> zlib
145
146	3. [simplexml, soap, wddx, xmlrpc, !iconv, xmlreader, xslt] -> xml
147
148	4. ldap-sasl -> ldap
149
150	5. [mhash, phar] -> hash
151
152	6. qdbm -> !gdbm
153
154	7. recode -> [!imap, !mysqli]
155
156	[mysql, !pdo] -> mysqli
157
158	(note that there are both implications for mysqli and !mysqli -- for 4
159	cases they cause convergence error)
160
161	8. sharedmem -> !threads
162
163	All but 1 and 7 are trivial.
164
165	> There are probably dozens of ways to make that non deterministic.
166	> Here's one: USE='-*'. Apply '\|\| ( cli cgi fpm apache2 embed phpdbg )'
167	> last; this enables 'cli'. Since it's the last one, REQUIRED_USE is not
168	> satisfied because of 'cli? ( ^^ ( readline libedit ) )'.
169	> If you process it first, then you enable cli and readline and are good.
170
171	You just take a second iteration, and things fall into place. That's not
172	a problem.
173
174	> > > Also, what happens if we applied all the constraints and obtained
175	> > > some useflags setting that still fails REQUIRED_USE check ?
176	> >
177	> > It can't happen. If you can apply all the constraints, then implicitly
178	> > REQUIRED_USE is satisfied. If you can't apply all the constraints,
179	> > then it just fails. Of course, we want to ultimately avoid that case.
180	>
181	> See the php example above. How do you ensure this does not happen?
182	>
183	> Note that your assertion 'If you can apply all the constraints, then
184	> implicitly REQUIRED_USE is satisfied.' is false: You're changing the
185	> values of useflags, thus might violate some previously checked
186	> constraint.
187
188	So you reiterate. Either you reach a solution (preferably there's only
189	a single valid solution you can reach) or you hit a previous state which
190	indicates a loop and you fail.
191
192	> > > > The point would be: by definition, the solver should be able to
193	> > > > touch any USE flag. If it can't, it mostly implies that you
194	> > > > can't use the automatic solver, and so the case is irrelevant for
195	> > > > checking. Attempting to go beyond that is going to give a lot of
196	> > > > complexity and most likely kill the whole idea.
197	> > > >
198	> > > > One thing we need to worry about are masks. We need to think about
199	> > > > them.
200	> > >
201	> > > It makes zero difference for any solver if you replace variables
202	> > > (useflags) by 'true' or 'false' constants (masked/forced/user-forced
203	> > > useflags). It's even simpler actually. Whether the formula is not
204	> > > constantly 'true' (ie REQUIRED_USE is useless) or constantly
205	> > > 'false' (ie there's no way to solve it) is equivalent to solving
206	> > > SAT. We likely don't want that for repoman running on php.
207	> > >
208	> >
209	> > Well, probably yes. We just need to make sure to apply them correctly
210	> > in different contexts, to avoid accidentally skipping some
211	> > constraints. I think it would be reasonably to assume that:
212	>
213	> [...]
214	>
215	> Yes, you can eliminate constants. It is probably even confluent, i.e.
216	> the order in which you eliminate them does not matter and it always
217	> converges to the same result.
218	>
219
220	As mentioned above, I've chosen an even simpler path -- I just push true
221	parts up front, and false to back. This handles all the cases without
222	extra checking logic -- if immutables satisfy the constraint, it's
223	satisfied early; if they make it impossible to satisfy it, they just
224	make it fail at trying to touch an immutable flag.
225
226	$ ./solve.py "^^ ( c b a )" 'a b'
227	[!a? => [!c? => [b]], b? => [!a, !c], a? => [!c]]
228
229	a b c \| a b c
230	1 1 0 \| [unsolvable: immutable a mismatched]
231	1 1 1 \| [unsolvable: immutable a mismatched]
232
233	My current code is on github [1]. It's ugly, slow and incomplete. It's
234	merely a proof-of-concept and testing toy but still could give some
235	clues.
236
237	[1]:https://github.com/mgorny/required-use
238
239	--
240	Best regards,
241	Michał Górny