Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) - gentoo-dev

From:	Alexis Ballier <aballier@g.o>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE)
Date:	Sat, 03 Jun 2017 11:00:24
Message-Id:	`20170603130000.4f88fb14@gentoo.org`
In Reply to:	Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) by "Michał Górny"

1

On Fri, 02 Jun 2017 15:55:17 +0200

2

Michał Górny <mgorny@g.o> wrote:

3

4

> On pią, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote:

5

> > On Thu, 01 Jun 2017 23:31:25 +0200

6

> > Michał Górny <mgorny@g.o> wrote:

7

> > [...]

8

> > > > There are probably dozens of ways to make that non

9

> > > > deterministic. Here's one: USE='-*'. Apply '|| ( cli cgi fpm

10

> > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's

11

> > > > the last one, REQUIRED_USE is not satisfied because of 'cli?

12

> > > > ( ^^ ( readline libedit ) )'. If you process it first, then you

13

> > > > enable cli and readline and are good.

14

> > >

15

> > > You just take a second iteration, and things fall into place.

16

> > > That's not a problem.

17

> > >

18

> > > > > > Also, what happens if we applied all the constraints and

19

> > > > > > obtained some useflags setting that still fails REQUIRED_USE

20

> > > > > > check ?

21

> > > > >

22

> > > > > It can't happen. If you can apply all the constraints, then

23

> > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all

24

> > > > > the constraints, then it just fails. Of course, we want to

25

> > > > > ultimately avoid that case.

26

> > > >

27

> > > > See the php example above. How do you ensure this does not

28

> > > > happen?

29

> > > >

30

> > > > Note that your assertion 'If you can apply all the constraints,

31

> > > > then implicitly REQUIRED_USE is satisfied.' is false: You're

32

> > > > changing the values of useflags, thus might violate some

33

> > > > previously checked constraint.

34

> > >

35

> > > So you reiterate. Either you reach a solution (preferably there's

36

> > > only a single valid solution you can reach) or you hit a previous

37

> > > state which indicates a loop and you fail.

38

> >

39

> >

40

> > So, PM, for every ebuild, will need to store the (at most) 2^n

41

> > states it has already seen while trying to solve REQUIRED_USE and

42

> > iterate until it cannot proceed anymore or falls into a previous

43

> > state (so that's 2^n time too). That way we go from a linear time &

44

> > linear space algorithm to one in exponential time & space. That's

45

> > probably not a good idea.

46

>

47

> I don't think that's actually going to happen. You'd have to try

48

> really hard to get over n-1 iterations. I mean, the only simple way I

49

> can think of is:

50

>

51

>   b? ( a ) c? ( b ) d? ( c ) e? ( d ) ...

52

>

53

> and this only means n-1 iterations. Can you think of a better way to

54

> force multiple iterations that can be written without making

55

> REQUIRED_USE completely unreadable? How likely is that going to happen

56

> accidentally?

57

58

I don't see any reason why it should terminate after n iterations; I

59

don't see any example of how to do more either. I can try to think more

60

about it, but maybe it is not even needed, see below.

61

62

> > I think it's better to limit the number of iterations to some

63

> > constant. I'd say 1, then fail if REQUIRED_USE is still not

64

> > satisfied. Maybe 2 or 3 can be useful but I think it'd be much

65

> > harder to provide automated checks for that and much harder for

66

> > ebuild writers to understand what will happen with the REQUIRED_USE

67

> > they're about to write.

68

>

69

> Well, I don't mind setting that. All my tests (that weren't

70

> deliberately abusing the algorithm) were able to finish in a single

71

> iteration. 2 or 3 should probably be safe for all the reasonable

72

> uses. However, if we're not going to verify all possible values on

73

> repoman side, I think it would be better to have a larger limit for

74

> users, to not have things explode on them unnecessarily.

75

76

77

Look, if we assume left to right solving, only one iteration possible

78

and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ...

79

q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we

80

only have to check when such a clause can change a previously satisfied

81

clause to unsatisfied.

82

83

For two clauses: 

84

"p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}?

85

( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving

86

the 2nd break the 1st?

87

88

It must be that:

89

1.The conditions are compatible: No p_i is the negation of a p'_j.

90

2.Solving the 1st does not make the 2nd trivially true: No q_i is

91

  the negation of a p'_j.

92

3.Solving the 2nd does not make the 1st trivially true afterwards: No

93

  p_i is the negation of a q'_j.

94

4.Solving the 2nd does break the 1st assumption: (A q_i is

95

  the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m

96

  might be false).

97

98

The only a priori non polynomial part here is "one of q_1 ... q_m

99

might be false". If we do not check that (only check that some q'_j is

100

some p_i), we are still guaranteed that the 2nd clause cannot break the

101

1st but we might end up rejecting otherwise valid constructs.

102

103

Doing that, we have a check guaranteeing that the above algorithm will

104

always provide a valid answer for 'clause_1 clause_2' in

105

O(|clause_1|*|clause_2|) time.

106

For a complete REQUIRED_USE='clause_1 ... clause_k', it is sufficient

107

to check that, for any i>j, clause_i cannot break clause_j in the above

108

sense. This gives a polynomial algorithm for being certain that a

109

REQUIRED_USE constraint guarantees the existence of a valid solution

110

after one pass.

111

112

113

We can do even better: Consider a graph where the nodes are the clauses

114

and there is an edge 'a -> b' if 'a' can break 'b' in the above sense.

115

This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman

116

can do a topological sort of that graph and suggest a proper reordering

117

of the clauses or, when not possible, exhibit a cycle causing a problem

118

asking the developer to fix it.

Let's run a few examples to see if dropping the condition in point 4.

123

is not too much to ask:

124

"

125

   || ( aqua wayland X )

126

   xinerama? ( X )

127

"

128

Becomes:

129

"

130

  !X? ( !wayland? ( aqua ) )

131

  xinerama? ( X )

132

"

133

Points 1. and 2. above hold, point 3 fails with p_i = !X, q'_j = X. So

134

they're good and the constraint is valid in the above sense.

135

136

137

Another example, simplified form from php:

138

"

139

  cli? ( ^^ ( readline libedit ) ) 

140

  || ( cli cgi )

141

"

142

Becomes:

143

"

144

(a)   cli? ( !libedit? ( readline ) )

145

(b)   cli? ( libedit? ( !readline ) )

146

(c)   !cgi? ( cli )

147

"

148

149

Let's build the graph: No edge between (a) and (b) because they fail

150

point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since

151

"readline" does not appear in (c). There are edges (c) -> (a) and (c) ->

152

(b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i'

153

with q'_j = p_i = "cli". The ordering is thus invalid because (a) and

154

(b) come before (c) and there are edges going from (c) to both of them.

155

A topological sort will give you something like "(c) (a) (b)" and

156

repoman can suggest you to rewrite that as:

157

"

158

(c)   !cgi? ( cli )

159

(a)   cli? ( !libedit? ( readline ) )

160

(b)   cli? ( libedit? ( !readline ) )

161

"

162

In addition, if we can ensure the topological sort is somewhat stable

163

(*), then it is possible to infer back that (a) (b) come from "cli? ( ^^

164

( readline libedit ) )" and (c) comes from "|| ( cli cgi )" so the

165

repoman warning/error could look like:

166

Hey, your REQUIRED_USE does not allow easy solving in one pass, would

167

you mind using "|| ( cli cgi ) cli? ( ^^ ( readline libedit ) )"

168

instead ?

169

170

(*) Ensuring clauses expanded from the same construct in REQUIRED_USE

171

stay together and thus can be grouped back to their original form. This

172

might not be possible or easy though.

173

174

175

A last example: The mysql/mysqli from php you mentioned in an earlier

176

email here.

177

"

178

(a)   recode? ( !imap !mysqli )

179

(b)   mysql? !pdo? ( mysqli )

180

"

181

There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4

182

holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli.

183

The topological sort will fail since there is a cycle and will exhibit

184

this cycle. repoman warning/error could look like:

185

Hey, your constraint does not allow automatic solving, there is a cycle

186

'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode?

187

( !imap !mysqli )'.

This whole thing definitely needs more thought and feedback but for now

194

those extra restrictions seem quite natural to me, allow easy solving

195

on the PM side and allow to have useful feedback from repoman.

196

197

Alexis.

Gentoo Archives: gentoo-dev

Replies

1	On Fri, 02 Jun 2017 15:55:17 +0200
2	Michał Górny <mgorny@g.o> wrote:
3
4	> On pią, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote:
5	> > On Thu, 01 Jun 2017 23:31:25 +0200
6	> > Michał Górny <mgorny@g.o> wrote:
7	> > [...]
8	> > > > There are probably dozens of ways to make that non
9	> > > > deterministic. Here's one: USE='-*'. Apply '\|\| ( cli cgi fpm
10	> > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's
11	> > > > the last one, REQUIRED_USE is not satisfied because of 'cli?
12	> > > > ( ^^ ( readline libedit ) )'. If you process it first, then you
13	> > > > enable cli and readline and are good.
14	> > >
15	> > > You just take a second iteration, and things fall into place.
16	> > > That's not a problem.
17	> > >
18	> > > > > > Also, what happens if we applied all the constraints and
19	> > > > > > obtained some useflags setting that still fails REQUIRED_USE
20	> > > > > > check ?
21	> > > > >
22	> > > > > It can't happen. If you can apply all the constraints, then
23	> > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all
24	> > > > > the constraints, then it just fails. Of course, we want to
25	> > > > > ultimately avoid that case.
26	> > > >
27	> > > > See the php example above. How do you ensure this does not
28	> > > > happen?
29	> > > >
30	> > > > Note that your assertion 'If you can apply all the constraints,
31	> > > > then implicitly REQUIRED_USE is satisfied.' is false: You're
32	> > > > changing the values of useflags, thus might violate some
33	> > > > previously checked constraint.
34	> > >
35	> > > So you reiterate. Either you reach a solution (preferably there's
36	> > > only a single valid solution you can reach) or you hit a previous
37	> > > state which indicates a loop and you fail.
38	> >
39	> >
40	> > So, PM, for every ebuild, will need to store the (at most) 2^n
41	> > states it has already seen while trying to solve REQUIRED_USE and
42	> > iterate until it cannot proceed anymore or falls into a previous
43	> > state (so that's 2^n time too). That way we go from a linear time &
44	> > linear space algorithm to one in exponential time & space. That's
45	> > probably not a good idea.
46	>
47	> I don't think that's actually going to happen. You'd have to try
48	> really hard to get over n-1 iterations. I mean, the only simple way I
49	> can think of is:
50	>
51	> b? ( a ) c? ( b ) d? ( c ) e? ( d ) ...
52	>
53	> and this only means n-1 iterations. Can you think of a better way to
54	> force multiple iterations that can be written without making
55	> REQUIRED_USE completely unreadable? How likely is that going to happen
56	> accidentally?
57
58	I don't see any reason why it should terminate after n iterations; I
59	don't see any example of how to do more either. I can try to think more
60	about it, but maybe it is not even needed, see below.
61
62	> > I think it's better to limit the number of iterations to some
63	> > constant. I'd say 1, then fail if REQUIRED_USE is still not
64	> > satisfied. Maybe 2 or 3 can be useful but I think it'd be much
65	> > harder to provide automated checks for that and much harder for
66	> > ebuild writers to understand what will happen with the REQUIRED_USE
67	> > they're about to write.
68	>
69	> Well, I don't mind setting that. All my tests (that weren't
70	> deliberately abusing the algorithm) were able to finish in a single
71	> iteration. 2 or 3 should probably be safe for all the reasonable
72	> uses. However, if we're not going to verify all possible values on
73	> repoman side, I think it would be better to have a larger limit for
74	> users, to not have things explode on them unnecessarily.
75
76
77	Look, if we assume left to right solving, only one iteration possible
78	and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ...
79	q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we
80	only have to check when such a clause can change a previously satisfied
81	clause to unsatisfied.
82
83	For two clauses:
84	"p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}?
85	( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving
86	the 2nd break the 1st?
87
88	It must be that:
89	1.The conditions are compatible: No p_i is the negation of a p'_j.
90	2.Solving the 1st does not make the 2nd trivially true: No q_i is
91	the negation of a p'_j.
92	3.Solving the 2nd does not make the 1st trivially true afterwards: No
93	p_i is the negation of a q'_j.
94	4.Solving the 2nd does break the 1st assumption: (A q_i is
95	the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m
96	might be false).
97
98	The only a priori non polynomial part here is "one of q_1 ... q_m
99	might be false". If we do not check that (only check that some q'_j is
100	some p_i), we are still guaranteed that the 2nd clause cannot break the
101	1st but we might end up rejecting otherwise valid constructs.
102
103	Doing that, we have a check guaranteeing that the above algorithm will
104	always provide a valid answer for 'clause_1 clause_2' in
105	O(\|clause_1\|*\|clause_2\|) time.
106	For a complete REQUIRED_USE='clause_1 ... clause_k', it is sufficient
107	to check that, for any i>j, clause_i cannot break clause_j in the above
108	sense. This gives a polynomial algorithm for being certain that a
109	REQUIRED_USE constraint guarantees the existence of a valid solution
110	after one pass.
111
112
113	We can do even better: Consider a graph where the nodes are the clauses
114	and there is an edge 'a -> b' if 'a' can break 'b' in the above sense.
115	This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman
116	can do a topological sort of that graph and suggest a proper reordering
117	of the clauses or, when not possible, exhibit a cycle causing a problem
118	asking the developer to fix it.
119
120
121
122	Let's run a few examples to see if dropping the condition in point 4.
123	is not too much to ask:
124	"
125	\|\| ( aqua wayland X )
126	xinerama? ( X )
127	"
128	Becomes:
129	"
130	!X? ( !wayland? ( aqua ) )
131	xinerama? ( X )
132	"
133	Points 1. and 2. above hold, point 3 fails with p_i = !X, q'_j = X. So
134	they're good and the constraint is valid in the above sense.
135
136
137	Another example, simplified form from php:
138	"
139	cli? ( ^^ ( readline libedit ) )
140	\|\| ( cli cgi )
141	"
142	Becomes:
143	"
144	(a) cli? ( !libedit? ( readline ) )
145	(b) cli? ( libedit? ( !readline ) )
146	(c) !cgi? ( cli )
147	"
148
149	Let's build the graph: No edge between (a) and (b) because they fail
150	point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since
151	"readline" does not appear in (c). There are edges (c) -> (a) and (c) ->
152	(b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i'
153	with q'_j = p_i = "cli". The ordering is thus invalid because (a) and
154	(b) come before (c) and there are edges going from (c) to both of them.
155	A topological sort will give you something like "(c) (a) (b)" and
156	repoman can suggest you to rewrite that as:
157	"
158	(c) !cgi? ( cli )
159	(a) cli? ( !libedit? ( readline ) )
160	(b) cli? ( libedit? ( !readline ) )
161	"
162	In addition, if we can ensure the topological sort is somewhat stable
163	(*), then it is possible to infer back that (a) (b) come from "cli? ( ^^
164	( readline libedit ) )" and (c) comes from "\|\| ( cli cgi )" so the
165	repoman warning/error could look like:
166	Hey, your REQUIRED_USE does not allow easy solving in one pass, would
167	you mind using "\|\| ( cli cgi ) cli? ( ^^ ( readline libedit ) )"
168	instead ?
169
170	(*) Ensuring clauses expanded from the same construct in REQUIRED_USE
171	stay together and thus can be grouped back to their original form. This
172	might not be possible or easy though.
173
174
175	A last example: The mysql/mysqli from php you mentioned in an earlier
176	email here.
177	"
178	(a) recode? ( !imap !mysqli )
179	(b) mysql? !pdo? ( mysqli )
180	"
181	There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4
182	holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli.
183	The topological sort will fail since there is a cycle and will exhibit
184	this cycle. repoman warning/error could look like:
185	Hey, your constraint does not allow automatic solving, there is a cycle
186	'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode?
187	( !imap !mysqli )'.
188
189
190
191
192
193	This whole thing definitely needs more thought and feedback but for now
194	those extra restrictions seem quite natural to me, allow easy solving
195	on the PM side and allow to have useful feedback from repoman.
196
197	Alexis.