Gentoo Archives: gentoo-dev

From: Alexis Ballier <aballier@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE)
Date: Sat, 03 Jun 2017 11:00:24
Message-Id: 20170603130000.4f88fb14@gentoo.org
In Reply to: Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) by "Michał Górny"
1 On Fri, 02 Jun 2017 15:55:17 +0200
2 Michał Górny <mgorny@g.o> wrote:
3
4 > On pią, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote:
5 > > On Thu, 01 Jun 2017 23:31:25 +0200
6 > > Michał Górny <mgorny@g.o> wrote:
7 > > [...]
8 > > > > There are probably dozens of ways to make that non
9 > > > > deterministic. Here's one: USE='-*'. Apply '|| ( cli cgi fpm
10 > > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's
11 > > > > the last one, REQUIRED_USE is not satisfied because of 'cli?
12 > > > > ( ^^ ( readline libedit ) )'. If you process it first, then you
13 > > > > enable cli and readline and are good.
14 > > >
15 > > > You just take a second iteration, and things fall into place.
16 > > > That's not a problem.
17 > > >
18 > > > > > > Also, what happens if we applied all the constraints and
19 > > > > > > obtained some useflags setting that still fails REQUIRED_USE
20 > > > > > > check ?
21 > > > > >
22 > > > > > It can't happen. If you can apply all the constraints, then
23 > > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all
24 > > > > > the constraints, then it just fails. Of course, we want to
25 > > > > > ultimately avoid that case.
26 > > > >
27 > > > > See the php example above. How do you ensure this does not
28 > > > > happen?
29 > > > >
30 > > > > Note that your assertion 'If you can apply all the constraints,
31 > > > > then implicitly REQUIRED_USE is satisfied.' is false: You're
32 > > > > changing the values of useflags, thus might violate some
33 > > > > previously checked constraint.
34 > > >
35 > > > So you reiterate. Either you reach a solution (preferably there's
36 > > > only a single valid solution you can reach) or you hit a previous
37 > > > state which indicates a loop and you fail.
38 > >
39 > >
40 > > So, PM, for every ebuild, will need to store the (at most) 2^n
41 > > states it has already seen while trying to solve REQUIRED_USE and
42 > > iterate until it cannot proceed anymore or falls into a previous
43 > > state (so that's 2^n time too). That way we go from a linear time &
44 > > linear space algorithm to one in exponential time & space. That's
45 > > probably not a good idea.
46 >
47 > I don't think that's actually going to happen. You'd have to try
48 > really hard to get over n-1 iterations. I mean, the only simple way I
49 > can think of is:
50 >
51 > b? ( a ) c? ( b ) d? ( c ) e? ( d ) ...
52 >
53 > and this only means n-1 iterations. Can you think of a better way to
54 > force multiple iterations that can be written without making
55 > REQUIRED_USE completely unreadable? How likely is that going to happen
56 > accidentally?
57
58 I don't see any reason why it should terminate after n iterations; I
59 don't see any example of how to do more either. I can try to think more
60 about it, but maybe it is not even needed, see below.
61
62 > > I think it's better to limit the number of iterations to some
63 > > constant. I'd say 1, then fail if REQUIRED_USE is still not
64 > > satisfied. Maybe 2 or 3 can be useful but I think it'd be much
65 > > harder to provide automated checks for that and much harder for
66 > > ebuild writers to understand what will happen with the REQUIRED_USE
67 > > they're about to write.
68 >
69 > Well, I don't mind setting that. All my tests (that weren't
70 > deliberately abusing the algorithm) were able to finish in a single
71 > iteration. 2 or 3 should probably be safe for all the reasonable
72 > uses. However, if we're not going to verify all possible values on
73 > repoman side, I think it would be better to have a larger limit for
74 > users, to not have things explode on them unnecessarily.
75
76
77 Look, if we assume left to right solving, only one iteration possible
78 and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ...
79 q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we
80 only have to check when such a clause can change a previously satisfied
81 clause to unsatisfied.
82
83 For two clauses:
84 "p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}?
85 ( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving
86 the 2nd break the 1st?
87
88 It must be that:
89 1.The conditions are compatible: No p_i is the negation of a p'_j.
90 2.Solving the 1st does not make the 2nd trivially true: No q_i is
91 the negation of a p'_j.
92 3.Solving the 2nd does not make the 1st trivially true afterwards: No
93 p_i is the negation of a q'_j.
94 4.Solving the 2nd does break the 1st assumption: (A q_i is
95 the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m
96 might be false).
97
98 The only a priori non polynomial part here is "one of q_1 ... q_m
99 might be false". If we do not check that (only check that some q'_j is
100 some p_i), we are still guaranteed that the 2nd clause cannot break the
101 1st but we might end up rejecting otherwise valid constructs.
102
103 Doing that, we have a check guaranteeing that the above algorithm will
104 always provide a valid answer for 'clause_1 clause_2' in
105 O(|clause_1|*|clause_2|) time.
106 For a complete REQUIRED_USE='clause_1 ... clause_k', it is sufficient
107 to check that, for any i>j, clause_i cannot break clause_j in the above
108 sense. This gives a polynomial algorithm for being certain that a
109 REQUIRED_USE constraint guarantees the existence of a valid solution
110 after one pass.
111
112
113 We can do even better: Consider a graph where the nodes are the clauses
114 and there is an edge 'a -> b' if 'a' can break 'b' in the above sense.
115 This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman
116 can do a topological sort of that graph and suggest a proper reordering
117 of the clauses or, when not possible, exhibit a cycle causing a problem
118 asking the developer to fix it.
119
120
121
122 Let's run a few examples to see if dropping the condition in point 4.
123 is not too much to ask:
124 "
125 || ( aqua wayland X )
126 xinerama? ( X )
127 "
128 Becomes:
129 "
130 !X? ( !wayland? ( aqua ) )
131 xinerama? ( X )
132 "
133 Points 1. and 2. above hold, point 3 fails with p_i = !X, q'_j = X. So
134 they're good and the constraint is valid in the above sense.
135
136
137 Another example, simplified form from php:
138 "
139 cli? ( ^^ ( readline libedit ) )
140 || ( cli cgi )
141 "
142 Becomes:
143 "
144 (a) cli? ( !libedit? ( readline ) )
145 (b) cli? ( libedit? ( !readline ) )
146 (c) !cgi? ( cli )
147 "
148
149 Let's build the graph: No edge between (a) and (b) because they fail
150 point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since
151 "readline" does not appear in (c). There are edges (c) -> (a) and (c) ->
152 (b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i'
153 with q'_j = p_i = "cli". The ordering is thus invalid because (a) and
154 (b) come before (c) and there are edges going from (c) to both of them.
155 A topological sort will give you something like "(c) (a) (b)" and
156 repoman can suggest you to rewrite that as:
157 "
158 (c) !cgi? ( cli )
159 (a) cli? ( !libedit? ( readline ) )
160 (b) cli? ( libedit? ( !readline ) )
161 "
162 In addition, if we can ensure the topological sort is somewhat stable
163 (*), then it is possible to infer back that (a) (b) come from "cli? ( ^^
164 ( readline libedit ) )" and (c) comes from "|| ( cli cgi )" so the
165 repoman warning/error could look like:
166 Hey, your REQUIRED_USE does not allow easy solving in one pass, would
167 you mind using "|| ( cli cgi ) cli? ( ^^ ( readline libedit ) )"
168 instead ?
169
170 (*) Ensuring clauses expanded from the same construct in REQUIRED_USE
171 stay together and thus can be grouped back to their original form. This
172 might not be possible or easy though.
173
174
175 A last example: The mysql/mysqli from php you mentioned in an earlier
176 email here.
177 "
178 (a) recode? ( !imap !mysqli )
179 (b) mysql? !pdo? ( mysqli )
180 "
181 There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4
182 holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli.
183 The topological sort will fail since there is a cycle and will exhibit
184 this cycle. repoman warning/error could look like:
185 Hey, your constraint does not allow automatic solving, there is a cycle
186 'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode?
187 ( !imap !mysqli )'.
188
189
190
191
192
193 This whole thing definitely needs more thought and feedback but for now
194 those extra restrictions seem quite natural to me, allow easy solving
195 on the PM side and allow to have useful feedback from repoman.
196
197 Alexis.

Replies