1 |
On Fri, 02 Jun 2017 15:55:17 +0200 |
2 |
Michał Górny <mgorny@g.o> wrote: |
3 |
|
4 |
> On pią, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote: |
5 |
> > On Thu, 01 Jun 2017 23:31:25 +0200 |
6 |
> > Michał Górny <mgorny@g.o> wrote: |
7 |
> > [...] |
8 |
> > > > There are probably dozens of ways to make that non |
9 |
> > > > deterministic. Here's one: USE='-*'. Apply '|| ( cli cgi fpm |
10 |
> > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's |
11 |
> > > > the last one, REQUIRED_USE is not satisfied because of 'cli? |
12 |
> > > > ( ^^ ( readline libedit ) )'. If you process it first, then you |
13 |
> > > > enable cli and readline and are good. |
14 |
> > > |
15 |
> > > You just take a second iteration, and things fall into place. |
16 |
> > > That's not a problem. |
17 |
> > > |
18 |
> > > > > > Also, what happens if we applied all the constraints and |
19 |
> > > > > > obtained some useflags setting that still fails REQUIRED_USE |
20 |
> > > > > > check ? |
21 |
> > > > > |
22 |
> > > > > It can't happen. If you can apply all the constraints, then |
23 |
> > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all |
24 |
> > > > > the constraints, then it just fails. Of course, we want to |
25 |
> > > > > ultimately avoid that case. |
26 |
> > > > |
27 |
> > > > See the php example above. How do you ensure this does not |
28 |
> > > > happen? |
29 |
> > > > |
30 |
> > > > Note that your assertion 'If you can apply all the constraints, |
31 |
> > > > then implicitly REQUIRED_USE is satisfied.' is false: You're |
32 |
> > > > changing the values of useflags, thus might violate some |
33 |
> > > > previously checked constraint. |
34 |
> > > |
35 |
> > > So you reiterate. Either you reach a solution (preferably there's |
36 |
> > > only a single valid solution you can reach) or you hit a previous |
37 |
> > > state which indicates a loop and you fail. |
38 |
> > |
39 |
> > |
40 |
> > So, PM, for every ebuild, will need to store the (at most) 2^n |
41 |
> > states it has already seen while trying to solve REQUIRED_USE and |
42 |
> > iterate until it cannot proceed anymore or falls into a previous |
43 |
> > state (so that's 2^n time too). That way we go from a linear time & |
44 |
> > linear space algorithm to one in exponential time & space. That's |
45 |
> > probably not a good idea. |
46 |
> |
47 |
> I don't think that's actually going to happen. You'd have to try |
48 |
> really hard to get over n-1 iterations. I mean, the only simple way I |
49 |
> can think of is: |
50 |
> |
51 |
> b? ( a ) c? ( b ) d? ( c ) e? ( d ) ... |
52 |
> |
53 |
> and this only means n-1 iterations. Can you think of a better way to |
54 |
> force multiple iterations that can be written without making |
55 |
> REQUIRED_USE completely unreadable? How likely is that going to happen |
56 |
> accidentally? |
57 |
|
58 |
I don't see any reason why it should terminate after n iterations; I |
59 |
don't see any example of how to do more either. I can try to think more |
60 |
about it, but maybe it is not even needed, see below. |
61 |
|
62 |
> > I think it's better to limit the number of iterations to some |
63 |
> > constant. I'd say 1, then fail if REQUIRED_USE is still not |
64 |
> > satisfied. Maybe 2 or 3 can be useful but I think it'd be much |
65 |
> > harder to provide automated checks for that and much harder for |
66 |
> > ebuild writers to understand what will happen with the REQUIRED_USE |
67 |
> > they're about to write. |
68 |
> |
69 |
> Well, I don't mind setting that. All my tests (that weren't |
70 |
> deliberately abusing the algorithm) were able to finish in a single |
71 |
> iteration. 2 or 3 should probably be safe for all the reasonable |
72 |
> uses. However, if we're not going to verify all possible values on |
73 |
> repoman side, I think it would be better to have a larger limit for |
74 |
> users, to not have things explode on them unnecessarily. |
75 |
|
76 |
|
77 |
Look, if we assume left to right solving, only one iteration possible |
78 |
and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ... |
79 |
q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we |
80 |
only have to check when such a clause can change a previously satisfied |
81 |
clause to unsatisfied. |
82 |
|
83 |
For two clauses: |
84 |
"p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}? |
85 |
( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving |
86 |
the 2nd break the 1st? |
87 |
|
88 |
It must be that: |
89 |
1.The conditions are compatible: No p_i is the negation of a p'_j. |
90 |
2.Solving the 1st does not make the 2nd trivially true: No q_i is |
91 |
the negation of a p'_j. |
92 |
3.Solving the 2nd does not make the 1st trivially true afterwards: No |
93 |
p_i is the negation of a q'_j. |
94 |
4.Solving the 2nd does break the 1st assumption: (A q_i is |
95 |
the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m |
96 |
might be false). |
97 |
|
98 |
The only a priori non polynomial part here is "one of q_1 ... q_m |
99 |
might be false". If we do not check that (only check that some q'_j is |
100 |
some p_i), we are still guaranteed that the 2nd clause cannot break the |
101 |
1st but we might end up rejecting otherwise valid constructs. |
102 |
|
103 |
Doing that, we have a check guaranteeing that the above algorithm will |
104 |
always provide a valid answer for 'clause_1 clause_2' in |
105 |
O(|clause_1|*|clause_2|) time. |
106 |
For a complete REQUIRED_USE='clause_1 ... clause_k', it is sufficient |
107 |
to check that, for any i>j, clause_i cannot break clause_j in the above |
108 |
sense. This gives a polynomial algorithm for being certain that a |
109 |
REQUIRED_USE constraint guarantees the existence of a valid solution |
110 |
after one pass. |
111 |
|
112 |
|
113 |
We can do even better: Consider a graph where the nodes are the clauses |
114 |
and there is an edge 'a -> b' if 'a' can break 'b' in the above sense. |
115 |
This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman |
116 |
can do a topological sort of that graph and suggest a proper reordering |
117 |
of the clauses or, when not possible, exhibit a cycle causing a problem |
118 |
asking the developer to fix it. |
119 |
|
120 |
|
121 |
|
122 |
Let's run a few examples to see if dropping the condition in point 4. |
123 |
is not too much to ask: |
124 |
" |
125 |
|| ( aqua wayland X ) |
126 |
xinerama? ( X ) |
127 |
" |
128 |
Becomes: |
129 |
" |
130 |
!X? ( !wayland? ( aqua ) ) |
131 |
xinerama? ( X ) |
132 |
" |
133 |
Points 1. and 2. above hold, point 3 fails with p_i = !X, q'_j = X. So |
134 |
they're good and the constraint is valid in the above sense. |
135 |
|
136 |
|
137 |
Another example, simplified form from php: |
138 |
" |
139 |
cli? ( ^^ ( readline libedit ) ) |
140 |
|| ( cli cgi ) |
141 |
" |
142 |
Becomes: |
143 |
" |
144 |
(a) cli? ( !libedit? ( readline ) ) |
145 |
(b) cli? ( libedit? ( !readline ) ) |
146 |
(c) !cgi? ( cli ) |
147 |
" |
148 |
|
149 |
Let's build the graph: No edge between (a) and (b) because they fail |
150 |
point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since |
151 |
"readline" does not appear in (c). There are edges (c) -> (a) and (c) -> |
152 |
(b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i' |
153 |
with q'_j = p_i = "cli". The ordering is thus invalid because (a) and |
154 |
(b) come before (c) and there are edges going from (c) to both of them. |
155 |
A topological sort will give you something like "(c) (a) (b)" and |
156 |
repoman can suggest you to rewrite that as: |
157 |
" |
158 |
(c) !cgi? ( cli ) |
159 |
(a) cli? ( !libedit? ( readline ) ) |
160 |
(b) cli? ( libedit? ( !readline ) ) |
161 |
" |
162 |
In addition, if we can ensure the topological sort is somewhat stable |
163 |
(*), then it is possible to infer back that (a) (b) come from "cli? ( ^^ |
164 |
( readline libedit ) )" and (c) comes from "|| ( cli cgi )" so the |
165 |
repoman warning/error could look like: |
166 |
Hey, your REQUIRED_USE does not allow easy solving in one pass, would |
167 |
you mind using "|| ( cli cgi ) cli? ( ^^ ( readline libedit ) )" |
168 |
instead ? |
169 |
|
170 |
(*) Ensuring clauses expanded from the same construct in REQUIRED_USE |
171 |
stay together and thus can be grouped back to their original form. This |
172 |
might not be possible or easy though. |
173 |
|
174 |
|
175 |
A last example: The mysql/mysqli from php you mentioned in an earlier |
176 |
email here. |
177 |
" |
178 |
(a) recode? ( !imap !mysqli ) |
179 |
(b) mysql? !pdo? ( mysqli ) |
180 |
" |
181 |
There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4 |
182 |
holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli. |
183 |
The topological sort will fail since there is a cycle and will exhibit |
184 |
this cycle. repoman warning/error could look like: |
185 |
Hey, your constraint does not allow automatic solving, there is a cycle |
186 |
'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode? |
187 |
( !imap !mysqli )'. |
188 |
|
189 |
|
190 |
|
191 |
|
192 |
|
193 |
This whole thing definitely needs more thought and feedback but for now |
194 |
those extra restrictions seem quite natural to me, allow easy solving |
195 |
on the PM side and allow to have useful feedback from repoman. |
196 |
|
197 |
Alexis. |