On Tue, 1 May 2007 15:08:56 +0200
Piotr Jaroszyński <peper@g.o> wrote:

> Hello,
>
> There was some discussion about forcing/not forcing tests in EAPI-1,
> but there was clearly no compromise. Imho, tests are very important
> and thus I want to discuss them a little more, but in a more sensible
> fashion.
>
> Firstly, each test can be (not all categories are mutually exclusive):
> - not existent
> - non-functional
> - not runnable from ebuild
> - useful but unreasonable resource-wise
> - useful and reasonable resource-wise
> - necessary
> - known to partially fail but with a way of skipping failing tests
> - known to partially fail but with no easy way of skipping failing
>   tests
> Is that list comprehensive?

I'd approach it a bit differently: before creating fixed classification
groups, I'd first identify the attributes of tests that should be used
for those classifications.
a) cost (in terms of runtime, resource usage, additional deps)
b) effectiveness (does a failing/working test mean the package is
broken/working?)
c) importance (is there a realistic chance for the test to be useful?)
d) correctness (does the test match the implementation? overlaps a bit
with effectiveness)
e) others?

Each of these needs to be considered if we want to find a good
compromise on which tests to run and which not. A test with high cost
can still be worth running if effectiveness, correctness and importance
are also high. On the other hand, a test with little effectiveness,
correctness and/or importance probably isn't worth running even with
zero cost.
Now the tricky question is how to actually measure those attributes.
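
To make that trade-off a bit more concrete, here is a minimal Python
sketch. Everything in it is an assumption for illustration only: the
SuiteAttributes type, the 0.0..1.0 scales, the multiplicative benefit
and the worth_running() helper are hypothetical and not part of any
existing portage or EAPI interface. It shows one possible way to encode
the rule and says nothing about how the attributes would be measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteAttributes:
    """Hypothetical attributes of a package's test suite, each 0.0..1.0."""
    cost: float           # runtime, resource usage, additional deps
    effectiveness: float  # does pass/fail reflect whether the package works?
    importance: float     # realistic chance that the test is useful
    correctness: float    # does the test match the implementation?


def benefit(t: SuiteAttributes) -> float:
    # Assumption: multiply, so one weak attribute drags the benefit down.
    return t.effectiveness * t.importance * t.correctness


def worth_running(t: SuiteAttributes, min_benefit: float = 0.2) -> bool:
    b = benefit(t)
    # A test of little value is skipped even at zero cost.
    if b < min_benefit:
        return False
    # Otherwise the benefit has to at least match the cost.
    return b >= t.cost


# Invented sample values, not measurements of any real package.
expensive_but_good = SuiteAttributes(cost=0.7, effectiveness=0.9,
                                     importance=0.9, correctness=0.9)
free_but_useless = SuiteAttributes(cost=0.0, effectiveness=0.1,
                                   importance=0.5, correctness=0.5)

print(worth_running(expensive_but_good))  # True
print(worth_running(free_but_useless))    # False
```

The min_benefit threshold of 0.2 is arbitrary; the point is only that a
floor on usefulness is checked before cost is weighed at all.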

> Secondly we must answer the question how precisely we want to
> distinguish them, so users/dev can choose which categories of tests
> they want to run. What comes to mind is:
> - run all tests
> - run only necessary tests
> - run only reasonable tests
> - don't run tests at all
> Again, is that list comprehensive?

The problem is that terms like "reasonable" or "necessary" are quite
subjective (for both humans and machines), and in this particular
context even "all" could be interpreted in different ways (btw, could
someone give some real examples of packages with "necessary" tests?).

So I think a more fine-grained classification is needed that can be
adapted to specific use cases (e.g. the mips+embedded profiles might
want different defaults than the amd64+desktop profiles).
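
As a rough illustration of profile-specific defaults, the Python sketch
below gives each profile its own limits. The two profile names are
taken from the example above; the ProfilePolicy type, the numeric
limits and the runs_under() helper are made up for this sketch and do
not correspond to any existing profile setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilePolicy:
    """Hypothetical per-profile defaults; all numbers are invented."""
    max_cost: float     # highest test cost (0.0..1.0) the profile tolerates
    min_benefit: float  # lowest combined usefulness still worth running


# Assumed defaults: a constrained profile accepts far less cost.
POLICIES = {
    "mips+embedded": ProfilePolicy(max_cost=0.2, min_benefit=0.5),
    "amd64+desktop": ProfilePolicy(max_cost=0.8, min_benefit=0.2),
}


def runs_under(profile: str, cost: float, benefit: float) -> bool:
    """Return True if a test with this cost/benefit runs by default."""
    policy = POLICIES[profile]
    return cost <= policy.max_cost and benefit >= policy.min_benefit


# The same test (moderate cost, decent benefit) gets different answers.
for name in POLICIES:
    print(name, runs_under(name, cost=0.5, benefit=0.6))
```

With these invented limits the same test is skipped on mips+embedded
and run on amd64+desktop, which is the kind of per-profile default
meant above.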

Marius

--
Public Key at http://www.genone.de/info/gpg-key.pub

In the beginning, there was nothing. And God said, 'Let there be
Light.' And there was still nothing, but you could see a bit better.