On Tue, 26 Jan 2016 20:52:09 +0100
Dirkjan Ochtman <djc@g.o> wrote:

> All,
>
> TL;DR: I think we should switch from DTD to RELAX NG (compact syntax,
> ideally) for our XML validation needs. It is more expressive and more
> readable.
>
> Most people who know anything about XML stuff know that DTDs are not
> that great a solution for validation. Their expressive power is very
> limited; there are a few examples of this in our metadata.dtd [1].
> For a few years now, I've wanted to see if we could replace
> metadata.dtd with something in RELAX NG, which is a more modern XML
> schema language; it's an ISO standard with an emphasis on readability
> both for humans and for tools (by using a rigorous formalism). Some
> arguments in favor of RELAX NG (and some counter-arguments) are
> enumerated on Tim Bray's weblog [2]. I've created a compact syntax
> schema for metadata that can validate all metadata.xml files currently
> in the tree, as an example [3].
>
> Some arguments against:
>
> - Not enough tool support for RELAX NG: I'd be curious to hear what
>   tools you want to use. At least libxml2 supports RELAX NG natively.
>   The Python lxml library uses that support to provide pretty simple
>   RELAX NG validation. libxml2 does not have native compact syntax
>   support, but I maintain a simple library called rnc2rng [4] that is
>   used transparently by lxml if installed. rnc2rng also comes with a
>   rnc2rng command-line script to do the conversion.
>
> - Performance: in a quick test with lxml (backed by libxml2), RELAX NG
>   validation takes very similar time compared to DTD. Testing with
>   ~19000 metadata.xml files in the tree, with DTD (best of 3):
>
>   real    0m2.861s
>   user    0m2.560s
>   sys     0m0.296s
>
>   With RNC (best of 3):
>
>   real    0m3.058s
>   user    0m2.688s
>   sys     0m0.364s
>
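
To put the two timings above side by side, here is a small sketch that turns the quoted `real` values into a relative overhead and a rough per-file cost. The helper names are made up for illustration, and the file count is the approximate ~19000 from the message, so the per-file figure is only an estimate.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style value such as '0m2.861s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_overhead(baseline: float, candidate: float) -> float:
    """Return how much slower `candidate` is than `baseline`, as a fraction."""
    return (candidate - baseline) / baseline


# "real" times quoted above (best of 3 each).
dtd_real = parse_time("0m2.861s")
rnc_real = parse_time("0m3.058s")

# Approximate number of metadata.xml files, as stated in the message.
file_count = 19000

overhead = relative_overhead(dtd_real, rnc_real)
per_file_ms = (rnc_real - dtd_real) / file_count * 1000

print(f"RNC vs DTD overhead: {overhead:.1%}")
print(f"extra cost per file: {per_file_ms:.4f} ms")
```

On these numbers the difference is roughly 7% over the whole tree, on the order of a hundredth of a millisecond per file.
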
> We could probably easily maintain an XML Schema shadow schema if
> that's really desired, but I would be in favor of making RELAX NG our
> main schema language. I can easily do the work to update repoman for
> this (I've already refactored the metadata code in repoman). What
> other stuff would need to be updated?
>
> Comments?

Could you post generated .rng and XML Schema files for comparison?
They don't have to be perfect conversions, just to see how different
they are.

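For reference while comparing the formats, here is a minimal sketch of the libxml2-backed RELAX NG validation through lxml that the quoted message mentions. The schema is a hypothetical toy written in RELAX NG XML syntax, not the proposed metadata schema [3]; the element names are borrowed only for illustration, and the `is_valid` helper is made up for this example. lxml is a third-party package, so the import is guarded.

```python
try:
    from lxml import etree  # third-party, backed by libxml2
except ImportError:  # keep the sketch importable where lxml is absent
    etree = None

# Hypothetical toy schema (RELAX NG XML syntax), NOT the real metadata schema.
SCHEMA = b"""\
<element name="pkgmetadata" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="maintainer">
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""

GOOD = (b"<pkgmetadata><maintainer><email>dev@example.org</email>"
        b"</maintainer></pkgmetadata>")
BAD = b"<pkgmetadata><unknown>foo</unknown></pkgmetadata>"


def is_valid(schema_xml: bytes, document_xml: bytes) -> bool:
    """Validate one XML document against a RELAX NG schema in XML syntax."""
    if etree is None:
        raise RuntimeError("lxml is required for this sketch")
    relaxng = etree.RelaxNG(etree.fromstring(schema_xml))
    return relaxng.validate(etree.fromstring(document_xml))


if etree is not None:
    print("good document valid:", is_valid(SCHEMA, GOOD))
    print("bad document valid:", is_valid(SCHEMA, BAD))
```

A compact-syntax (.rnc) schema would first need converting to this XML form, for example with the rnc2rng script mentioned above.
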
-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>