Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: Dirkjan Ochtman <djc@g.o>
Cc: Gentoo Development <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] New schema language for metadata validation?
Date: Wed, 27 Jan 2016 12:10:00
Message-Id: 20160127130940.7b5a3a79.mgorny@gentoo.org
In Reply to: [gentoo-dev] New schema language for metadata validation? by Dirkjan Ochtman
On Tue, 26 Jan 2016 20:52:09 +0100
Dirkjan Ochtman <djc@g.o> wrote:

> All,
>
> TL;DR: I think we should switch from DTD to RELAX NG (compact syntax,
> ideally) for our XML validation needs. It is more expressive and more
> readable.
>
> Most people who know anything about XML stuff know that DTDs are not
> that great a solution for validation. Their expressive power is very
> limited; there are a few examples of this in our metadata.dtd [1].
> For a few years now, I've wanted to see if we could replace
> metadata.dtd with something in RELAX NG, which is a more modern XML
> schema language; it's an ISO standard with an emphasis on readability
> both for humans and for tools (by using a rigorous formalism). Some
> arguments in favor of RELAX NG (and some counter-arguments) are
> enumerated on Tim Bray's weblog [2]. I've created a compact syntax
> schema for metadata that can validate all metadata.xml files currently
> in the tree, as an example [3].
>
> Some arguments against:
>
> - Not enough tool support for RELAX NG: I'd be curious to hear what
>   tools you want to use. At least libxml2 supports RELAX NG natively.
>   The Python lxml library uses that support to provide pretty simple
>   RELAX NG validation. libxml2 does not have native compact syntax
>   support, but I maintain a simple library called rnc2rng [4] that is
>   used transparently by lxml if installed. rnc2rng also comes with a
>   rnc2rng command-line script to do the conversion.
>
> - Performance: in a quick test with lxml (backed by libxml2), RELAX NG
>   validation takes very similar time compared to DTD. Testing with
>   ~19000 metadata.xml files in the tree, with DTD (best of 3):
>
>   real 0m2.861s
>   user 0m2.560s
>   sys 0m0.296s
>
>   With RNC (best of 3):
>
>   real 0m3.058s
>   user 0m2.688s
>   sys 0m0.364s
>
> We could probably easily maintain an XML Schema shadow schema if
> that's really desired, but I would be in favor of making RELAX NG our
> main schema language. I can easily do the work to update repoman for
> this (I've already refactored the metadata code in repoman). What
> other stuff would need to be updated?
>
> Comments?

Could you post generated .rng and XML Schema files for comparison?
They don't have to be perfect conversions, just good enough to see how
different they are.

--
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>
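
To put the quoted timings in perspective, here is a rough back-of-the-envelope
calculation of the relative overhead and the per-file cost. It is only a
sketch: the two "real" times are taken from the quoted message, the file count
uses the approximate "~19000" figure (the exact count is not given), and the
function names are hypothetical helpers made up for this illustration.

```python
# Wall-clock ("real") times from the quoted message, in seconds (best of 3).
DTD_SECONDS = 2.861
RNC_SECONDS = 3.058

# Assumption: the message only says "~19000" files; the exact count is unknown.
FILE_COUNT = 19000


def relative_overhead(baseline, candidate):
    """Return how much slower `candidate` is than `baseline`, as a fraction."""
    return (candidate - baseline) / baseline


def per_file_ms(total_seconds, files):
    """Return the average time spent per file, in milliseconds."""
    return total_seconds * 1000.0 / files


overhead = relative_overhead(DTD_SECONDS, RNC_SECONDS)
dtd_ms = per_file_ms(DTD_SECONDS, FILE_COUNT)
rnc_ms = per_file_ms(RNC_SECONDS, FILE_COUNT)

print(f"RELAX NG overhead vs. DTD: {overhead:.1%}")  # about 6.9%
print(f"DTD: {dtd_ms:.3f} ms per file")              # about 0.151 ms
print(f"RNC: {rnc_ms:.3f} ms per file")              # about 0.161 ms
```

With these numbers the difference is roughly 0.2 seconds over the whole tree,
or about a hundredth of a millisecond per metadata.xml file, which fits the
"very similar time" description in the quoted message.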

Replies

Subject Author
Re: [gentoo-dev] New schema language for metadata validation? Dirkjan Ochtman <djc@g.o>