
Subject: [relax-ng-comment] Limitations of W3C XML Schemas vs. RELAX NG?
From: mertz@gnosis.cx (David Mertz, Ph.D.)
To: relax-ng-comment@lists.oasis-open.org
Date: Mon, 20 Jan 2003 03:06:39 -0500

I am writing an article on RELAX NG, and particularly on its advantages
over W3C XML Schemas.  I have only just begun to look at RELAX NG (though
it seems amazingly clear and well designed), and have not worked with
W3C XML Schemas much more than that.  I hope to use an example like the
one below in my article, and would appreciate readers pointing out any
errors I have made in my examples.

The following is something I actually use in a data serialization
library (gnosis.xml.pickle, for Python).  I have simplified the
example, but the principle is the same as described.  I have a class
of XML documents that can look like:

  ----- Mixed Model Document -----
  <?xml version="1.0"?>
  <Foo>
    <item>Value</item>
    <item value="Value" />
  </Foo>

That is, an <item> element may *either* contain PCDATA content, or it
may carry a "value" attribute, but not both.  So the following are
all invalid.  They are rejected by the application logic, but I want
to be able to reject them at the document level:

  ----- Invalid Document 1 -----
  <?xml version="1.0"?>
  <Foo><item invalid="Bar"/></Foo>

  ----- Invalid Document 2 -----
  <?xml version="1.0"?>
  <Foo><item><invalid/></item></Foo>

  ----- Invalid Document 3 -----
  <?xml version="1.0"?>
  <Foo><item value="Bar">Bar</item></Foo>

  ----- Invalid Document 4 -----
  <?xml version="1.0"?>
  <Foo><item /></Foo>
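
For reference, the rule above can be written out as a small check.  This
is only a minimal sketch using Python's standard xml.etree.ElementTree;
the names item_is_valid and document_is_valid are hypothetical and are
not taken from gnosis.xml.pickle, and treating whitespace-only content
as "no content" is an assumption:

```python
import xml.etree.ElementTree as ET

VALID_DOC = '<Foo><item>Value</item><item value="Value" /></Foo>'
INVALID_DOCS = [
    '<Foo><item invalid="Bar"/></Foo>',         # Invalid Document 1
    '<Foo><item><invalid/></item></Foo>',       # Invalid Document 2
    '<Foo><item value="Bar">Bar</item></Foo>',  # Invalid Document 3
    '<Foo><item /></Foo>',                      # Invalid Document 4
]

def item_is_valid(item):
    """Hypothetical rule: text content or a 'value' attribute, never both."""
    # No child elements, and no attribute other than "value".
    if len(item) > 0 or set(item.attrib) - {"value"}:
        return False
    # Assumption: whitespace-only content counts as no content.
    has_text = bool(item.text and item.text.strip())
    has_value = "value" in item.attrib
    # Exclusive or: exactly one of the two forms must be present.
    return has_text != has_value

def document_is_valid(xml_source):
    """Check that the root is <Foo> holding one or more valid <item>s."""
    root = ET.fromstring(xml_source)
    items = list(root)
    return (root.tag == "Foo" and len(items) > 0
            and all(c.tag == "item" and item_is_valid(c) for c in items))

print(document_is_valid(VALID_DOC))                   # True
print([document_is_valid(d) for d in INVALID_DOCS])   # [False, False, False, False]
```

The exclusive-or on the last line of item_is_valid is the part a schema
language has to be able to express for the four documents to be rejected.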

I can use a DTD to weed out some invalid documents, e.g.:

  ----- Approximate DTD -----
  <!ELEMENT Foo (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item value CDATA #IMPLIED>

This would reject #1 and #2, since they have an undeclared attribute
and invalid element content, respectively.  But #3 and #4 both pass
through, since the "value" attribute and the PCDATA content are each
optional, and independent of one another.
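
To make the gap concrete, here is a hand-written model of what the three
DTD declarations constrain.  It is a sketch only, not a real DTD
validator, and the function name dtd_accepts is hypothetical; it just
mirrors each declaration with Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

INVALID_DOCS = [
    '<Foo><item invalid="Bar"/></Foo>',         # Invalid Document 1
    '<Foo><item><invalid/></item></Foo>',       # Invalid Document 2
    '<Foo><item value="Bar">Bar</item></Foo>',  # Invalid Document 3
    '<Foo><item /></Foo>',                      # Invalid Document 4
]

def dtd_accepts(xml_source):
    """Model of the approximate DTD; it does not parse any DTD itself."""
    root = ET.fromstring(xml_source)
    items = list(root)
    # <!ELEMENT Foo (item+)>: one or more <item> children, nothing else.
    if root.tag != "Foo" or not items or any(c.tag != "item" for c in items):
        return False
    for item in items:
        # <!ELEMENT item (#PCDATA)>: no child elements; text may be empty.
        if len(item) > 0:
            return False
        # <!ATTLIST item value CDATA #IMPLIED>: only "value" is declared,
        # and it is optional.
        if set(item.attrib) - {"value"}:
            return False
    return True

print([dtd_accepts(d) for d in INVALID_DOCS])   # [False, False, True, True]
```

Nothing in the model relates the text content to the attribute, which is
why documents #3 and #4 are accepted.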

Writing a W3C XML Schema to describe the same constraints as the DTD
seems to be... well, more complicated than I hoped, but doable once I
looked through the schema specs (please let me know if this is wrong):

  ----- W3C XML Schema -----
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Foo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:simpleContent>
              <xsd:extension base="xsd:string">
                <xsd:attribute name="value" type="xsd:string" use="optional"/>
              </xsd:extension>
            </xsd:simpleContent>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  </xsd:schema>

But it looks like RELAX NG is strictly more powerful here.  I can
write a (readable) description as:

  ----- RELAX NG -----
  <element name="Foo" xmlns="http://relaxng.org/ns/structure/1.0">
    <oneOrMore>
      <element name="item">
        <choice>
          <text/>
          <attribute name="value"/>
        </choice>
      </element>
    </oneOrMore>
  </element>

I could, of course, also use the datatype libraries to control the
content types (and would need one to reject #4, since <text/> also
matches empty content), but this shows the use of <choice> between the
body and the attribute.  Moreover, I think the following is the correct
compact syntax (but I might have made an error, let me know):

  ----- RELAX NG Compact Syntax -----
  element Foo { element item { attribute value { text } | text }+ }

Yours, David...