relax-ng message

Subject: Issue 57
From: James Clark <jjc@jclark.com>
To: relax-ng@lists.oasis-open.org
Date: Thu, 13 Sep 2001 13:21:34 +0700

I have been trying to solve issue 57.

Problem
-------

The spec says that whitespace-only strings in an element are stripped
before the element is validated against its content pattern.

This makes the following document

<foo>   </foo>

invalid with respect to this schema, because the three spaces are
stripped before the minLength facet is checked:

<element name="foo">
 <data type="string"
       datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <param name="minLength">2</param>
 </data>
</element>
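
The effect can be sketched in a few lines of Python. This is only an
illustration of the behaviour described above, not validator code:
strip_whitespace_only and satisfies_min_length are hypothetical
helpers, and the stripped content is modelled as the empty string.

```python
XML_WHITESPACE = " \t\r\n"


def strip_whitespace_only(text: str) -> str:
    """Model the current spec: whitespace-only content is dropped entirely."""
    return "" if text.strip(XML_WHITESPACE) == "" else text


def satisfies_min_length(value: str, min_length: int) -> bool:
    """Model the minLength facet on a string value (length in characters)."""
    return len(value) >= min_length


content = "   "  # the content of <foo>   </foo>

# Checked as written, three spaces satisfy minLength 2.
valid_as_written = satisfies_min_length(content, 2)

# Checked after stripping, as the current spec requires, they do not.
valid_after_stripping = satisfies_min_length(strip_whitespace_only(content), 2)
```

Here valid_as_written is True and valid_after_stripping is False,
which is the mismatch that issue 57 is about.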


Here is another problem. According to the inference rules in the spec,
the pattern

<element name="foo" xmlns="http://relaxng.org/ns/structure/0.9">
  <list>
    <data type="token"/>
    <data type="token"/>
  </list>
</element>

matches

<foo>x</foo>

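which I think users will find surprising. This is because of the
(empty string) rule, which says that if a pattern matches the empty
string, then it matches the empty sequence. The first <data> matches
"x"; the second, since token allows the empty string, matches the
empty sequence that is left over.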
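
A rough Python model of this case, assuming that every token is a
valid value of type token and that the list content is just a group of
<data type="token"/> patterns (group_of_tokens_matches is a
hypothetical helper, not anything from the spec):

```python
def group_of_tokens_matches(tokens, pattern_count, empty_string_rule):
    """Match a token sequence against a group of `pattern_count`
    <data type="token"/> patterns, taken in order."""
    if pattern_count == 0:
        return len(tokens) == 0
    # The next <data> pattern consumes exactly one token.
    if tokens and group_of_tokens_matches(
        tokens[1:], pattern_count - 1, empty_string_rule
    ):
        return True
    # Under the (empty string) rule, a <data> that accepts "" also
    # matches the empty sequence, so it may consume nothing at all.
    if empty_string_rule:
        return group_of_tokens_matches(tokens, pattern_count - 1, empty_string_rule)
    return False


tokens = "x".split()  # the list pattern splits the content on whitespace
with_rule = group_of_tokens_matches(tokens, 2, empty_string_rule=True)
without_rule = group_of_tokens_matches(tokens, 2, empty_string_rule=False)
```

With the (empty string) rule the single token "x" is accepted; without
it, two tokens are required.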

Spec changes
------------

6.2 (or maybe new 6.2.1)

Introduce variable range ws, which is either an empty sequence or a
string consisting entirely of whitespace.

Introduce variant of =~, call it =~c (for complete match), with the
following inference rules:

     cx |- a; m =~ p
  --------------------- (complete match 1)
    cx |- a; m =~c p

     cx |- a; () =~ p
  --------------------- (complete match 2)
    cx |- a; ws =~c p

     cx |- a; "" =~ p
  --------------------- (complete match 3)
    cx |- a; () =~c p

So, for example, we have:

" " =~c <empty/>
() =~c <data type="string"/>
not(" " =~c <value type="string"/>)

Note that (complete match 3) replaces the old (empty string) rule.

(complete match 2) says that the content of an element or attribute
that consists of only whitespace matches any pattern that matches the
empty sequence, such as <empty/>.
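
The three rules can be tried out on a toy model. The sketch below is
an assumption-laden simplification, not an implementation of the spec:
content is either the empty sequence (modelled as None) or a single
string, only <empty/>, <data type="string"/> (optionally with
minLength) and <value type="string"/> are covered, and the context cx
and attribute set a are left out. All class and function names are
made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union

XML_WHITESPACE = " \t\r\n"


@dataclass(frozen=True)
class Empty:
    """<empty/>"""


@dataclass(frozen=True)
class Data:
    """<data type="string"/>, optionally with a minLength param."""
    min_length: int = 0


@dataclass(frozen=True)
class Value:
    """<value type="string">literal</value>"""
    literal: str = ""


Pattern = Union[Empty, Data, Value]
Content = Optional[str]  # None models the empty sequence ()


def matches(m: Content, p: Pattern) -> bool:
    """Plain =~ for the toy patterns, without the old (empty string) rule."""
    if isinstance(p, Empty):
        return m is None
    if m is None:
        return False  # <data> and <value> never match the empty sequence
    if isinstance(p, Data):
        return len(m) >= p.min_length
    return m == p.literal


def is_ws(m: Content) -> bool:
    """ws: the empty sequence or a string made up only of whitespace."""
    return m is None or m.strip(XML_WHITESPACE) == ""


def complete_matches(m: Content, p: Pattern) -> bool:
    """=~c, as the disjunction of the three (complete match) rules."""
    rule1 = matches(m, p)
    rule2 = is_ws(m) and matches(None, p)
    rule3 = m is None and matches("", p)
    return rule1 or rule2 or rule3
```

In this model complete_matches(" ", Empty()) and
complete_matches(None, Data()) hold, complete_matches(" ", Value(""))
does not, and complete_matches("   ", Data(min_length=2)) holds, so
the document from the Problem section becomes valid.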

6.2.7

Change the (attribute) rule to:

                 cx |- {}; s =~c p     n in nc
  -----------------------------------------------------
    cx |- attribute(n, s) =~ <attribute>nc p</attribute>


No need for v variable range or toString function.

6.2.8

Change rule (element) to the following:

cx1 |- a; m =~c p
n in nc
okAsChildren(m)
deref(ln) = <element> nc p </element>
----------------------------------------------------------------
cx2 |- {}; ws1, element( n, cx1, a, m ), ws2 =~ <ref name="ln"/>

No need for stripSpace function.

6.2.9

Get rid of (empty string) rule.

Implementation
--------------

From an implementation perspective, this means that

- <value> and <data> are not "nullable", i.e. they don't match the empty
  sequence

- if an element or attribute has content that is nothing but
  whitespace (including no whitespace at all), then it matches an
  element or attribute pattern iff the name matches and the pattern
  for the content matches either the empty sequence or a string
  consisting of that whitespace

- a whitespace text node that has a sibling element is ignored
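
The last point might look like this when reading a document with
Python's xml.etree.ElementTree. This is a sketch only: text_nodes and
text_to_validate are hypothetical helpers, and a real validator would
also have to keep track of where the text sits relative to the child
elements.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"


def text_nodes(elem):
    """Collect the text children of elem in document order."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        if child.tail:
            nodes.append(child.tail)
    return nodes


def text_to_validate(elem):
    """Drop whitespace-only text only when elem has child elements."""
    nodes = text_nodes(elem)
    if len(elem) > 0:
        return [t for t in nodes if t.strip(XML_WHITESPACE) != ""]
    # No sibling elements: whitespace is kept and handed to =~c.
    return nodes


only_whitespace = text_to_validate(ET.fromstring("<foo>   </foo>"))
around_child = text_to_validate(ET.fromstring("<foo> <bar/> </foo>"))
```

only_whitespace keeps the three spaces, while around_child is empty
because both whitespace nodes have <bar/> as a sibling.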

Advantages
----------

I think this makes things work in an unsurprising way. It's also more
expressive than the current spec. For example, if you want something
that matches <foo></foo> or <foo/> but not <foo> </foo> (i.e. more
similar to EMPTY in XML 1.0), then you can use

  <element name="foo"><value type="string"/></element>

James
Follow-Ups:
- Re: Issue 57
  - From: Kohsuke KAWAGUCHI <kohsuke.kawaguchi@sun.com>