Should empty elements allow whitespace?

RELAX NG issue list

Should empty elements allow whitespace? original post

Proposals are also made in the original post.

A clarification was made by John Cowan.

Voted unanimously to resolve this issue by allowing elements with no declared children to have whitespace.

Name of "difference" element The original post

jjc suggested "except" and "butNot". The TC is open to other suggestions.

The TC's general feeling was that there is no good reason to change it. On 5/31/2001, the TC decided to close this issue without any action until someone brings up new material.

xml:base support The original post

Any decision was postponed until xml:base gets REC status.

Add wildcard pattern The original post

The original post suggests introducing syntax sugar for frequently used wildcard patterns. Namely,

Add a pattern that matches a single element, regardless of its name, attributes and children.

Add a pattern that matches anything: any attributes and any content.

Some members were concerned about whether this was important enough to be worth a special syntactic abbreviation. No conclusion was reached.

On 5/31/2001, the TC decided to drop this feature for ver.1. The reasons mentioned were: (1) it's not a good practice, so it's good to keep it slightly cumbersome, (2) we don't lose any of the expressiveness of RELAX NG, (3) it's used less frequently, and (4) adding it later is easier than dropping it later.

syntax of "global" attribute on "attribute" element The original post

jjc suggests form="prefixed|unprefixed" or form="qualified|unqualified" (the same as XML Schema).

Just as a possibility, jjc also mentioned splitting <attribute> into two different elements, just like we did for the parent attribute of the <ref> element.

On 5/31/2001, TC decided to close this issue with no-action-required.

Split "include" element to two different elements The original post

On the April 4, 2001 telcon, most people felt it was desirable to split, if a good pair of replacement names could be found. On 5/31/2001, pattern-level inclusion was renamed to <externalRef>.

Name of "grammar" element The original post.

Murata-san reported that RELAX Namespace will use <framework> for its root element. Therefore, RELAX NG will keep using the name <grammar>.

Parameterized patterns The original post.

From the beginning, jjc hinted at taking no action on this issue. Voted unanimously not to include them in the first version (Apr,4,2001 telcon). The feeling was that this was on the wrong side of the 80/20 divide for the first version.

List of simple datatypes The original post.

This issue is merged into the "datatype and identity constraint" issue.

Decided to introduce <oneOrMoreToken> and <zeroOrMoreToken> patterns to produce lists.

Versioning The original post.

(a) Put a version in the RELAX NG namespace URI (by jjc)

(b) Use a version attribute on the root element (by jjc)

He suggests having both (a) and (b).

He suggests using FPI (kk: kind of URN?)

In the 5/3 telecon, we decided to adopt proposal (a). See the details of this proposal.

Repeat M-N times The original post.

He wants to have this because of his own experiences.

He suggests that this feature can be provided outside of the core spec (as a pre-processor-like tool).

The TC is still not convinced that this feature is important enough to be added.

The decision was made (but tentatively) to drop this feature from version 1.
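The pre-processor idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool the TC discussed: the function name expand_repeat is hypothetical, namespaces are ignored, and the expansion uses only <group>, <optional> and <empty>. A repetition of min to max times becomes min required copies followed by nested optional copies.

```python
import copy
import xml.etree.ElementTree as ET


def expand_repeat(pattern, min_count, max_count):
    """Rewrite 'pattern repeated min_count..max_count times' without a repeat primitive."""
    if min_count < 0 or max_count < min_count:
        raise ValueError("need 0 <= min_count <= max_count")
    if max_count == 0:
        # Zero repetitions: nothing may appear.
        return ET.Element("empty")
    group = ET.Element("group")
    # The first min_count copies are mandatory.
    for _ in range(min_count):
        group.append(copy.deepcopy(pattern))
    # Each further copy is optional and may only appear if the previous one did,
    # so the optionals are nested rather than placed side by side.
    parent = group
    for _ in range(max_count - min_count):
        optional = ET.SubElement(parent, "optional")
        optional.append(copy.deepcopy(pattern))
        parent = optional
    return group


item = ET.Element("ref", {"name": "item"})  # hypothetical pattern to repeat
expanded = expand_repeat(item, 2, 4)
print(ET.tostring(expanded, encoding="unicode"))
```

The expansion grows linearly with the upper bound, which is one reason such a feature could plausibly live in a separate tool rather than in the core language.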

Syntax of parent attribute on ref element The original post.

James Clark proposed changing the attribute name to a more suitable one, but no one has suggested an alternative.

John Cowan suggests adding an optional "grammar" attribute to the "ref" element, thereby introducing the ability to refer to any ancestor grammar.

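The TC decided to introduce a separate <parentRef> element instead of the parent attribute on <ref>. So now we have <ref>, <parentRef>, and <externalRef>.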

Allow difference of patterns The original post.

A comment from Jeni Tennison has re-opened this issue. Here is a quote from her post to relax-ng-comments:

"... This use case would imply that the difference element would be useful outside name classes as well."

It is easy to implement validators if the use of <difference> p1 p2 </difference> is limited in such a way that both p1 and p2 match one token and one token only (e.g., <data>, <value>, or choices of those things).

So it looks like a good advancement in functionality at a very small cost. Do we want to have <difference> in this restricted fashion?

The TC has voted not to adopt the functionality (the original proposal of allowing <difference> on any patterns) for ver.1.0. But it is re-opened now.
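To illustrate why the restricted form is cheap to implement, here is a minimal Python sketch in which each single-token pattern is modelled as a predicate on one token. The helper names (data_integer, value, difference) are hypothetical stand-ins for <data>, <value> and <difference>, and the integer check is a simplification that compares lexical forms only.

```python
import re


def data_integer(token):
    """Stand-in for a <data> pattern of an integer type (lexical check only)."""
    return re.fullmatch(r"[+-]?[0-9]+", token) is not None


def value(literal):
    """Stand-in for a <value> pattern: matches exactly one literal token."""
    return lambda token: token == literal


def difference(p1, p2):
    """Matches a token that p1 accepts and p2 rejects."""
    return lambda token: p1(token) and not p2(token)


# "Any integer except 0", assuming tokens are compared as written.
nonzero_integer = difference(data_integer, value("0"))
print(nonzero_integer("42"), nonzero_integer("0"))
```

Because both operands consume exactly one token, the difference is a pointwise test and needs no automaton construction, which is the small cost the text refers to.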

ID/IDREF problem The original post. Summarization/generalization by jjc.

This issue is merged into datatype issue.

Drop concur The original post.

jjc suggests removing the concur pattern from the spec for various reasons. Voted unanimously to drop the concur pattern (Apr,5,2001 telcon).

Add "exclusion" pattern The original post.

jjc suggests adding a pattern to model tag-name-based exclusion. The TC voted not to include this feature in the first version. The TC will revisit this issue in the future. (The author of this issue list is unaware of when this vote took place.)

name of the new language

What will be the name of the new language? (The original post.)

Various people suggested various names, including, but not limited to, TRELAX, TryRELAX, RELAXED, RELEX, REFLEX, RELAX XML Schema, TREELAX, RELAX 2, and EXLAX.

One of the concerns is whether we should include "XML Schema" in the name.

Update (May 3rd): jjc suggests "RELAX something" for various reasons (see minutes of the May 3rd telecon). In response, "RELAX NG" (next generation, I guess) and RELAX++ are proposed. Other postfixes are welcome.

Names suggested after the telecon include URELAX, iRELAX, and TRELAX. The editor feels that RELAX NG has established some degree of popularity.

We will use "RELAX NG" and its pronunciation will be "relaxing."

restriction on use of string/data in pattern

Some form of constraint on the use of the <string> or <data> pattern is necessary for a validating processor to work correctly. (related posts [1] [2] ).

The current spec already has several restrictions that prevent problematic situations.

Some argue that the current restrictions still leave something to be desired.

prohibit attribute/element pattern in attribute pattern

Currently, RELAX NG allows patterns like


Should we explicitly prohibit them?

(original posts [1] [2] ).

Those malformed patterns cannot accept anything: any RELAX NG processor can safely replace those malformed patterns by <notAllowed /> without changing semantics.

So at least it doesn't confuse processors.

Murata-san suggests that "implementations SHOULD issue a warning" for a pattern that matches the following condition: that is, an <attribute> pattern that "directly or indirectly contain other <attribute> or <element> patterns" after the normalization.

In the conference call of 2001/06/14, we voted to make this situation an error that must be reported by processors. One of the reasons was the lack of good use cases that make use of <attribute>/<element> in <attribute>.
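A check of this kind can be sketched with Python's standard ElementTree module. The sketch below is an assumption-laden illustration rather than the TC's algorithm: it inspects only the literal nesting in one file and does not follow <ref> or perform the normalization mentioned above. The function name nested_in_attribute and the sample schema are hypothetical; the namespace URI is the one chosen elsewhere in this list.

```python
import xml.etree.ElementTree as ET

# Namespace URI chosen by the TC for this draft, in ElementTree's {uri}local form.
RNG = "{http://relaxng.org/ns/structure/0.9}"


def nested_in_attribute(root):
    """Return the local names of <attribute>/<element> patterns found inside an <attribute>."""
    hits = []
    for attribute in root.iter(RNG + "attribute"):
        for inner in attribute.iter():
            # iter() yields the attribute itself first, so skip it.
            if inner is not attribute and inner.tag in (RNG + "attribute", RNG + "element"):
                hits.append(inner.tag[len(RNG):])
    return hits


# Hypothetical schema fragment with an <element> nested inside an <attribute>.
SCHEMA = """
<element name="doc" xmlns="http://relaxng.org/ns/structure/0.9">
  <attribute name="id">
    <element name="oops"><empty/></element>
  </attribute>
</element>
"""

print(nested_in_attribute(ET.fromstring(SCHEMA)))
```

A processor following the 2001/06/14 decision would report an error whenever this list is non-empty after references have been resolved.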

redefinitions and order-significance

A RELAX NG pattern is currently sensitive to the order of <define> elements or the order of <include> elements because of the redefinition capability.

However, this sensitivity can be removed by restricting redefinition to only under the <include> element (like XML Schema). But this restriction also limits the expressiveness of RELAX NG.

Should we introduce this restriction to make RELAX NG an order-insensitive language? Is this worth the cost of limiting language expressiveness?

( original posts )

The original post proposes that "<define> elements that redefine or combine with other definitions should go inside the <include> elements that includes the pattern that contains the original definition."

Another proposal made by jjc:

attach a priority to definitions to allow combination without order dependence (like xsl:template).

prohibit applying "different kinds of combine for a single pattern".

The proposal #3 by jjc, which is amended a bit:

combine="interleave"/"choice" can be used as in the status quo.

combine="group" cannot be used.

the functionality that implements the semantics of combine="replace" is introduced in an order-insignificant way.

One of the touchstones will be XHTML modularization. kk wrote that the proposal #1 does not work with XHTML m12n and that #2 does.

The proposal #3 with its amendment is adopted.
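The reason "choice" and "interleave" can stay while "group" cannot is that the first two do not depend on operand order. The following Python sketch illustrates that point under simplifying assumptions: definitions are (name, combine, pattern) triples with patterns as plain strings, every definition carries a combine value, and the function name merge_definitions is hypothetical.

```python
def merge_definitions(definitions):
    """Merge same-named definitions so that the result ignores their order."""
    merged = {}
    for name, combine, pattern in definitions:
        # "group" depends on operand order, so it is rejected here.
        if combine not in ("choice", "interleave"):
            raise ValueError(f"combine={combine!r} is not order-insensitive")
        kind, patterns = merged.setdefault(name, (combine, []))
        if kind != combine:
            raise ValueError(f"{name}: different kinds of combine for a single pattern")
        patterns.append(pattern)
    # Sorting the operands makes the order independence explicit.
    return {name: (kind, tuple(sorted(patterns))) for name, (kind, patterns) in merged.items()}


defs = [
    ("inline", "choice", "em"),
    ("inline", "choice", "strong"),
    ("block", "interleave", "p"),
]
print(merge_definitions(defs) == merge_definitions(list(reversed(defs))))
```

The sketch does not model the order-insignificant replacement mechanism; it only shows why the surviving combine kinds give the same result whichever file or definition is read first.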

prohibiting duplicate attributes

RELAX NG allows patterns like


     ... 
     ... 


Can we prohibit patterns like this? If so, how can we do that?

( the original post )

The GNF normalization can detect such a condition. However, since it is a time-consuming operation, it may not be suitable to mandate the enforcement of this constraint.

Murata-san proposed that "implementations MAY issue warning messages" by using the GNF normalization.

jjc proposed that, for <group> p1 p2 </group>, "the set of possible attribute names occurring in p1 must be disjoint from those occurring in p2." (The present author believes that the same restriction is necessary for <interleave>.)
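The disjointness condition quoted above can be approximated without GNF normalization when attribute names are fixed strings. The Python sketch below makes that assumption explicit: it ignores namespaces, name classes such as wildcards, and <ref> indirection, and the function names are hypothetical. It is not the algorithm referred to in the resolution.

```python
import xml.etree.ElementTree as ET


def attribute_names(pattern):
    """Collect the fixed attribute names that can occur anywhere in a pattern."""
    return {a.get("name") for a in pattern.iter("attribute") if a.get("name") is not None}


def overlapping_attributes(container):
    """Return attribute names shared by two or more operands of a group-like pattern."""
    seen, overlap = set(), set()
    for operand in container:
        names = attribute_names(operand)
        overlap |= seen & names
        seen |= names
    return sorted(overlap)


# Hypothetical group in which "id" can occur in two different operands.
GROUP = ET.fromstring(
    "<group>"
    '<attribute name="id"/>'
    '<optional><attribute name="id"/></optional>'
    '<attribute name="lang"/>'
    "</group>"
)
print(overlapping_attributes(GROUP))
```

Operands of a <choice> are deliberately not compared against each other here, since only one branch of a choice is ever matched.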

Another proposal made by M-san introduces a new primitive <multipleAttributes> NC P </multipleAttributes> that has the built-in "zero-or-more" semantics.

In the conference call of 2001/06/14, the TC voted to adopt JJC's proposal. See the algorithm.

Overlapping with XML Schema Part 2

XML Schema Part 2 has the capability to

create a union of multiple types.

create a list of multiple types.

assign a name to a type and refer to it by name.

But our language is also capable of doing all three of the above.

So if we use XML Schema Part 2 as the only datatype vocabulary, we should consider dropping some of the redundant capability (for example, restricting choices of <data>s).

( the original post )

jjc suggests closing this with "no-action required" because he wants to keep a distance from XML Schema Part 2.

We decided not to use the syntax of W3C XML Schema Part 2 for defining datatypes. Therefore, the overlap no longer exists.

Prohibiting nested grammars

We currently allow <grammar> elements to be nested. That is, a grammar can be used just like any other pattern.

Murata-san wants to prohibit this because it may interfere with future namespace-based modularization (as currently seen in RELAX and XML Schema).

( the original post )

"Namespace-based modularization" means that one module is responsible for one namespace. In my personal opinion (and probably Murata-san's), this is vital for multi-lingual validation, where multiple schema languages cooperate to validate one document.

Murata-san said he is willing to retract this if someone can convince him that nested grammars cannot interfere with such modularization.

Murata-san retracted his objection on 2001/6/7. This issue was then resolved as no-action-required.

Datatype and Identity Constraint

This issue arose from merging several issues.

The first objective was to introduce identity-constraint functionality into our new language. Then we found that this issue is related to how our language treats datatypes.

It starts with a series of posts by jjc that describe possible features: [multipart key] , [typed key] , [scoped key] , [multiple key symbol spaces] , and [keys in element] .

Those posts are about possible features, but how those requirements affect the design is generally unclear. The editor believes that one thing that emerged in the telcon is that we don't need any path expression if we abandon multipart keys.

In the 5/3 telcon, we reached some degree of consensus about the above requirements (see minutes).

After the telcon, jjc posted his two proposals.

"ID/IDREF strawman #1". This proposal is relatively simple, but leaves several problems unresolved. Firstly, keys are not typed, so "1" and "01" are considered different even when the type is integer. Secondly, it doesn't work well with anonymous types. Datatype libraries have to provide a capability to test the equality of two types.

"ID/IDREF strawman #2". This is also a relatively simple proposal. But it still has several problems. Firstly, it doesn't work well with anonymous types. This time, you have to use built-in types.

So it was discovered that, without greater involvement in datatypes, we can't use anonymous (or user-defined) types in key/keyref. This discovery led to another proposal from jjc.

"datatypes #1". This post proposes how to declare new datatypes under the control of our language and how to declare key/keyref constraint.

What's important here is "under the control of our language". RELAX NG allows a datatype library (DTLIB) to use its own syntax to declare new types. But in this proposal, every DTLIB is required to use the syntax of this proposal (to make the type equivalence test possible).

Datatypes #2 ("the proposal of the day"). Roughly speaking, this is a simplified version of "datatypes #1", which "I(jjc) hope will be able to command consensus."

The difference from the previous proposal is that this one doesn't have the concept of "derivation". That means you can't add facets to your type once you have defined it.

Kohsuke KAWAGUCHI also proposes the simplest version.

"Back to the basic" proposal. This one tries to mimic DTD's ID/IDREF capability.

The above "datatypes #2" proposal is adopted. We use the following syntax to define enumeration:

And the following syntax to define a datatype:

For many other details, see the minutes of the conference call. (Not available at this moment.)
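As a small illustration of the typed-key concern raised for strawman #1 above, the following Python sketch contrasts comparing keys as raw strings with comparing them as integer values. The function names are hypothetical, and Python's int() merely stands in for whatever value space a datatype library would define.

```python
def same_key_untyped(a, b):
    """Untyped keys: two keys are equal only if their lexical forms are identical."""
    return a == b


def same_key_as_integer(a, b):
    """Typed keys: two keys are equal if they denote the same integer value."""
    return int(a) == int(b)


print(same_key_untyped("1", "01"), same_key_as_integer("1", "01"))
```

The second comparison is only possible when the validator knows the type of the key, which is why the discussion moved from key syntax to datatype declarations.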

Namespace URI of RELAX NG

We need a namespace URI for the new language.

jjc suggests "http://relaxng.org/ns/m.n" where m.n is the version number.

M-san suggests "http://relaxng.org/ns/something/m.n" so that we can accommodate related namespace URIs. For "something", jjc suggests "structure".

In the 5/31/2001 conference call, several people spoke in favor of an HTTP-based URI.

An HTTP-based URI allows us to put something like a RDDL document at the end point, which seems to be a general trend. If we take RELAX NG to ISO, then having a URI with "oasis" in it may be a little bit strange.

Also, "core" is proposed along with "main" and "structure".

http://relaxng.org/ns/structure/0.9 is chosen.
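Since the version travels in the namespace URI, a tool can read the component name and version directly from it. The Python sketch below assumes the "http://relaxng.org/ns/something/m.n" layout discussed above; the function name split_namespace is hypothetical, and nothing here is mandated by the TC.

```python
from urllib.parse import urlsplit


def split_namespace(uri):
    """Split a namespace URI of the assumed form http://relaxng.org/ns/<part>/<version>."""
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    if parts.netloc != "relaxng.org" or len(segments) != 3 or segments[0] != "ns":
        raise ValueError(f"not a recognised RELAX NG namespace URI: {uri}")
    return segments[1], segments[2]


print(split_namespace("http://relaxng.org/ns/structure/0.9"))
```

A namespace URI is an opaque identifier as far as XML processing is concerned, so parsing it like this is a convenience for tooling rather than something a validator needs to do.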

Restrict patterns to the regular language

TREX allows the following pattern.


   ... 
   ... 


In computer science terminology, this is beyond the power of a "regular language", and it is therefore problematic for applications.

Shall we avoid this excessive expressiveness? If so, how?

The original post

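In the above post, M-san suggests restricting <oneOrMore> and <zeroOrMore> to either

not contain <attribute> pattern directly/indirectly, or

patterns made of <attribute> and <choice> only.

Another proposal reads: "If a <oneOrMore> element has an <attribute> descendant, it must not have a <group> or <interleave> descendant."

A further proposal reads: "If <group> or <interleave> is used under <oneOrMore>, then it cannot contain any <attribute>."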

The TC decided to adopt the restriction by KK.

Use of QNames

RELAX NG uses QNames to

designate datatypes,

refer to element/attribute names, and

accommodate the QName datatype of W3C XML Schema Part 2

But for some, use of QNames in this way is something they want to avoid.

Can we avoid using QNames? If so, shall we avoid using QNames? If so, how?

The original post

Eric proposed to declare the prefix-URI mappings in another independent way, as follows:


Here is the original post.

The editor believes Eric also had an alternative proposal, which writes URIs every time, like this:


jjc suggests introducing a new attribute "datatypeNamespace", which propagates like the "ns" attribute. This proposal would address the problem of using QNames for datatypes.

Murata-san opposes the use of QNames and proposes the following alternative solutions.

Another proposal made by jjc utilizes <div> elements to declare namespaces.

Several more proposals were also made. See the thread starting from here for details.

Some people (including the present editor, for full disclosure) don't like using QNames in values for several reasons, including:

It interferes with canonicalization of XML.

It makes it impossible to change the prefix without any knowledge about the contents.

On the other hand, QNames are easier for humans to write, and less verbose. And some people think the use of QNames is unavoidable.

Also, the TC seems to have a consensus that a RELAX NG grammar should be writable without using XML Namespaces, if the author so prefers (because for many people XML Namespaces are still a new technology).

On 2001/06/07, the TC voted to retain the current status; that is, use @ns to specify the default namespace and allow element/attribute names to be QNames.

Introduce <list> pattern to replace <oneOrMoreToken>/<zeroOrMoreToken>

<oneOrMoreToken> and <zeroOrMoreToken> were adopted to make lists of strings possible. But it was discovered that a new pattern, namely <list>, can reproduce the semantics of the <***OrMoreToken> patterns and simplify both implementations and the spec.

The original post

The above original post contains the semantics of <list> pattern.

<***OrMoreToken> patterns do not allow us to have something like "a list of 4 integers", which is possible under W3C XML Schema. This proposal makes the list capability of RELAX NG more expressive than W3C XML Schema.

The <list> pattern was adopted in place of <oneOrMoreToken> and <zeroOrMoreToken>.
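The "list of 4 integers" case mentioned above can be sketched as follows. This Python illustration assumes that a list splits its content on whitespace and then matches the tokens, in order, against a fixed sequence of single-token patterns; repetition inside a list is not modelled, and the names is_integer and matches_list are hypothetical.

```python
import re


def is_integer(token):
    """Lexical check standing in for an integer datatype."""
    return re.fullmatch(r"[+-]?[0-9]+", token) is not None


def matches_list(text, token_patterns):
    """Split on whitespace and match each token against the pattern in the same position."""
    tokens = text.split()
    if len(tokens) != len(token_patterns):
        return False
    return all(pattern(token) for pattern, token in zip(token_patterns, tokens))


four_integers = [is_integer] * 4
print(matches_list("10 20 30 40", four_integers), matches_list("10 20 30", four_integers))
```

Because the sequence of token patterns is arbitrary, the same mechanism also covers mixed lists such as an integer followed by a keyword, which a plain "one or more tokens of type T" construct cannot express.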

Scope of key/keyref symbol spaces

Currently, the symbol spaces of key/keyref are global. So two independently authored grammars may accidentally use the same key name. Is there any solution to this?

The original post

Sometimes, an author wants to refer to keys that another author wrote. That makes restriction difficult.

TC is now waiting for the public comments.

Allow <include> to rename keys.

Allow <include> to have a flag to ignore (localize) keys in the included pattern.

Have some kind of explicit scoping for each key. (How?)

Redefinition without original

Should we allow a redefinition (a <define> inside <include>) when that redefined pattern is not defined in the included file?

Consider the following example:


  
  
     ... 
  

B.rng

   ... 


C.rng:


(Quoting from jjc's post) "If the user has done this, then they have probably made a mistake. On the other hand the semantics are clear. We can either make this an error or suggest that implementations give a warning."

The TC has decided that such a redefinition has to be rejected. An algorithm to detect this situation is available in the thread starting from here.

Multiple definitions in the same file

TREX prohibits multiple definitions of the same pattern in the same file. That is, you can't write it like this:


   ... 
   ... 


Shall RELAX NG keep the same restriction, or not?

Quoting from jjc's post: " I found myself confused by this when reading RELAX grammars. I would find a reference to a label foo, then look for an elementRule foo; when I found it, I would assume it's the only definition. My assumption would be incorrect, and I would therefore misunderstand the schema (though eventually of course I would notice the other definitions and understand correctly). The other side of the argument is that this is an extra complication, and makes things slightly harder to explain. "

The TC decided to allow this (in the 2001/6/7 conference call).

Context information for datatype libraries

Should context information be made available to datatype libraries? If so, what should be included (in-scope namespaces, entity declarations, notation declarations)?

<element> and <attribute> in <list>

Is it meaningful to have <element>s/<attribute>s within a <list>? (Maybe it's not.) We may want to restrict the patterns that are allowed inside <list>.

This issue should be considered along with issues #19 and #21.

KK suggested prohibiting attributes inside a list.

The TC seems to have a consensus that these should be prohibited.

It was discovered that it is possible to modify the implementation to correctly process elements/attributes inside a list.

Because of the confusion caused by allowing elements/attributes inside a list, the TC voted to prohibit both elements and attributes inside a list, in the conference call of 2001/06/14.

allow non-whiteSpace delimiter for <list> The original post

The TC has decided not to incorporate this feature for Ver.1.0.

renaming the "name" attribute. The original post

The TC has voted to close this issue with no-action-required (2001/06/14).

Allowing multiple operands for <not>

Currently, the <not> pattern can only have one operand, and <not> p </not> is considered syntax sugar rather than a primitive. Ms. Tennison suggests that we can change <not> to have multiple operands by modifying its definition as:

<not> p1 p2 ... </not>

as equivalent of


  
  
    p1 p2 ...
  

This change is relatively easy because <not> is just syntax sugar and it doesn't affect any other part of the spec. And as Ms. Tennison said, this "would be more convenient".

Resolution of the default namespace for XSD's QName datatype

Currently, our (conceptual) interface to datatype libraries is defined in such a way that the following pattern matches the following instance.



  
  foo


instance:

  rng:foo


This is because an unqualified QName value is considered to have the namespace URI of the default namespace. In this case, that is the namespace URI of RELAX NG, and this behavior is probably not what people want.

If we modify the spec to resolve unprefixed names to the value of the "ns" attribute, instead of the default namespace URI of the pattern file, then the above pattern would match the following instance:


  foo


The other resolution would be simply to close this issue with no action required.
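The two candidate behaviours can be compared with a small Python sketch. It models a QName value as a (namespace URI, local name) pair and differs only in which URI an unprefixed value falls back to. The function name resolve_qname, the prefix map and the http://example.com/doc namespace are hypothetical; only the RELAX NG namespace URI comes from this list.

```python
RNG_NS = "http://relaxng.org/ns/structure/0.9"
DOC_NS = "http://example.com/doc"  # hypothetical value of the "ns" attribute


def resolve_qname(value, prefixes, fallback_ns):
    """Resolve a QName string to (namespace URI, local name)."""
    prefix, separator, local = value.partition(":")
    if separator:
        return prefixes[prefix], local
    # Unprefixed: use whichever namespace the chosen policy designates.
    return fallback_ns, value


# Current behaviour: "foo" in the pattern falls back to the pattern file's
# default namespace, which is the RELAX NG namespace itself.
current = resolve_qname("foo", {}, RNG_NS)
# Proposed behaviour: "foo" falls back to the value of the "ns" attribute.
proposed = resolve_qname("foo", {}, DOC_NS)

# Instance values: one spelled with a prefix bound to the RELAX NG namespace,
# one unprefixed inside a document whose default namespace is DOC_NS.
instance_prefixed = resolve_qname("rng:foo", {"rng": RNG_NS}, DOC_NS)
instance_plain = resolve_qname("foo", {}, DOC_NS)

print(current == instance_prefixed, proposed == instance_plain)
```

Under the current rule the pattern only matches an instance that names the RELAX NG namespace, which is the surprising behaviour described above; under the proposed rule it matches the plain instance value.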