relax-ng message

Subject: Re: Nits
From: James Clark <jjc@jclark.com>
To: Murata Makoto <mura034@attglobal.net>, relax-ng@lists.oasis-open.org
Date: Wed, 18 Jul 2001 12:37:10 +0700

> 1)
>
> I would like to introduce a para before 6.1.
>
>   In our data model, the child sequence of an element does not have
>   two consecutive strings.  To define the semantics of <interleave>
>   (e.g, <interleave><text><ref name="a"/></interleave>), we allow
>   sequences of text and elements such that a string may immediately
>   follow another string.  However, our inference rule for <element>
>   ensures that the child sequence of an element does not contain two
>   consecutive strings.


I agree with the point that you are trying to make, but I don't think 6.1 
is the right place to make it.   <list> is the case where it's really vital 
to be able to have multiple consecutive strings.  I think there are 3 
places where it makes sense to say something:

- in 6.2, where m is introduced
- in 6.2.8, where the rule for <element> is given
- in 6.2.11, where the rule for <list> is given

> 2)
>
> 6.2 says "a set with a single member is considered the same
> as that member".  Do we really need this?

> 10)
>
> In 6.2.12, key(...) and keyref(..) should return a singleton bag
> rather than a single key (reference).


With sequences, a sequence with a single member is considered the same as 
that member.  It would be a pain to lose this.  All the rules with strings 
would have to be careful about wrapping and unwrapping strings in singleton 
sequences.  So for sequences I want to keep things the way they are.

I also think it's better to treat our unordered collections (sets and bags)
in the same way as our ordered collections (sequences). If a single-member
set were distinguished from its member, then attribute() would have to
return a singleton set rather than an attribute, which would be (a)
unintuitive and (b) different from element().

Thus, instead of making key() and keyref() return singleton bags, I would
propose to say for bags that a bag with a single member is not
distinguished from that member, just as we do for sets and sequences.


> 3)
>
> In the semantic rule of <element>, p ranges over "top" as well as
> "pattern", but 6.2 does not allow "top".

p does not need to range over top. If the schema contains <define
name="ln"><element> nc <notAllowed/> </element></define>, then the judgement

deref(ln) = <element> nc p </element>

will not be true.  That is exactly what's needed because <notAllowed/> 
doesn't match anything.

>
> 4)
>
> In 6.2.6., I would like to add an example:
>
>    For example, all interleavings of <a/><a/> and <b/>
>    are <a/><a/><b/>, <a/><b/><a/>, and <b/><a/><a/>.
>

Added.
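
The example can be checked mechanically. The following is a minimal Python sketch, not text from the spec; it models each element as a plain string, and the name interleavings is made up for illustration.

```python
def interleavings(xs, ys):
    """Yield every merge of xs and ys that keeps the order within each input."""
    if not xs:
        yield tuple(ys)
        return
    if not ys:
        yield tuple(xs)
        return
    # Either the next item comes from xs ...
    for rest in interleavings(xs[1:], ys):
        yield (xs[0],) + rest
    # ... or it comes from ys.
    for rest in interleavings(xs, ys[1:]):
        yield (ys[0],) + rest


# <a/><a/> interleaved with <b/>; the set removes duplicate merges.
results = sorted(set(interleavings(("a", "a"), ("b",))))
print(results)
```

The three distinct results correspond to <a/><a/><b/>, <a/><b/><a/>, and <b/><a/><a/>.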

> 5)
>
> I do not understand why we need "toString(v)" in 6.2.7.  In our data
> model(section 2), an attribute consists of a name and a string.

v can range over an empty sequence as well as a string.  If we didn't have 
toString, then the rule might incorrectly construct an attribute whose 
value is an empty sequence. If we used s instead of v in the rule, then

<attribute name="foo"><empty/></attribute>

would not match

foo=""

This stems from the fact that empty elements have an empty sequence as
content, but empty attributes have an empty string as content, yet we want 
(I hope) to treat these uniformly. It's the converse of the (empty string) 
rule.

Perhaps a note would be in order here.
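
As an illustration only, the role of toString might be sketched in Python as below. The sketch assumes an empty sequence is modelled as an empty tuple; to_string is a hypothetical stand-in for the spec's toString, not its definition.

```python
def to_string(v):
    """Map the empty sequence to the empty string; leave a string unchanged."""
    if isinstance(v, str):
        return v
    if v == ():
        # Assumption: the empty sequence is represented as an empty tuple.
        return ""
    raise TypeError("v must be a string or the empty sequence")


# <attribute name="foo"><empty/></attribute> yields the empty sequence as v;
# to_string turns it into the "" that foo="" carries.
attribute_value = to_string(())
```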

> 6)
>
> 6.2.8
>
> I would propose to replace "normalized(m)" with
> "not(hasConsecutiveStrings(m))" and introduce "hasConsecutiveStrings"
> rather than "normalized".

But normalized doesn't just mean "does not have consecutive strings". It 
also means "does not have empty strings". (I agree we do need to change 
"normalized(m)".)
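
A rough Python sketch of the two conditions, assuming strings are Python str values and anything else in the sequence stands for an element (the function name and the representation are illustrative, not from the spec):

```python
def normalized(m):
    """True if m contains no empty strings and no two adjacent strings."""
    prev_was_string = False
    for item in m:
        is_string = isinstance(item, str)
        if is_string and (item == "" or prev_was_string):
            return False
        prev_was_string = is_string
    return True


element = ("element", "a")  # placeholder for an element node
print(normalized(["x", element, "y"]))  # strings separated by an element
print(normalized(["x", "y"]))           # two consecutive strings
print(normalized([element, ""]))        # an empty string
```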
>
> 7)
>
> In 6.2.8 "stripSpace" removes whitespace strings.  Thus, we cannot
> validate strings comprising whitespaces against foreign datatypes.
>
> For example,
>
> <foo>  </foo>
>
> does not match
>
> <element name="foo">
>   <data type="xsd:string">
>     <param name="minLength">2</param>
>  </data>
> </element>
>
> Is this a problem?

It's certainly not ideal, but it's probably not a problem in practice.  I 
think it's going to be hard to fix this given that we have decided to allow

<element name="foo"><empty/></element>

to match

<foo>   </foo>

For example, consider

<choice>
  <element name="foo">
    <data type="string">
      <param name="minLength">2</param>
    </data>
  </element>
  <element name="foo">
    <empty/>
  </element>
</choice>

Should this match

  <foo> </foo>

with just one space?

It's worth thinking about some more to see if we can find a better solution.

> 8)
>
> Should we require that the identity, transitivity, and reflexivity
> hold for datatypeEqual?  If reflexivity does not hold, testing of
> keyConflict will become expensive.

Good point.  I think we should probably require that datatypeEqual be 
symmetric, reflexive and transitive.  Actually it's not quite going to be
reflexive: it's only going to be reflexive for things for which
datatypeAllows() is true.  We can add 3 inference rules to specify 
precisely what must be the case.
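
The three requirements can be spot-checked over a finite sample of lexical values. The Python sketch below is only an illustration: int_allows and int_equal are a made-up integer datatype standing in for datatypeAllows and datatypeEqual, and check_equivalence is a hypothetical helper.

```python
from itertools import product


def int_allows(s):
    """Hypothetical datatypeAllows: s is a valid integer lexical form."""
    try:
        int(s)
        return True
    except ValueError:
        return False


def int_equal(a, b):
    """Hypothetical datatypeEqual: both allowed and denote the same integer."""
    return int_allows(a) and int_allows(b) and int(a) == int(b)


def check_equivalence(values, allows, equal):
    """Check reflexivity (on allowed values only), symmetry and transitivity."""
    allowed = [v for v in values if allows(v)]
    reflexive = all(equal(v, v) for v in allowed)
    symmetric = all(equal(a, b) == equal(b, a)
                    for a, b in product(allowed, repeat=2))
    transitive = all(equal(a, c)
                     for a, b, c in product(allowed, repeat=3)
                     if equal(a, b) and equal(b, c))
    return reflexive and symmetric and transitive


samples = ["1", " 01 ", "+1", "2", "abc"]
ok = check_equivalence(samples, int_allows, int_equal)
```

Note that int_equal("abc", "abc") is false, which is the restricted form of reflexivity described above: it holds only where the datatype allows the value.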

> 9)
>
> What is the definition of "identical" in 6.2.10?  The same sequence
> s of unicode code values?

The inference rule notation is assuming a notion of identity.  Any time an 
inference rule has more than one instance of a particular variable, it is
implicitly relying on a notion of identity. 6.2.10 is relying on that same 
notion of identity.  For strings, it does indeed mean it is the same 
sequence of characters.

It would probably be good to say something about this identity issue, but I
am not quite sure what. Something to the effect that the data model defines
a value-based notion of identity for any two objects of the same kind: two
objects of a particular kind are the same if their constituents are the
same.
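
One way to picture value-based identity is with immutable Python dataclasses, whose generated equality compares constituents. This is a sketch under simplifying assumptions: the classes below are illustrative and omit parts of the real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str


@dataclass(frozen=True)
class Element:
    name: str
    attributes: frozenset  # unordered collection of Attribute
    children: tuple        # ordered sequence of Element or str


# Two separately constructed objects with the same constituents.
first = Element("foo", frozenset({Attribute("id", "x1")}), ("text",))
second = Element("foo", frozenset({Attribute("id", "x1")}), ("text",))
other = Element("foo", frozenset({Attribute("id", "x2")}), ("text",))

same = first == second      # equal by value, although distinct objects
different = first == other  # differs in one attribute value
```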

>
> 11)
>
> Have we decided that we allow QNames as values of key/keyref?  I
> remember that the original XML WG did not allow qualified names for
> ID/IDREF since two islands (e.g., tables) in a document may belong
> to the same namespace.

We decided to allow key/keyRef values to have ANY datatype. How could we 
exclude QNames when RELAX NG knows nothing of particular datatypes?

> 12)
>
> In the first para of 6.3, I would like to replace "has" with
> "contains".

OK.

> 13)
>
> I think that the second condition of the rule for keyConflict should
> have "(" before the first "key" and ")" before "subset".

I agree.

> 14)
>
> In XML 1.0, characters are defined as follows.
>
> [2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] |
> [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>
> I think that we should merely reference to this definition.

I changed

"a character is an integer in the range 0 to #x10FFFF"

to

"a character is as defined in [XML 1.0]"
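
For comparison with the old wording, the Char production quoted above can be written as a predicate on code points. This is a Python sketch for illustration; the function name is arbitrary.

```python
def is_xml_char(cp):
    """True if code point cp matches production [2] Char of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Code points in the range 0 to #x10FFFF that Char excludes include
# most control characters, the surrogate block, and #xFFFE/#xFFFF.
excluded_examples = [cp for cp in (0x0, 0x8, 0xD800, 0xFFFE, 0xFFFF)
                     if not is_xml_char(cp)]
```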

James