ubl-ndrsc message

Subject: Re: [ubl-ndrsc] Absence of Data
From: "Eve L. Maler" <eve.maler@sun.com>
To: ubl-ndrsc@lists.oasis-open.org
Date: Tue, 20 Nov 2001 16:58:45 -0500
(What is PESC?)

Thanks for sending this!  I assume I can consider you the champion for this 
one...  I doubt we'll have time to discuss this in tomorrow's call, but 
maybe we can get some reaction on the list for next time.  I have some 
comments below:

At 02:48 PM 11/20/01 -0600, Mike Rawlins wrote:
>One thing that we'll have to deal with, particularly for those coming
>from an EDI background, is that absence of data (or absent elements) is
>not handled the same way in XML as in EDI.  For example:
>
>In XML:
>
><Name>Mike</Name>
><Name>     </Name>
><Name/>
>
>All satisfy the condition that the "Name" element be present, even
>though there may be no real data in it.
>
>In X12 or EDIFACT:
>
>N1*ST*Rawlins *91*1234567 - Element is present because data is present
>N1*ST**91*1234567 - Element is not present because data is not present.
>
>The PESC paper has a section which deals with this.  This may be a bit
>long for our purposes, but I think it's something that needs to be
>addressed.
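To make the contrast concrete, here is a minimal sketch in Python (standard library only). It is illustrative, not a proposal: it assumes "*" as the X12 element separator, treats an empty position between two separators as a missing element, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three XML forms from the example above.
XML_SAMPLES = ["<Name>Mike</Name>", "<Name>     </Name>", "<Name/>"]


def xml_name_present(doc: str) -> bool:
    """In XML the element is present whenever its tag appears, whatever its content."""
    root = ET.fromstring(f"<Party>{doc}</Party>")
    return root.find("Name") is not None


def x12_elements_present(segment: str) -> list:
    """In X12 an element counts as present only if its position holds data.

    Assumes '*' as the element separator and skips the segment ID itself.
    """
    _segment_id, *elements = segment.split("*")
    return [value != "" for value in elements]


if __name__ == "__main__":
    for doc in XML_SAMPLES:
        print(doc, "->", xml_name_present(doc))  # True for all three
    print(x12_elements_present("N1*ST*Rawlins *91*1234567"))  # [True, True, True, True]
    print(x12_elements_present("N1*ST**91*1234567"))          # [True, False, True, True]
```

Under these assumptions, all three XML forms report the Name element as present, while the second X12 segment reports the name position as absent.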
>
>Extract from PESC paper:
>
>
>2.3.9 Nulls, Zeroes, Spaces, and Absence of Data
>The following rules SHALL apply in designing schemas and interpreting
>instance documents:
>
>1. Absence of data - If an element is defined as OPTIONAL (minOccurs
>attribute value of zero) and the element does not occur in an instance
>document, semantics SHALL NOT be interpreted from the element other than
>that the originator of the instance document and did not include it.  No

Try as I might, I can't make sense of the last bit about the 
originator.  Is there a word missing or extra?  Also, the notion of 
"semantics being interpreted" sounds a bit fuzzy to me.  Would it be 
clearer to say something like "The absence of an optional element in an
instance SHALL NOT be interpreted as a signal that the element, if present,
would have had a null value"?  A concrete example of
how an absent element could be misinterpreted would be helpful.
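As one possible concrete example, here is a minimal Python sketch. The OrderLine, Discount, and Quantity element names are hypothetical, chosen only for illustration: a receiver that reads an absent optional Discount as 0 has asserted "no discount" when the sender said nothing at all.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_discount_misread(line: ET.Element) -> float:
    # Misinterpretation: an absent <Discount> is silently read as a default of 0.
    return float(line.findtext("Discount", default="0"))


def read_discount(line: ET.Element) -> Optional[str]:
    # Reading that infers nothing: absence means only "not included" (None here);
    # a present-but-empty <Discount/> is reported as "" so the two stay distinct.
    discount = line.find("Discount")
    if discount is None:
        return None
    return discount.text or ""


line = ET.fromstring("<OrderLine><Quantity>5</Quantity></OrderLine>")
print(read_discount_misread(line))  # 0.0
print(read_discount(line))          # None
```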

Also, we should separate out the schema-design advice from the generic 
advice to those who have to interpret an instance that uses a schema 
designed according to our guidelines.  In fact, the interpretation advice
should perhaps take the form of brief documentation boilerplate attached to
every element in this situation; after all, it's not
advice about how to properly structure an instance (like our advice about 
processing instructions etc. will be), but rather how to properly 
understand it.

I get the feeling that I'm not being clear, but I'll rely on you to tell 
me. :-)

>default values are to be assumed.   Likewise, if an attribute is
>declared as OPTIONAL ("use" attribute value of "optional") and the
>attribute does not occur in an instance document, semantics SHALL NOT be
>interpreted from the attribute other than that the originator and did
>not include it; no default values are to be assumed.

Same problem with the originator wording here.
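A similar sketch for attributes, again in Python and again with hypothetical names (an Amount element with an optional "currency" attribute). ElementTree does no XML Schema processing, so any assumed default here would have to come from application code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def currency_conformant(amount: ET.Element) -> Optional[str]:
    # Absent attribute -> None, meaning only "the sender did not include it".
    return amount.get("currency")


def currency_assumed_default(amount: ET.Element) -> str:
    # What the rule forbids: quietly assuming a value for the missing attribute.
    return amount.get("currency", "USD")


price = ET.fromstring("<Amount>100.00</Amount>")
print(currency_conformant(price))       # None
print(currency_assumed_default(price))  # USD
```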

>NOTE:  All string items defined with a minOccurs of one SHALL have a
>minimum length requirement of one character.
>
>2. Zeroes - Zeroes, when appearing in a numeric element in an instance
>document, SHALL be interpreted as a zero value.

Should we qualify "numeric" by listing the built-in datatypes this applies
to and stating that it also applies to any datatypes derived from them?
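On that point, a small Python sketch of rule 2 for the XML Schema 1.0 numeric built-ins: the types derived from xsd:decimal (the integer family included) plus xsd:float and xsd:double. The is_zero helper is hypothetical; it does not validate the lexical form, and a user-defined type would first have to be mapped to its built-in base.

```python
from decimal import Decimal

# XML Schema 1.0 built-ins in the xsd:decimal hierarchy (integer family included).
DECIMAL_FAMILY = {
    "decimal", "integer", "nonPositiveInteger", "negativeInteger", "long", "int",
    "short", "byte", "nonNegativeInteger", "unsignedLong", "unsignedInt",
    "unsignedShort", "unsignedByte", "positiveInteger",
}
# The two floating-point built-ins are primitive types in their own right.
FLOAT_FAMILY = {"float", "double"}


def is_zero(lexical: str, xsd_type: str) -> bool:
    """Return True if the lexical value denotes zero for the given numeric type."""
    text = lexical.strip()  # numeric types collapse surrounding whitespace
    if xsd_type in DECIMAL_FAMILY:
        return Decimal(text) == 0
    if xsd_type in FLOAT_FAMILY:
        return float(text) == 0.0
    raise ValueError(f"not a numeric built-in type: {xsd_type}")


print(is_zero("0.00", "decimal"))  # True
print(is_zero("-0", "int"))        # True
print(is_zero("0E0", "double"))    # True
print(is_zero("0.5", "decimal"))   # False
```

Several lexical forms denote the same zero value. Two of the listed types, positiveInteger and negativeInteger, can never validly hold zero at all.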

>3. Spaces - Spaces sent as values for elements or attributes (of type
>string) in instance documents SHALL be interpreted as spaces.  It is
>RECOMMENDED that leading and trailing spaces be removed, but when they
>appear they SHALL have semantic significance.  Sending an element with
>just spaces is not the same as sending a nulled element (see #4 below).
>
>4. Nullability - In certain cases, it MAY be desirable to convey that an

I don't believe that this is a legitimate use of the uppercase MAY (it's 
not being used in a normative sense).  I usually use "might" in these 
circumstances, or it could say "Where a schema is designed to be nillable, ..."

>element has no value (a null value) rather than indicating that it has a
>value of spaces or that it is not present in a document.  In these
>cases, the originator of the instance document SHOULD convey explicitly
>that an element is null.  An example is an address update for a
>previously transmitted address.  The previous address had two address
>lines, whereas the current address has just one line.  The originator of
>the document indicates that the second address line is removed by
>indicating that the element is nulled as follows:
><addressLineTwo xsi:nil="true"></addressLineTwo>
>
>To support this the addressLine element in the schema is defined as
>nullable via:
>
><xsd:element name="addressLine" type="xsd:string" nillable="true"/>
>
>When this type of nullable semantics is desired, the "nil" and
>"nillable" attributes SHALL be used (as opposed to spaces for strings or
>zeroes for numerics).  The "nillable" attribute SHALL NOT be used in
>element declarations with a minOccurs of greater than zero.  When there
>is a requirement that an element be OPTIONAL and not appear in an
>instance document, the minOccurs attribute with a value of zero SHALL be
>used in the element declaration.  By default, any element defined in
>analysis as having a minimum occurrence of zero SHALL be represented in
>the schemas as nullable.
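To see what the nil mechanism distinguishes, here is a minimal Python sketch that classifies a child element as absent, explicitly nil, or carrying a value (spaces-only content counts as a value, per rule 3). XML Schema spells the instance attribute xsi:nil, in the http://www.w3.org/2001/XMLSchema-instance namespace. The Address, addressLineOne, city, and postalCode names and the classify helper are hypothetical; addressLineTwo comes from the example above.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Optional, Tuple

# Clark-notation name of the xsi:nil attribute defined by XML Schema Part 1.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


class State(Enum):
    ABSENT = "not supplied"    # rule 1: nothing inferred beyond the omission
    NIL = "explicitly null"    # rule 4: xsi:nil="true" (or "1")
    VALUE = "value"            # any content, including spaces only (rule 3)


def classify(parent: ET.Element, tag: str) -> Tuple[State, Optional[str]]:
    child = parent.find(tag)
    if child is None:
        return State.ABSENT, None
    if (child.get(XSI_NIL) or "").strip() in ("true", "1"):
        return State.NIL, None
    # Spaces are returned untouched: "   " is a value, not a null.
    return State.VALUE, child.text or ""


UPDATE = """
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <addressLineOne>12 Main St</addressLineOne>
  <addressLineTwo xsi:nil="true"/>
  <city>   </city>
</Address>
""".strip()

root = ET.fromstring(UPDATE)
for tag in ("addressLineOne", "addressLineTwo", "city", "postalCode"):
    print(tag, classify(root, tag))
```

Run against the sample update, addressLineTwo comes back as NIL, city as a VALUE of three spaces, and postalCode as ABSENT.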

To be honest, the whole nillable thing is new to me, and I recall that the 
XSD working group was sharply divided on it (with only the relational DB 
people on the "for" side).  Does it really buy us anything over 
application-specific token values like "none" or "no" or whatever?

         Eve
--
Eve Maler                                    +1 781 442 3190
Sun Microsystems XML Technology Center   eve.maler @ sun.com
Follow-Ups:
- Re: [ubl-ndrsc] Absence of Data
  - From: Mike Rawlins <mike@rawlinsecconsulting.com>
References:
- [ubl-ndrsc] Absence of Data
  - From: Mike Rawlins <mike@rawlinsecconsulting.com>