Subject: Re: [ubl-lcsc] XSD data types - was Re: [ubl-lcsc] Re: UBL 0.81 CCT draft-9-mod
On Mon, 8 Sep 2003, Tim McGrath wrote:

>> thanks, i just wanted to know that you recognised the problem. your
>> arguments add strength to the idea that it is very subjective about when
>> we need to restrict content formats.

I'm collecting more mods to add to draft-9-mod-2. Would you want the attribute types for IdentifierType to be changed to "xsd:normalizedString" instead? That also raises the question of whether the same should be done to CodeType's attributes. But then CodeType's content type is xsd:token....

>> we need to consider the following...
>>
>> 1. w.r.t. systems architectures. what happens if the data validation of
>> the schema parser fails? typically, it is the logic of the processing
>> application that knows how to deal with this - so why not let it do the
>> validation.

I agree that applications have to perform all the work of checking the validity and correctness of data. They face the end-user, and so have to take on any validation that has not already been done in the lower layers. As a crude extension of this to a somewhat extreme end, if the lower layers did not provide the TCP network functions, the application would even have to build in time-outs, resends, CRC error checks, etc. to simulate what TCP already does at its level to transmit UBL files.

It's also rather subjective whether the application should do all the work of syntax checking. The application certainly has to do semantic checking, but if the syntax *could* (not must) be pre-filtered by the lower layers, namely the schema validator, then it could focus more on semantics. As a side benefit, schema validators themselves can be checked against published specs, so validations that rely on schema validators enjoy an agreed-upon specification of how that validation is to be done.
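For concreteness, here is a minimal sketch of what the IdentifierType/CodeType change could look like. The type and attribute names below are illustrative only and do not claim to match the actual declarations in CCT.xsd:

```xml
<!-- Illustrative sketch only; names are hypothetical, not from CCT.xsd -->
<xsd:complexType name="IdentifierType">
  <xsd:simpleContent>
    <xsd:extension base="xsd:normalizedString">
      <!-- xsd:normalizedString replaces tabs/CR/LF with spaces,
           but does not collapse or trim runs of spaces -->
      <xsd:attribute name="identificationSchemeID"
                     type="xsd:normalizedString"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<xsd:complexType name="CodeType">
  <xsd:simpleContent>
    <!-- xsd:token additionally collapses internal whitespace
         and strips leading/trailing spaces -->
    <xsd:extension base="xsd:token">
      <xsd:attribute name="codeListID" type="xsd:normalizedString"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>
```

The practical difference is the built-in whiteSpace handling: xsd:string preserves content verbatim, xsd:normalizedString maps tabs and line breaks to spaces, and xsd:token further collapses and trims whitespace, which is why mixing the two across content and attributes is worth deciding deliberately.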
But from UBL's point of view, I think we should look at what benefits there are in terms of precise data format specification and information interoperability (in terms of interpretation), rather than doing something solely to assist applications, since applications will do whatever is necessary to "patch up" the leakages anyway.

>> 2. do we need to make these decisions up-front? could not specific
>> implementors extend our schemas to add the facets necessary for this
>> type of validation if they wanted it?

True, I think that would most likely be the case. I asked myself whether we expect users to be able to use CCT.xsd out-of-the-box. My answer is not a convincing yes. Take "NumericType" and "PercentType" for instance. They are both exactly the same! So the only contribution each makes for the user is its name, which embodies some semantics about the nature of the values. But users might not find, for example, PercentType immediately useful, since there is no specification of whether a contained value of "0.30" means 0.30% or 30%. Similarly, NumericType is based on an abstract numerical space, so concerns of physical representation, such as very large quantities, negative values, precision level, rounding modes, etc., that may or may not apply in specific contexts are not taken care of by NumericType alone. So if users always have to extend the types in CCT, they could just as well build their own local types directly on XSD's simple types.

My take is that each CCT type should be as specific to its purpose as possible, because the thought that goes into each type's specificity is then embedded into the type's design, ensuring that instance data used in the same context flows back and forth between writers and readers in the same expected shapes (data ranges) and sizes (bit lengths).

>> 3. if we accept this is a valid tactic then we should revisit the use of
>> XSD dateTime as well.
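On point 2 above, XSD does let an implementor derive by restriction and add facets locally. A hedged sketch, assuming a `cct` prefix bound to the CCT namespace and that PercentType is (or exposes) a simple type derived from xsd:decimal; the facet values are arbitrary examples, not a recommendation:

```xml
<!-- Illustrative only: a local restriction an implementor might define -->
<xsd:simpleType name="MyPercentType">
  <xsd:restriction base="cct:PercentType">
    <xsd:minInclusive value="0"/>
    <xsd:maxInclusive value="100"/>  <!-- pins the 0.30%-vs-30% reading -->
    <xsd:fractionDigits value="2"/>  <!-- pins precision -->
  </xsd:restriction>
</xsd:simpleType>
```

The same pattern would pin down NumericType's range and precision where a context demands it, which is the kind of decision that otherwise gets deferred to every consuming application.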
Or perhaps all of them :)

>> Maybe we are trying to be too smart here. the critical requirement for
>> UBL is for semantic clarity, not presentation of data.

While I agree with you on the first half, I'm not sure it should come at the expense of data representation. Besides, it is sometimes not clear where the boundary is. For example, if a quantity is -1, is it:

- a semantic error (e.g. we cannot permit a delivery of -1 piece of an item), or
- a syntax error (an invalid representation, possibly due to a programming bug), or
- no error (it is OK because the sender is indicating that he owes the recipient one piece of the item)?

I'd think a suitably tight data representation can help reduce semantic ambiguity. In the same example, if we had specified the type for quantity to be non-negative integers, then there would be no semantic doubt that the value -1 is an error, a syntactic one in this case, which is an easier thing to check once we have pinned down the data representation.

Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/