ubl-lcsc message

Subject: Re: [ubl-lcsc] Re: UBL 0.81 CCT draft-9-mod
From: Chin Chee-Kai <cheekai@softml.net>
To: Tim McGrath <tmcgrath@portcomm.com.au>
Date: Sun, 7 Sep 2003 19:59:03 +0800 (SGT)

On Sun, 7 Sep 2003, Tim McGrath wrote:

>>what is the problem with making code and identifier (and every other
>>data type) as 'string'? 

(Currently in 0.81 CCT draft-9-mod, CodeType's content type is
xsd:token, while IdentifierType's content type is xsd:normalizedString)

I suppose the question is not whether the system will break
down if we restrict the base type too narrowly (such as making
CodeType an xsd:token as opposed to xsd:string).  The way I
look at it, the question is to what extent we can leverage
Stage (B)'s schema validation to filter out what might be an
"easily" detected syntactic problem with the data in the
instance space.

Let's just look at an example.  If an application receives an
element that is supposed to be CodeType, it expects it to contain
a proper code value, that is, a string without CR, LF or TAB
and without leading or trailing spaces.  A code value of,
say, "   UN/CEFACT   " may look the same on a printout as
"UN/CEFACT", but the two compare differently in memory.  Thus,
if CodeType has xsd:string as base type, Stage (B) will
pass both values as "OK" to the application, which must
EITHER strip the leading and trailing spaces itself and find that
both values are alright (an action that xsd:token would have
required the sender to perform), OR check and flag the first as
incorrect while accepting the second.

On the other hand, if CodeType has base type xsd:token,
then the sender, first of all, cannot generate "   UN/CEFACT  "
if it is to interoperate with other systems (as the schema shows
that CodeType should be xsd:token).  On the receiving end,
Stage (B) will flag this field as erroneous as it does not
validate against a base type of xsd:token, saving the application
from further syntactic checks (so it can focus on whether
the values are semantically correct).
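
To make the xsd:string case concrete, here is a minimal sketch
(in Python) of the kind of check an application would have to
carry itself if Stage (B) passed any string through.  The function
names are hypothetical and not part of any UBL or schema tooling.
The check works on the raw lexical form only; it also rejects
runs of two or more internal spaces, which xsd:token likewise
excludes.  How a particular schema processor treats padded
values (for instance, whether it normalizes whitespace before
validating) is not modelled here.

```python
import re

# Whitespace characters a token-like code value must not contain.
_FORBIDDEN = re.compile(r"[\r\n\t]")


def is_token_like(value: str) -> bool:
    """Strict option: accept only values that are already clean."""
    if _FORBIDDEN.search(value):
        return False              # contains CR, LF or TAB
    if value != value.strip(" "):
        return False              # leading or trailing spaces
    return "  " not in value      # no runs of two or more spaces


def collapse(value: str) -> str:
    """Lenient option: clean the value up instead of rejecting it."""
    parts = re.split(r"[ \t\r\n]+", value)
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    for raw in ("UN/CEFACT", "   UN/CEFACT   "):
        print(repr(raw), is_token_like(raw), repr(collapse(raw)))
```

The two functions correspond to the EITHER/OR choice above:
collapse() quietly repairs the padded value, while is_token_like()
lets the application flag it as incorrect.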


>>what are we trying to gain by enforcing patterns
>>in the data?

Similar reasons: tapping into what the schema validator can
already provide to ensure the values are syntactically
right before processing.  Again, this is not a make-or-break
issue, but if we provide a lax type, such as by changing everything
to xsd:string, then applications will just have to duplicate
some of the checking functions and work harder to find out
if values are OK.

For certain system- or processing-related types, such
as GloballyUniqueIDType, I'd think the stricter the pattern,
the better, as there is less chance of misinterpretation
and less need for accommodating processing.  This GloballyUniqueIDType
specifies the wire format for representing a GUID, which
is a 128-bit identifier.  This is like (though not directly
comparable with) having ISO 8601 specify the representation
of what might be a conceptual form of date & time.
The example given in the draft CCT was:

2B93C220-E0C2-11D7-94FC-00E0290FEEC7

But without a pattern, a sender could send it in various forms:
2B93C220:E0C2:11D7:94FC:00E0290FEEC7
2B93C220 E0C211D794FC 00E0290FEEC7
2B93 C220 E0C2 11D7 94FC 00E0 290F EEC7
2B93C220E0C211D794FC00E0290FEEC7

etc.  They're all intended to mean a GUID value, but
either the receiving application has to be very smart
and accommodating, or the sender has to know in
advance which GUID format each receiving system expects.
I think neither case brings us closer to interoperability
than an upfront specification of a clear format.
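
As an illustration, a single pattern is enough to separate the
draft's example from the other forms listed above.  The sketch
below (in Python) assumes a pattern of 8-4-4-4-12 uppercase
hexadecimal digits separated by hyphens, inferred only from the
example value; the pattern actually specified in the CCT draft
may differ, and the names used here are hypothetical.

```python
import re

# Assumed pattern, inferred from the example value in the draft:
# 8-4-4-4-12 uppercase hex digits separated by hyphens.
GUID_PATTERN = re.compile(
    r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"
)


def is_canonical_guid(value: str) -> bool:
    """Return True only if the whole value matches the assumed pattern."""
    return GUID_PATTERN.fullmatch(value) is not None


CANDIDATES = [
    "2B93C220-E0C2-11D7-94FC-00E0290FEEC7",
    "2B93C220:E0C2:11D7:94FC:00E0290FEEC7",
    "2B93C220 E0C211D794FC 00E0290FEEC7",
    "2B93 C220 E0C2 11D7 94FC 00E0 290F EEC7",
    "2B93C220E0C211D794FC00E0290FEEC7",
]

if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(candidate, is_canonical_guid(candidate))
```

Only the first candidate matches; the rest would be rejected at
validation time instead of reaching the application.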

Just my opinions.



Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/