Subject: Initial Comments on UN/CEFACT ATG2 Core Component Schema Module
- From: Tim McGrath <tmcgrath@portcomm.com.au>
- To: ubl@lists.oasis-open.org
- Date: Wed, 01 Sep 2004 15:41:45 +0800
I was asked to co-ordinate any UBL comments to be submitted to UN/CEFACT as
part of the review of their XML Naming and Design Rules, and in particular
on how the schema modules for Core Components and Unqualified Data Types compare
to those used by UBL.
My initial comments are posted here to encourage debate within the UBL TC.
Whether or not we submit a response, and what it may say, is to be decided
on the teleconference calls on September 8th/9th. I have only addressed
the ATG2 Core Component Schema - presumably the Unqualified Data Type Schema
will have these issues plus others.
In addressing these schemas, it has to be conceded that some basic XML Naming
and Design Rules differ (and apparently always will) between ATG2 and UBL.
One example is the use of global and local type definitions. This means that
we are not aiming for compatibility with schemas built using ATG2 rules -
we are aiming for interoperability. That is, can a document whose schema
is expressed in ATG2 form be mapped to the components (elements and attributes)
used by UBL?
The following outlines the areas that need to be addressed for this interoperability
to be achieved.
Naming Rules
--------------
The difference in NDRs between ATG2 and UBL manifests itself most significantly
in the naming of attributes.
The ATG2 naming rules [R 117 and R 133 duplicate each other] state...
"Each supplementary component xsd:attribute "name" MUST be the supplementary
component name with the separators and spaces removed."
UBL's Naming and Design Rule [ATN1] originally stated...
"Each CCT:SupplementaryComponent xsd:attribute “name” MUST be the ccts:SupplementaryComponent
dictionary entry name property term and representation term, with the separators
removed." - but this has been reviewed (see under point 2. of this section).
In their implementation these differences have an impact in three significant ways.
1. UBL has adopted abbreviations for "Identifier" (which must appear as "ID")
and "Uniform Resource Identifier" (which must appear as "URI").
Any attributes that contain these abbreviations will have different names.
However, these are obviously interoperable, as we should be able to map one
name to the other. Unfortunately, the given ATG2 Core Component schema does
not adhere to the ATG2 rule (nor do the fragment samples in the main
body agree with the final schemas - but we can assume the final schemas are
what was intended). For example:
<xsd:attribute name="amountCurrencyID" type="xsd:token" use="optional">
<xsd:attribute name="amountCurrencyCodeListVersionID" type="xsd:token"
use="optional">
<xsd:attribute name="codeListID" type="xsd:token" use="optional">
<xsd:attribute name="codeListAgencyID" type="xsd:token" use="optional">
<xsd:attribute name="codeListVersionID" type="xsd:token" use="optional">
<xsd:attribute name="codeLanguageID" type="xsd:language" use="optional">
<xsd:attribute name="identificationSchemeID" type="xsd:token" use="optional">
<xsd:attribute name="identificationSchemeAgencyID" type="xsd:token" use="optional">
<xsd:attribute name="identificationSchemeVersionID" type="xsd:token" use="optional">
<xsd:attribute name="measureUnitCodeListVersionID" type="xsd:token" use="optional">
<xsd:attribute name="quantityUnitCodeListID" type="xsd:token" use="optional">
<xsd:attribute name="quantityUnitCodeListAgencyID" type="xsd:token" use="optional">
<xsd:attribute name="languageID" type="xsd:language" use="optional">
<xsd:attribute name="languageLocaleID" type="xsd:token" use="optional">
- should all have the letters "ID" replaced by "Identifier", and...
<xsd:attribute name="binaryObjectURI" type="xsd:anyURI" use="optional">
<xsd:attribute name="codeListSchemeURI" type="xsd:anyURI" use="optional">
<xsd:attribute name="identificationSchemeDataURI" type="xsd:anyURI" use="optional">
<xsd:attribute name="identificationSchemeURI" type="xsd:anyURI" use="optional">
- should all have the letters "URI" replaced by "UniformResourceIdentifier",
and...
<xsd:attribute name="codeListUniformResourceID" type="xsd:anyURI" use="optional">
- should have the letters "UniformResourceID" replaced by "UniformResourceIdentifier".
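
Because the abbreviation difference in point 1 is regular, the mapping can be done mechanically. The following is a minimal Python sketch, assuming camelCase attribute names and that only the two abbreviations named above are in play. The function name `ubl_to_atg2_name` is hypothetical and does not come from either NDR.

```python
import re

# UBL abbreviation -> full term that ATG2 rule [R 117/133] would require.
# Only the two abbreviations named in the text are covered (an assumption).
EXPANSIONS = {"URI": "UniformResourceIdentifier", "ID": "Identifier"}

# Match an abbreviation only where it ends a camelCase word: it must be
# followed by an upper-case letter or by the end of the name.
_ABBREVIATION = re.compile(r"(URI|ID)(?=[A-Z]|$)")


def ubl_to_atg2_name(name: str) -> str:
    """Expand UBL-style abbreviations in a camelCase attribute name."""
    return _ABBREVIATION.sub(lambda m: EXPANSIONS[m.group(1)], name)


for ubl_name in ("codeListID", "binaryObjectURI", "codeListUniformResourceID"):
    print(ubl_name, "->", ubl_to_atg2_name(ubl_name))
```

The sketch would mis-handle a name in which "ID" or "URI" is the tail of a longer acronym, so a real mapping would be better driven from the dictionary entry names than from the attribute names alone.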
2. UBL truncates redundant Object Class in names.
UBL has realized that rule [ATN1] is not adequate to define an attribute
name. This is because merely using property term and representation term
for an attribute's name will not make it unique. For example, CodeType would
have two attributes called "name" and two called "URI". The solution UBL
adopted was based on Gunther's position paper of April 2002 (http://www.oasis-open.org/apps/org/workgroup/ubl/ubl-ndrsc/download.php/1505/draft-stuhec-nameTrun-01.doc):
• If a BBIE (Basic Business Information Entity)
defined in a ABIE (Aggregated Business Information Entity) with the same
“Object Class Term” and same “Object Class Qualifier”, that this “Object
Class Term” can be truncated from the BBIE
In effect, UBL has abbreviated the ATG2 rule [R 117/133], so that CodeType
would have one "name" attribute (for the Code. Name) and one "codeListName"
(for the Code List. Name). Again, this makes mapping between the two straightforward.
However, there is one exception. The current ATG2 schema adds a new Object
Class (or perhaps qualifies the Object Class) for the attribute LanguageID
- so it is known to ATG2 as codeLanguageID and to UBL as languageID. This
mapping cannot be assumed from the current CCTS.
3. UBL applies an additional naming rule when the Representation Term is
"Text".
UBL has had a long-standing rule...
7.(b) The representation term “Text” will be considered
the default representation term when a representation term does not appear.
[NB this rule has not made it into the latest NDR rules despite being listed
as approved at the plenary meeting in May 2003 (http://lists.oasis-open.org/archives/ubl-ndrsc/200201/doc00005.doc)].
This rule means that the attribute known to ATG2 as "codeListAgencyNameText"
is known in UBL as "codeListAgencyName". The same applies to...
<xsd:attribute name="binaryObjectFormatText" type="xsd:token" use="optional">
<xsd:attribute name="binaryObjectFilenameText" type="xsd:token" use="optional">
<xsd:attribute name="codeListNameText" type="xsd:token" use="optional">
<xsd:attribute name="codeNameText" type="xsd:token" use="optional">
<xsd:attribute name="dateTimeFormatText" type="xsd:token" use="optional">
<xsd:attribute name="identificationSchemeNameText" type="xsd:token" use="optional">
<xsd:attribute name="identificatonSchemeAgencyNameText" type="xsd:token"
use="optional">
<xsd:attribute name="indicatorFormatText" type="xsd:token" use="optional">
<xsd:attribute name="numericFormatText" type="xsd:token" use="optional">
<xsd:attribute name="quantityUnitCodeListAgencyNameText" type="xsd:token"
use="optional">
- and also to all annotation/documentation attributes that are represented
by text fields.
Once again, because this is a regular rule, these attribute names could be mapped
between UBL and ATG2.
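
As an illustration of how regular this rule is, here is a minimal Python sketch of the ATG2-to-UBL direction. It assumes the representation term always appears as a trailing "Text" in the ATG2 attribute name. The function name `atg2_to_ubl_text_name` is hypothetical.

```python
def atg2_to_ubl_text_name(name: str) -> str:
    """Drop a trailing "Text" representation term, as rule 7(b) implies."""
    suffix = "Text"
    # Leave a bare "Text" alone so the result is never an empty name.
    if name.endswith(suffix) and len(name) > len(suffix):
        return name[: -len(suffix)]
    return name


for atg2_name in ("codeListAgencyNameText", "numericFormatText", "languageID"):
    print(atg2_name, "->", atg2_to_ubl_text_name(atg2_name))
```

Going the other way would require knowing which attributes have "Text" as their representation term, which the UBL name alone does not show.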
XSD Data Types
------------------
Not so easily mapped are the differences between the use of XSD data types
for element and attribute values. There are three primary differences:
1. ATG2 is more restrictive than UBL for the values permitted in the following
data types:
CodeTypes and IdentifierTypes are expressed as xsd:token. UBL has defined
them as xsd:normalizedString.
TextTypes and NameTypes are expressed as xsd:token. UBL has defined them
as xsd:string.
Just to remind myself (again) I looked up the definitions of these in the
XSD specs (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/).
In XSD, a token is the set of strings that do not contain the line feed (#xA)
nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and
that have no internal sequences of two or more spaces. An xsd:token is derived
from an xsd:normalizedString.
A normalizedString is the set of strings that do not contain the carriage
return (#xD), line feed (#xA) nor tab (#x9) characters. An xsd:normalizedString
is derived from an xsd:string.
A string is the set of finite-length sequences of characters.
This means instances of UBL code, identifier, text and name values may not
be legitimate for applications basing their received data on ATG2 data types.
As we discussed when this came up in UBL last year, the use of xsd:token
also means that documents would not accept values such as "A SHADOW ON THE
GLASS:VOL 1 A VIEW FROM THE MIRROR- IRVINE, IAN" or "VIRAGO BOOK OF SPIRITUALITY
O- ANDERSON, SARAH" - both of which are real examples from a book publisher's
EDI ordering system I worked on. I have also seen "- ALL ITEMS ON THIS OFFER
TO PURCHASE TO BE SUPPLIED SUBJECT TO: CONDITIONS APPENDED TO THE
EDI TRADING AGREEMENT BETWEEN US.:SHUTDOWN REPLACEMENTS:DELIVERY TIMES ARE
7:30AM TO 3:00PM, TUESDAY TO FRIDAY." used in the automotive industry. I
am not sure why anyone would want to prevent this type of content.
As far as codes and identifiers go, UBL has also decided that preventing
leading, trailing or duplicate embedded spaces is too prohibitive. Again,
I don't have to look far to see where this won't work. If we take the example
of Australian State Codes, these are 3 characters. New South Wales is "NSW"
and Victoria is "VIC", but Western Australia is "WA ", not " WA" or "WA".
Real application systems are built around the concept that spaces can be
a legitimate part of a code or an identifier, so they will exchange these
spaces wherever they appear in the data.
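
The token and normalizedString definitions quoted above, and the state code example, can be checked with a short Python sketch. The function names are hypothetical. `collapse` mirrors the `collapse` value of the XSD whiteSpace facet, which is how a schema processor normalizes a token value, and it shows what a receiving application would actually end up with.

```python
import re


def is_normalized_string(value: str) -> bool:
    """True if the value has no carriage return, line feed or tab."""
    return not any(ch in value for ch in "\r\n\t")


def is_token(value: str) -> bool:
    """True if the value also has no leading, trailing or doubled spaces."""
    return (
        is_normalized_string(value)
        and value == value.strip(" ")
        and "  " not in value
    )


def collapse(value: str) -> str:
    """Replace whitespace runs with one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


for code in ("NSW", "VIC", "WA "):
    print(repr(code), is_normalized_string(code), is_token(code), repr(collapse(code)))
```

In this sketch "WA " passes the normalizedString check but fails the token check, and collapsing it gives "WA", which is no longer the 3 character code the sending system holds.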
The principle of trying to enforce this type of content validation in a core
component schema is bound to cause problems, which is why UBL has settled on
xsd:normalizedString and xsd:string. It is a similar argument to why we
err on the side of making BIEs optional rather than mandatory - let the customization
that comes with implementation define things like presentational content
- not the core schemas.
2. NumericType is a complexType whose values are expressed as xsd:decimal
with an extended attribute for numericFormatText. UBL defines NumericType
as a simpleType using xsd:decimal.
This means instances of ATG2 numeric values may have formatting instructions
that UBL-based applications do not expect (even though the ATG2 comment discourages
using this attribute).
3. UBL uses more built-in XSD data types.
In UBL we have the following rules:
[GXS3] Built-in XSD Simple Types SHOULD be used wherever possible.
[CTD7] For every ccts:CCT whose supplementary components are not equivalent
to the properties of a built-in xsd:datatype, the ccts:CCT MUST be defined
as a named xsd:complexType in the ccts:CCT schema module.
[CTD10] Each CCT:SupplementaryComponent xsd:attribute "type" MUST define
the specific xsd:built-in Datatype or the user defined xsd:simpleType for
the ccts:SupplementaryComponent of the ccts:CCT.
This means:
In UBL, DateTimeType is a simpleType expressed as an xsd:dateTime. In ATG2
schemas, DateTimeType is a complexType expressed as an xsd:string with an
extended attribute for dateTimeFormatText.
In UBL, IndicatorType is a simpleType expressed as an xsd:boolean. In ATG2
schemas, IndicatorType is a complexType expressed as an xsd:string with an
extended attribute for indicatorFormatText.
This means instances of ATG2 date time or indicator values may be formatted
in a way that UBL-based applications do not expect. They may also have additional
formatting instructions.
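
To make the risk concrete, here is a minimal Python sketch of what a UBL-based receiver might do with ATG2 string values. The function names are hypothetical. The indicator check uses the four lexical values of xsd:boolean, and the date time check is deliberately limited to the plain `YYYY-MM-DDThh:mm:ss` form (no fractional seconds or time zone), which is a simplifying assumption.

```python
from datetime import datetime

# The four lexical values that xsd:boolean allows.
_BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def to_ubl_indicator(value: str) -> bool:
    """Convert an ATG2 indicator string to a boolean, or raise ValueError."""
    key = value.strip()
    if key not in _BOOLEAN_LEXICAL:
        raise ValueError(f"not an xsd:boolean value: {value!r}")
    return _BOOLEAN_LEXICAL[key]


def to_ubl_datetime(value: str) -> datetime:
    """Parse the plain YYYY-MM-DDThh:mm:ss form, or raise ValueError."""
    return datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%S")


print(to_ubl_indicator("true"))
print(to_ubl_datetime("2004-09-01T15:41:45"))
```

A value such as "Y" or "01/09/2004 15:41" would raise `ValueError` here, which is the kind of unexpected formatting the paragraph above warns about.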
Overall I see this as coming down to two issues...
1. The majority of concerns come down to a different set of NDRs. Do we want
ATG2 to amend their rules to fit with UBL's? Presumably we have had as much
(if not more) input into the ATG2 rules as anyone. Therefore, ATG2 has deliberated
and chosen different rules to UBL. I see little point in re-submitting our
rules to them again.
2. The remaining issues relate to choice of XSD datatypes. UBL has a proposal
to align our use of datatypes with OAGIS 9.0.
OAG consider the built-in XSD type, "normalizedString", for all code,
identifier and text components (where there is no specific built-in type,
such as "language").
UBL consider the built-in XSD type, "normalizedString", for all text
components (where there is no specific built-in type, such as "language").
This feels to me like a more appropriate strategy than using xsd:token everywhere
and could be accommodated into UBL 1.1 with some caution for existing document instances.
--
regards
tim mcgrath
phone: +618 93352228
postal: po box 1289 fremantle western australia 6160