ubl-ndrsc message

Subject: Re: [ubl-ndrsc] Code lists: discussion kickoff
From: Phil Griffin <phil.griffin@ASN-1.com>
To: ubl-ndrsc@lists.oasis-open.org
Date: Thu, 31 Jan 2002 15:03:16 -0500


"Gregory, Arofan" wrote:
> 
> Folks:
> 
> I've thought a lot about this issue, and I believe the trade-off is this:
> 
> (1) Using elements to represent codes is one possibility, that gives us the
> advantage of being able to validate a code from a controlled list. Also, if
> we wrap these in a parent type, the list can be extended. (Ugly, but it
> works.) For companies that have expensive validation software to handle
> code-lists, this isn't a problem, but it is a problem for the little guys.
> We can get free code-list standardization and validation from this approach,
> which I think is good. The down-side is that designing and maintaining these
> code-lists is a bitch. (Many, many versions of our schemas that do nothing
> but update code-lists). Perhaps we could have special namespaces for
> codelists, and have special rules so that versioning is not done by
> namespace but with an attribute? Just a thought.

Just a point here. Code lists in themselves do not
always guarantee interworking applications. Unless
each code list item is bound to an unambiguous
textual definition there can still be problems.

Case in point: the characters "AML". When the notion
of using ASN.1 as an XML schema was first proposed, I
used these characters to describe our work. But when
we did a Google search we found so many other uses of
these same characters that we switched to XER.

So code lists can help in validation, but they may not
provide a 100% solution even when the list of codes is
fixed. And my guess is that the longer the list of codes,
and the greater the number of list users from different
disciplines, the more likely such problems will arise.

The result: you and I will both use AML, each of us with
a totally different meaning.

> (2) Using the "string" approach will absolutely defeat any hope of
> interoperability without benefit of expensive translation software. The EDI
> experience has shown that people will happily invent their own
> non-interoperable codes. In xCBL we allowed for this with the "CodedOther"
> approach: all code lists have an enumeration of choices, and then a sister
> element that holds a non-standard code. If you choose the "Other" code, then
> you have to fill in the string. This approach is not, in my opinion, the
> best solution, but it may be the best we can do with XML Schema. Using just

I agree. This approach, while not perfect, as you
say, is at least far simpler than the one you
describe below. Can we go this way for version
one (for speed of work) and move in a later
version to a more complex solution, such as the
one you describe below, without causing
significant problems?
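
To make the "CodedOther" rule concrete, here is a
minimal sketch in Python of the check an application
would have to make. The code values and the names
CodedValue and validate are hypothetical and are not
taken from xCBL.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical standard list; "Other" is the escape hatch.
STANDARD_CODES = frozenset({"Vanilla", "Chocolate", "Strawberry", "Other"})


@dataclass
class CodedValue:
    coded: str                         # expected to be in STANDARD_CODES
    coded_other: Optional[str] = None  # sister field, used only with "Other"


def validate(value: CodedValue) -> List[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    problems = []
    if value.coded not in STANDARD_CODES:
        problems.append("unknown code: %r" % value.coded)
    elif value.coded == "Other" and not value.coded_other:
        problems.append("'Other' chosen but no non-standard code supplied")
    elif value.coded != "Other" and value.coded_other:
        problems.append("non-standard code supplied alongside a standard code")
    return problems
```

Note that the string carried with "Other" is still
unchecked free text, so two partners can invent
different strings for the same thing.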

> a string makes it not necessary to maintain codelists at all, but sacrifices
> much of the benefit of having a UBL, in my opinion.

It does push the actual validation off to the
application. But given the length of the code
list examples I've seen, I wonder whether, for a
given user, all of the codes listed would
REALLY be valid for that user's application.

Seems to me, as an example, that if I ship only to
the US and Canada, then for my documents only USA
and CAN might be valid out of the full list of
country codes. What benefit would I get from JPN
and FRA being valid?

When an actual instance document is created for a
UBL user, will we provide support for specifying
finer-grained code list constraints?
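
As a rough illustration of what I mean by a per-user
constraint, here is a minimal sketch in Python. The
names and the four-entry "full" list are hypothetical
stand-ins, not a proposal for how UBL should do it.

```python
# Illustrative stand-in for a complete country code list.
FULL_COUNTRY_CODES = frozenset({"USA", "CAN", "FRA", "MEX"})


def restrict(full, allowed):
    """Return the user's subset, refusing codes the full list does not have."""
    allowed = frozenset(allowed)
    unknown = allowed - full
    if unknown:
        raise ValueError("not in the full code list: %s" % sorted(unknown))
    return allowed


# A shipper that serves only the US and Canada.
MY_SHIP_TO = restrict(FULL_COUNTRY_CODES, {"USA", "CAN"})


def is_valid_for_me(code):
    """True only for codes in this user's restricted list."""
    return code in MY_SHIP_TO
```

A code such as FRA passes against the full list but
fails is_valid_for_me, which is the finer check my
application actually needs.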
 
> (3) Codelists as enumerated data types. This is my preferred approach - a
> codelist is, in fact, an enumeration of specific semantics, and this format
> makes it clear and easier to manage. What we need is an ability to extend
> these (a major failing of XML schema).

I have an enumerated type in my favorite schema language,
but essentially its named values are treated as integers.

But I can also view code lists differently using what is 
termed a permitted alphabet constraint, a set of the sets
of characters that determine what is valid for an instance
of a given user defined type. 

This allows me to express the valid sets of characters that
can be used in a given field of some type, say as

   MyCodeList ::= UTF8String ("ABC" | "BAX", ... )

The "extension marker" ( ... ) instructs tools to also
expect other values not in the list, so I do not need to
code up an "Other" choice alternative. 
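
A minimal sketch, in Python, of the three outcomes a
receiver could distinguish under this reading of the
constraint. This is an illustration only, not how any
particular ASN.1 tool is implemented; Match and check
are hypothetical names.

```python
from enum import Enum


class Match(Enum):
    ROOT = "root"            # one of the values listed in the constraint
    EXTENSION = "extension"  # not listed, but tolerated because of "..."
    INVALID = "invalid"      # not listed, and the list is closed


def check(value, root, extensible):
    """Classify a value against a list of root values."""
    if value in root:
        return Match.ROOT
    return Match.EXTENSION if extensible else Match.INVALID


# The two root values from the MyCodeList example.
MY_CODE_LIST = frozenset({"ABC", "BAX"})
```

With the marker, check("XYZ", MY_CODE_LIST, True) is
classified as an extension rather than rejected;
without it, the same value is invalid.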

But I am almost certain that such permitted alphabet 
constraints do not exist in XSD.
 
> Let me suggest:
> 
> (1) Dedicated namespaces for codelists (one per codelist, or related group
> of codelists)
> (2) Allow these namespaces to be static - that is, not versioned.
> (3) Have a "version" associated with the codelist in a way that does not
> change the name of the namespace. (Could we use XSD "version" for this?)
> 
> This way, we could version our structures and our codelists separately.
> This models the best part of EDI, where it is common practice to update
> codelists versions within an older version of message structures. And all
> this, while not throwing away the ability to validate codelists with a
> parser.

This seems a reasonable approach. But how is interoperability
maintained when a code list item is removed? Are we affected if
an item with one meaning in code list version A is given another
meaning in code list version B? My question here is what happens
in terms of interoperability if you are using A and I am using B?
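
To make the question concrete, here is a minimal
sketch in Python of a comparison between two versions
of a code list, assuming each version maps a code to
its textual definition. The codes and definitions are
hypothetical placeholders.

```python
def compare_versions(a, b):
    """Compare two code list versions given as {code: definition} dicts."""
    shared = a.keys() & b.keys()
    return {
        "removed": sorted(a.keys() - b.keys()),   # in A, gone from B
        "added": sorted(b.keys() - a.keys()),     # new in B
        "redefined": sorted(c for c in shared if a[c] != b[c]),
    }


# Hypothetical versions A and B of one code list.
VERSION_A = {"AML": "meaning one", "XER": "meaning two"}
VERSION_B = {"AML": "a different meaning", "ABC": "meaning three"}

REPORT = compare_versions(VERSION_A, VERSION_B)
```

Here REPORT lists XER as removed and AML as
redefined. The redefined case is the one a parser
cannot catch: AML validates under both versions but
means different things to the two of us.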

Phil

> The down-side, of course, is that codelists are in a special class in terms
> of how they are versioned and use namespaces, but I don't think it will be
> that confusing - if they weren't special, we wouldn't be having this
> discussion. And this approach is, after all, very much a part of the
> existing EDI standards culture.
> 
> Cheers,
> 
> Arofan
> 
> -----Original Message-----
> From: Eve L. Maler [mailto:eve.maler@sun.com]
> Sent: Thursday, January 31, 2002 10:26 AM
> To: ubl-ndrsc@lists.oasis-open.org
> Subject: RE: [ubl-ndrsc] Code lists: discussion kickoff
> 
> At 11:46 AM 1/31/02 -0500, CRAWFORD, Mark wrote:
> > > Finally, regarding the "enumerations of the xsd:string type" vs. "one
> > > element per code list value" choice itself, I'm not sure I
> > > buy the argument
> > > that the latter is better.  It could potentially swell the
> > > number of UBL
> > > elements by orders of magnitude, and the infrastructure
> > > needed to document
> > > and manage elements would seem to outweigh the benefits for
> > > these little
> > > values.
> >
> >Not sure I understand.  Could you expand pls.
> 
> My (probably flawed) understanding of the main issue covered in Mike's
> position paper was that he was advocating <vanilla/>, <chocolate/>,
> <strawberry/> elements rather than <IceCream flavorCode="vanilla">...  If
> there are really thousands of values in some code lists, I quail at the
> thought of having to maintain definitions and documentation for all those
> elements.
> 
>          Eve
> 
> --
> Eve Maler                                    +1 781 442 3190
> Sun Microsystems XML Technology Center   eve.maler @ sun.com
> 
> ----------------------------------------------------------------
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>
> 
References:
- RE: [ubl-ndrsc] Code lists: discussion kickoff
  - From: "Gregory, Arofan" <arofan.gregory@commerceone.com>