ubl-ndrsc message

Subject: RE: [ubl-ndrsc] Code lists: discussion kickoff
From: Matthew Gertner <matthew.gertner@schemantix.com>
To: "'Gregory, Arofan'" <arofan.gregory@commerceone.com>,'Phil Griffin' <phil.griffin@ASN-1.com>, ubl-ndrsc@lists.oasis-open.org
Date: Sun, 03 Feb 2002 15:48:56 +0100

The problem with this is that the codelists tend to be very long at the
schema level but much more restricted at the instance level (after
application of context). In other words, my currency list might contain 100
items, but in reality for my specific application only 5 are likely. So
using enumerations to generate forms directly from a schema with nice
dropdown lists for the codelists isn't all that advantageous. You still need
a mechanism for specifying which items from the huge grab bag of choices are
actually needed for the application. That's why I think it might be
worthwhile to reject the idea of enumeration altogether and just go with
appInfo. Maybe we could use our context mechanism to create more manageable
context-specific enumerations in the schema...
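
A minimal Python sketch of the context-specific restriction described above.
The names (FULL_CURRENCY_LIST, restrict_to_context) and the sample codes are
hypothetical illustrations, not anything defined by UBL or xCBL.

```python
# Hypothetical sketch: narrow a long code list to the few codes that a
# particular application context actually needs.
FULL_CURRENCY_LIST = {
    "USD": "US dollar",
    "CAD": "Canadian dollar",
    "EUR": "Euro",
    "GBP": "Pound sterling",
    "JPY": "Yen",
    "CHF": "Swiss franc",
}


def restrict_to_context(full_list, allowed_codes):
    """Return only the entries of full_list that the context permits.

    Raises ValueError if the context names a code missing from the full
    list, so a context can only narrow the list, never extend it.
    """
    unknown = set(allowed_codes) - set(full_list)
    if unknown:
        raise ValueError(f"codes not in the full list: {sorted(unknown)}")
    return {code: name for code, name in full_list.items() if code in allowed_codes}


# Example context: an application that only trades in North America.
north_america = restrict_to_context(FULL_CURRENCY_LIST, {"USD", "CAD"})
print(sorted(north_america))
```

A form generator could then build its dropdown from the restricted mapping
rather than from the full list.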

Matt

-----Original Message-----
From: Gregory, Arofan [mailto:arofan.gregory@commerceone.com] 
Sent: Thursday, January 31, 2002 10:58 PM
To: 'Phil Griffin'; ubl-ndrsc@lists.oasis-open.org
Subject: RE: [ubl-ndrsc] Code lists: discussion kickoff

Phil:

Let me try to answer your points in a general way:

First, when we talk about "code lists" I am assuming that we are restricting
ourselves (as we did in xCBL) to those lists of commonly used, well-defined,
externally-maintained "codes" that come from places like X12, ISO, and the
UN/CEFACT Codes Working Group. For xCBL, we harmonized these codes in some
cases, and in others, we chose to subset them for our own uses, but we have
clear maps back to the definitions commonly understood in business today.

We *cannot* use any controlled construction in UBL - be it an element or
attribute name, or a value in an enumerated list - that we do not in some
way completely and unambiguously define. Otherwise, we have failed in
creating a useful language for e-business.

In general, I agree with you - we *must* be unambiguous, using formal
references to the work of other bodies that we base ours on, if indeed we
choose to do this.

As for alphabetic constraints, XSD does give us the ability to do pattern
constraints called "regular expressions", so I think we could do what you
suggest, but a simple enumeration datatype will get us to the same place. I
don't think parsers yet support the regular expression stuff, although they
might.
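
To make the comparison between a pattern constraint and an enumeration
concrete, here is a small Python sketch. It is an analogy only: the code set
and function names are hypothetical, and Python's re module stands in for the
XSD pattern facet (which, like fullmatch here, has to match the whole value).

```python
import re

# Hypothetical three-letter code list; the codes are sample values only.
ENUMERATED_CODES = {"USD", "CAD", "EUR"}

# A pattern that constrains only the shape of the value: three capitals.
SHAPE_PATTERN = re.compile(r"[A-Z]{3}")

# A pattern that lists the alternatives explicitly.
ALTERNATION_PATTERN = re.compile(r"USD|CAD|EUR")


def valid_by_shape(value):
    """True if the value has the right shape, whether or not it is a real code."""
    return SHAPE_PATTERN.fullmatch(value) is not None


def valid_by_alternation(value):
    """True if the value matches one of the explicitly listed alternatives."""
    return ALTERNATION_PATTERN.fullmatch(value) is not None


def valid_by_enumeration(value):
    """True if the value is one of the enumerated codes."""
    return value in ENUMERATED_CODES
```

In this sketch the shape pattern accepts "XYZ" although it is not in the list,
while the explicit alternation accepts exactly the same values as the
enumeration. So the two approaches arrive at the same place only when the
pattern spells out every code.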

As for validation, you do have a good point - few users ever support all of
the codes in a long code list. But the validation issue depends on
something else. My mental picture of how SMEs will use this stuff has a
lower bound, which is that they view business documents in a browser, based
on a hosted application that can do only two simple things: (1) parse the
document against the schemas; and (2) run it through an XSL or CSS
stylesheet to produce a display form compatible with today's web-browser
technology. 

There are several companies - mine among them - that offer this type of
low-level, hosted functionality, and it is generally seen as the basic
replacement for FAX-based processes used by the EDI VANs, called "Rip &
Read". These applications - because they are generic, XML-based applications
- typically do not offer detailed functionality about the mappings between
sets of codes that are common in more fully automated EDI implementations. 

Because of this, I feel that being able to validate code lists with generic
XSD parsers is very important. So is limiting, to the extent practical, the
sets of enumerated values that people use to express semantics within UBL.

Cheers,

Arofan



-----Original Message-----
From: Phil Griffin [mailto:phil.griffin@ASN-1.com]
Sent: Thursday, January 31, 2002 12:03 PM
To: ubl-ndrsc@lists.oasis-open.org
Subject: Re: [ubl-ndrsc] Code lists: discussion kickoff




"Gregory, Arofan" wrote:
> 
> Folks:
> 
> I've thought a lot about this issue, and I believe the trade-off is this:
> 
> (1) Using elements to represent codes is one possibility, that gives us the
> advantage of being able to validate a code from a controlled list. Also, if
> we wrap these in a parent type, the list can be extended. (Ugly, but it
> works.) For companies that have expensive validation software to handle
> code-lists, this isn't a problem, but it is a problem for the little guys.
> We can get free code-list standardization and validation from this approach,
> which I think is good. The down-side is that designing and maintaining these
> code-lists is a bitch. (Many, many versions of our schemas that do nothing
> but update code-lists). Perhaps we could have special namespaces for
> codelists, and have special rules so that versioning is not done by
> namespace but with an attribute? Just a thought.

Just a point here. Code lists in themselves do not
always guarantee interworking applications. Unless
each code list item is bound to an unambiguous
textual definition there can still be problems.

Case in point, the characters "AML". When the notion
of using ASN.1 as an XML schema was first proposed, I
used these characters to describe our work. But when
we did a google search we found so many other uses of
these same characters, we switched to XER. 

So code lists can help in validation, but they may not
provide a 100% solution even when the list of codes is
fixed. And my guess is that the longer the list of codes,
and the greater the number of list users from different
disciplines, the more likely such problems will arise.

The result: you and I will both use AML, each of us with
a totally different meaning.

> (2) Using the "string" approach will absolutely defeat any hope of
> interoperability without benefit of expensive translation software. The EDI
> experience has shown that people will happily invent their own
> non-interoperable codes. In xCBL we allowed for this with the "CodedOther"
> approach: all code lists have an enumeration of choices, and then a sister
> element that holds a non-standard code. If you choose the "Other" code, then
> you have to fill in the string. This approach is not, in my opinion, the
> best solution, but it may be the best we can do with XML Schema. Using just

I agree. This approach, while not perfect, as you
say, is at least far simpler than the one you
describe below. Can we go this way for version
one (for speed of work) and change our minds in
a later version to a more complex solution such
as you describe below without causing significant
problems?
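
The "CodedOther" rule quoted above can be sketched in a few lines of Python.
This is an illustration only, not xCBL's actual definition: the code set and
the function name are hypothetical, and the rule that a standard code must not
also carry a free-text string is my own assumption.

```python
# Hypothetical enumeration with an explicit "Other" escape hatch.
STANDARD_CODES = {"USD", "CAD", "EUR", "Other"}


def check_coded_other(code, other_text=None):
    """Return True if the (code, sister string) pair satisfies the rule."""
    if code not in STANDARD_CODES:
        # Not an enumerated choice at all.
        return False
    if code == "Other":
        # "Other" is only acceptable when the sister string is filled in.
        return bool(other_text and other_text.strip())
    # Assumption (not stated in the thread): a standard code should not be
    # accompanied by a non-standard string.
    return other_text is None
```

For example, check_coded_other("Other", "MYCODE") passes, while
check_coded_other("Other") and check_coded_other("ZZZ") do not.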

> a string makes it not necessary to maintain codelists at all, but sacrifices
> much of the benefit of having a UBL, in my opinion.

It does push the actual validation off to the
application. But given the length of the code
list examples I've seen, I wonder whether, for a
given user, all of the ones listed would REALLY
be valid for that user's application.

Seems to me, as an example, if I only ship to the
US and Canada, that for my document only USA and 
CAN might be valid out of the list of all country
codes. What benefit would I get from JPN and FRA
being valid? 

When an actual instance document is created for a
UBL user, will we provide support for specifying
further granularity of code list constraints?
 
> (3) Codelists as enumerated data types. This is my preferred approach - a
> codelist is, in fact, an enumeration of specific semantics, and this format
> makes it clear and easier to manage. What we need is an ability to extend
> these (a major failing of XML schema).

I have an enumerated type in my favorite schema language,
but essentially its named values are treated as integers.

But I can also view code lists differently using what is 
termed a permitted alphabet constraint, a set of the sets
of characters that determine what is valid for an instance
of a given user defined type. 

This allows me to express the valid sets of characters that
can be used in a given field of some type, say as

   MyCodeList ::= UTF8String ("ABC" | "BAX", ... )

The "extension marker" ( ... ) instructs tools to also
expect other values not in the list, so I do not need to
code up an "Other" choice alternative. 
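
A rough Python analogue of what the extension marker asks a tool to do, as I
read the description above. It is not an ASN.1 implementation; the function
name and the three result labels are hypothetical.

```python
# The values listed before the extension marker in the example above.
ROOT_VALUES = {"ABC", "BAX"}


def classify(value, extensible=True):
    """Classify a value against the list, with or without an extension marker.

    "known":    the value is one of the listed values.
    "extended": the value is not listed, but the list is extensible, so the
                tool is expected to tolerate it.
    "rejected": the value is not listed and the list is closed.
    """
    if value in ROOT_VALUES:
        return "known"
    return "extended" if extensible else "rejected"
```

With the marker, an unlisted value such as "XYZ" is classified as "extended"
rather than "rejected", which is what makes a separate "Other" alternative
unnecessary.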

But I am almost certain that such permitted alphabet 
constraints do not exist in XSD.
 
> Let me suggest:
> 
> (1) Dedicated namespaces for codelists (one per codelist, or related group
> of codelists)
> (2) Allow these namespaces to be static - that is, not versioned.
> (3) Have a "version" associated with the codelist in a way that does not
> change the name of the namespace. (Could we use XSD "version" for this?)
> 
> This way, we could version our structures and our codelists separately.
> This models the best part of EDI, where it is common practice to update
> codelist versions within an older version of message structures. And all
> this, while not throwing away the ability to validate codelists with a
> parser.

This seems a reasonable approach. But how is interoperability
maintained when a code list item is removed? Are we affected if
an item with one meaning in code list version A is given another
meaning in code list version B? My question here is what happens
in terms of interoperability if you are using A and I am using B?
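
The two failure cases asked about here (an item removed, or an item given a
different meaning) can at least be detected mechanically if each code is bound
to a textual definition. A minimal Python sketch, with hypothetical names and
made-up sample data:

```python
def compare_versions(version_a, version_b):
    """Report codes removed, added, or redefined between two code list versions.

    Each version is a mapping from code to its textual definition.
    """
    codes_a, codes_b = set(version_a), set(version_b)
    return {
        "removed": sorted(codes_a - codes_b),
        "added": sorted(codes_b - codes_a),
        "redefined": sorted(
            code for code in codes_a & codes_b
            if version_a[code] != version_b[code]
        ),
    }


# Made-up sample data: "AML" keeps its code but changes meaning.
VERSION_A = {"AML": "first meaning", "XER": "unchanged meaning", "ABC": "sample entry"}
VERSION_B = {"AML": "second meaning", "XER": "unchanged meaning", "BAX": "sample entry"}

report = compare_versions(VERSION_A, VERSION_B)
print(report)
```

Two parties on different versions could exchange only the codes that appear
in neither the "removed" nor the "redefined" list without risk of
misunderstanding. Whether that is acceptable in practice is a policy
question the sketch does not answer.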

Phil

> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>
> 

Follow-Ups:
- Re: [ubl-ndrsc] Code lists: discussion kickoff
  - From: "Fabrice DESRE - FT.BD/FTRD/DTL/MSV" <fabrice.desre@francetelecom.com>