Subject: Code list discussion so far
From: jon.bosak@sun.com
To: ubl@lists.oasis-open.org
Date: 10 Aug 2005 20:49:06 -0000

Hello UBL TC,

We held the first of two discussions to resolve the code list
issue this morning; the second of the two will take place Thursday
afternoon (1 p.m. in Ottawa).  Preliminary outcomes are as
follows.

 - There are serious use cases that require modifications to code
   lists in the interval between official revisions of code lists.
   This is especially true in the case of industry-specific code
   lists.

 - Solutions that require the namespaces in the UBL schemas to be
   changed when a code list is modified are very expensive.

 - There appear to be three ways to allow modifications to code
   lists without changing the namespaces of the UBL schemas:

   1. Users simply modify the file containing the code list while
      leaving everything else alone.  This method is being used
      successfully in Denmark.  Obviously we cannot prevent users
      from doing this, and given a proper notification procedure,
      it seems to work pretty well.

   2. We explicitly enable modifications to the code lists by
      embedding a "substitution group hook" in the UBL schemas as
      described by Tony Coates and Marty Burns.  While this is
      cleaner from a conceptual point of view, we're finding it
      difficult to see any big advantage of this approach over
      simply swapping out one code list module for another.  The
      basic notification and management issues appear to be about
      the same.

   3. We take a radically different view of the problem by
      distinguishing between two kinds of code lists:

      a. Code lists that define codes used only in UBL (status
         codes, for example).  Such lists are typically
         well-defined, are completely under our control, and are
         not (or should not be) extensible.

      b. Code lists that are defined by outside agencies and
         referenced in UBL.  These are conceptually distinct from
         the first category even if some happen to be bundled into
         the UBL package.

      Making this distinction would allow us to take two different
      approaches to code list definition.  Code lists of the first
      kind could be defined in schema modules using enumerations
      just as we do in 1.0.  Code lists of the second kind could
      be defined in XML instances of a standard code list schema,
      with the codes of this kind declared as unrestricted
      strings.  Ordinary XSD validation would be used for the
      first kind of code list, just as in 1.0, whereas validation
      of the second kind would typically take place in a second
      validation phase using something like Schematron.

      Participants in the discussion noted the following points
      regarding this third alternative:

       - Publishing standard code lists as instances of a standard
         code list schema is much closer to the basic XML concept
         than publishing code lists as schema fragments.  In fact,
         the whole namespace problem we've been wrestling with
         here can be seen as an artifact of the attempt to use
         things that should change very rarely (schemas) to
         publish things that people often want to modify (code
         lists).  One result of this has been that instead of
         recommending a standard for code lists using a standard
         formalism (such as an XSD schema) we have been
         recommending a template for code list schemas for which
         there is no standard formalism, just a complex set of
         prose descriptions supplemented by examples.  The code
         list paper published in UBL 1.0 admits it to be
         "desirable that the [code list] data model be expressed
         in a machine readable form" but can do no more than to
         place this desirable development in some distant future
         where a formalism exists for doing so.  The definition of
         a standard XML schema for code lists would solve this
         simply by putting such a definition at the appropriate
         conceptual level.

       - Defining codes as unrestricted strings would obviously
         make it trivially easy to meet all the requirements for
         ad hoc code list modification.  The tradeoff would be
         that the code lists themselves could no longer be used to
         directly drive XSD validation.  It is unlikely, however,
         that any major user of the UBL schemas would be satisfied
         with just a simple check against an enumeration before
         entering the document into an accounting application; it
         is much more likely that something like a Schematron
         check would be performed following simple XSD validation.
         This is in fact what is done in the Danish
         implementations, and it more closely reflects an initial
         premise of the UBL code list effort that most code list
         validation would take place at the application level
         (report of the NDRSC, 18 March 2002).

       - Post-schema validation appears to be less problematic
         than substitution groups, judging by what we're hearing
         from initiatives that are attempting to use them.  We
         believe it to be significant that post-schema validation
         is the approach adopted for ISO 20022 (banking).

       - We could provide a mechanism (an XSLT transformation, for
         example) that would take *any* code list published using
         the standard code list schemas and generate code list
         schema modules just like the ones we've included in UBL
         1.0.  (The XSLT would, in effect, provide the missing
         formalism needed to specify the construction of the
         schema modules in a machine-readable way.)  In fact, we
         could provide the modules so generated as part of the
         release package together with instructions for validating
         instances against these generated modules in a second XSD
         pass, thus providing all of the advantages of validation
         against enumerations while still allowing easy
         modification of code lists.

       - In a separate decision, the TC decided this morning to
         accept the UDT and CCT schema modules defined by
         UN/CEFACT ATG2 rather than defining and maintaining our
         own.  Those schema modules reference a few standard code
         lists (currencies, language codes, units of measure, MIME
         media types) that would retain the old enumeration form.
         As most real-world situations requiring code list
         modification are encountered not with these very basic
         standard lists but with industry-defined code lists, this
         is not considered a problem.

       - Mark Crawford wished to be put on record as having
         reservations about this approach for two reasons:

         1. The desirability of maintaining XSD validation of
            codes, and

         2. The wish to maintain alignment with ATG2, which
            intends to specify all code lists as schema modules.
            It is recognized, however, that ATG2 does not have
            customization as a goal, whereas we do (though to what
            extent still remains to be determined).

       - To make this approach practical for users, it will be
         necessary to provide documentation showing users how to
         implement a post-schema code list validation phase using
         Schematron.  Bryan Rasmussen has volunteered to create
         this if his management will approve the work.
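
As a rough illustration of the third alternative, the sketch below
checks codes in a document against a code list published as an XML
instance (the second validation phase) and then generates an XSD
enumeration from that same instance.  It is written in Python purely
as a stand-in for the Schematron check and the XSLT transformation
mentioned above.  The CodeList and Code element names, the id,
agency, and version attributes, the sample codes, and the
"ContentType" naming are all invented for the example; no standard
code list schema has been defined yet.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical code list instance. The vocabulary is an assumption made
# for this sketch, not a proposal for the standard code list schema.
CODE_LIST_XML = """\
<CodeList id="PaymentMeansCode" agency="example-agency" version="0.1">
  <Code value="10">Cash</Code>
  <Code value="20">Cheque</Code>
  <Code value="42">Bank transfer</Code>
</CodeList>
"""

# Hypothetical document that has already passed ordinary XSD validation,
# where codes of the second kind are declared as unrestricted strings.
DOCUMENT_XML = """\
<Invoice>
  <PaymentMeansCode>42</PaymentMeansCode>
  <PaymentMeansCode>99</PaymentMeansCode>
</Invoice>
"""


def load_codes(code_list_xml):
    """Return (list id, list of code values) from a code list instance."""
    root = ET.fromstring(code_list_xml)
    return root.get("id"), [code.get("value") for code in root.findall("Code")]


def check_codes(document_xml, element_name, allowed):
    """Second-phase check: return the values of element_name not in allowed."""
    allowed_set = set(allowed)
    bad = []
    for element in ET.fromstring(document_xml).iter(element_name):
        value = (element.text or "").strip()
        if value not in allowed_set:
            bad.append(value)
    return bad


def to_xsd_enumeration(list_id, codes):
    """Generate an xsd:simpleType enumeration from the code values."""
    ET.register_namespace("xsd", XSD_NS)
    simple_type = ET.Element(
        f"{{{XSD_NS}}}simpleType", {"name": f"{list_id}ContentType"}
    )
    restriction = ET.SubElement(
        simple_type, f"{{{XSD_NS}}}restriction", {"base": "xsd:normalizedString"}
    )
    for code in codes:
        ET.SubElement(restriction, f"{{{XSD_NS}}}enumeration", {"value": code})
    return ET.tostring(simple_type, encoding="unicode")


if __name__ == "__main__":
    list_id, codes = load_codes(CODE_LIST_XML)
    print("Codes not in list:", check_codes(DOCUMENT_XML, list_id, codes))
    print(to_xsd_enumeration(list_id, codes))
```

With the sample data above, the check reports 99 as the only code
not in the list.  Modifying the code list means editing the instance
only; the schema and its namespace are untouched, and the generated
enumeration can be rebuilt whenever the list changes.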

Everyone interested in this subject should be prepared to
participate in tomorrow's follow-up discussion (1 p.m. Ottawa time
Thursday 11 August at the usual UBL conference number).

Jon