
Subject: Re: [ubl-dev] Re: Code list extensibility and substitution groups
From: "Stephen Green" <stephen_green@bristol-city.gov.uk>
To: <ubl-dev@lists.oasis-open.org>
Date: Tue, 22 Feb 2005 11:09:00 +0000
Folks

Please forgive me for writing as something of a novice in XSD,
but I'd appreciate anyone showing me where the following reasoning
might be wrong.

As an aside, wouldn't it be a plausible argument against substitutionGroups
that folk like myself, not expert in XSD but likely to be responsible for
implementations nonetheless, would find substitutionGroups a little perplexing
(even if only because we fail to see their value in this context and are
therefore reluctant to invest adequate time in adopting them)? I think my own
Government's XSD guidelines advise avoiding the more obscure aspects of XSD
in Schema architecture, perhaps for related reasons.

Anyway, here is the reasoning that makes me reluctant to accept substitutionGroups in UBL
for codelists (so please convince me where I'm wrong, as I may well be):

(Apologies for the length and complexity of this due to lack of time for editing)



1. we (UBL) currently offer two types of codelist, those with
    codes supplied and those without (the former having
    sub-categories too)

2. we offer codes in a way that allows validation using
    the XSD Schemas but only for some codes

3. if there were a need to offer an alternative for
    folks who wish not to use the supplied codes, it might be
    possible to offer something using xsd:choice so that
    the possibility of validation is kept where required
    (e.g. xsd:choice ... udt:AmountType ... sdt:UBLAmountType ...)

4. we are told that the case for substitution groups meeting a requirement
    is that
    a. they allow use of codes not in the supplied list
    b. they do not require ripple changes to all the various Schema modules
    c. by implication, they are supposed to offer the above in cases where
       validation is performed using XSD, since there would be no point in
       them where XSD validation is not employed (there you just decide on a
       particular codelist and refer to its metadata in the code metadata attributes)

5. Now suppose one has substitution groups in the codelist Schemas in UBL 1.1.
    As I read it (having just studied the codelist papers and books like Wrox's 'XML Schema'),
    this would mean the following:

    where a validatable codelist exists in UBL 1.0 and an instance can only have codes
    from a certain known set of values,
    a. in UBL 1.1 the same codes would still be 'UBL-valid' whatever values they had
    b. the metadata attributes could describe the values allowed
    c. the XSD would allow any such values
    d. (here I get a bit out of my depth) the question is what changes could be made to
       the Schemas to still validate the required range of code values,
    e. i.e. maybe certain values could be made 'XSD-invalid' locally in specially
       adjusted Schemas (without namespace changes)
    f. even if e. could be done, the values which are 'locally XSD-invalid' would still be
       'UBL-valid', since the substitution groups mechanism now theoretically allows any values
    g. to my mind it is then improper, in this UBL 1.1, to call any particular code value
       in an instance 'invalid'
    h. to my mind g. means that there is then less point including any values in the
       Schemas of this UBL 1.1
    i. g. means the codelist Schemas with values become much more like codelists without
       values (and hence without Schemas)
    j. i. means there is little point having the Schemas at all, whether they have values
       which are substitutable or no values at all
    k. this leads to (almost?) the same solution we have in UBL 1.0 for the codes which
       do not have codelist Schemas
    l. this no longer seems to be in keeping with the implied requirement 4c. above
    m. it would seem there is more overhead for implementers with such a UBL 1.1, in that
       any code value now has to be expected

    But perhaps my most important concern is:
    n. there seems to be no way in this UBL 1.1 to reliably say, using XSD, that a code value is *invalid*
    o. n. would appear to mean that either applications have to validate the codes without XSD, or they
       would be better off staying with UBL 1.0, where at least they know which codes are valid and which are not
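To make points a. to n. concrete, here is a rough Java sketch. It assumes that 'UBL-valid' means accepted by the published schemas and 'locally valid' means accepted by a locally restricted copy; the class name, method names and currency codes are hypothetical, and the open check merely stands in for a substitution-group head that accepts any value.

```java
import java.util.Set;

// Hypothetical sketch: "UBL-valid" = accepted by the published schemas,
// "locally valid" = accepted by a locally restricted copy.
// Class name, method names and currency codes are illustrative only.
class CodeValidity {

    // UBL 1.0 style: the published schema enumerates the codes,
    // so anything outside the list is rejected.
    static final Set<String> UBL10_CODES = Set.of("EUR", "GBP", "USD");

    static boolean validUbl10(String code) {
        return code != null && UBL10_CODES.contains(code);
    }

    // Substitution-group style (assumed): the published head element
    // accepts any non-empty value, so nothing is rejected at UBL level.
    static boolean validUbl11(String code) {
        return code != null && !code.isEmpty();
    }

    // A local restriction can reject values, but the UBL-level answer
    // from validUbl11 is unchanged by it.
    static boolean validLocally(String code, Set<String> localCodes) {
        return validUbl11(code) && localCodes.contains(code);
    }
}
```

With this, validLocally("XYZ", Set.of("GBP")) is false while validUbl11("XYZ") is still true, which is exactly the gap in n.: the local check can reject a value that UBL itself cannot call invalid.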

Conclusion

6. So I don't see substitution groups offering anything more than a subtle difference
    from just scrapping validatable codes.
7. Furthermore, they seem to make it impossible ever to validate
    such codes using the UBL XSDs.
8. To prevent 7., we'd be likely to want some codes which don't
    use substitution groups but which do have Schema validation and enumerated lists
9. 8. would probably involve the same codelists for which we have validation in UBL 1.0
10. 9. would then be no different from UBL 1.0 as regards these codelists
11. so a UBL 1.1 designed with substitution groups would probably not actually include any
    Schemas using that mechanism, if the requirements are to be met anyway
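Points 8. to 10. can be sketched the same way: a hypothetical registry in which some lists stay closed (enumerated, as in UBL 1.0) and the rest are open. The list names and codes are illustrative assumptions, not the actual UBL 1.0 code lists.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of points 8-11: closed lists keep an enumeration
// (as in UBL 1.0) and every other list is open. List names and codes
// are illustrative, not taken from the UBL 1.0 schemas.
class CodeListRegistry {

    // A list name present in this map is closed; absent means open.
    private final Map<String, Set<String>> closedLists;

    CodeListRegistry(Map<String, Set<String>> closedLists) {
        this.closedLists = Map.copyOf(closedLists);
    }

    boolean isClosed(String listName) {
        return closedLists.containsKey(listName);
    }

    // Closed list: only enumerated values pass.
    // Open list: any non-empty value passes, and the code's metadata
    // attributes are left to say which list it came from.
    boolean accepts(String listName, String code) {
        if (code == null || code.isEmpty()) {
            return false;
        }
        Set<String> allowed = closedLists.get(listName);
        return allowed == null || allowed.contains(code);
    }
}
```

The closed entries behave exactly like UBL 1.0 enumerations, which is the point of 9. and 10.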


If anyone has the time and patience to follow all that and can reassure me that substitutionGroups
do offer UBL codelists more than I give them credit for, please do.


All the best

Stephen Green



>>> "William J. Kammerer" <wkammerer@novannet.com> 22/02/05 02:47:13 >>>
I'm guessing that substitutionGroups mean "any extensions to the code
lists themselves cannot change in structure, only the enumerated sets
themselves can change," as the substituted element has either the same
type as the "head" abstract element - or one which can be derived from
it. I think that's the advantage of substitutionGroups over redefine;
there'd probably be nothing keeping you from changing the structure with
a redefine.

But in order for UBL to provide the (future) capability of "override,"
all the schemas for off-the-shelf code lists will probably have to be
modified to accommodate any possible future abstraction (kind of like
C++ virtual functions). I guess that's why the Code List group has to
make a decision now; and they won't know whether it's worth making these
changes unless someone can demonstrate how this substitutionGroup stuff
can be used.
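William's virtual-function comparison can be sketched as a loose Java analogy only (not how UBL or XSD actually implement it): the abstract head element becomes an abstract class, and each substitution-group member a subclass that may swap the enumerated set but keeps the structure. As with a virtual function, the base has to exist before anyone can extend it, which is why the decision has to be made now. All class names and the code "ZZZ" are hypothetical.

```java
import java.util.Set;

// Analogy only: the abstract "head" element is an abstract class and each
// substitution-group member is a subclass. Members may swap the enumerated
// set but inherit the structure (a single code value), much as a member's
// type must be the head's type or derived from it.
// All class names and the code "ZZZ" are hypothetical.
abstract class CodeHead {
    private final String value;

    protected CodeHead(String value, Set<String> allowed) {
        if (value == null || !allowed.contains(value)) {
            throw new IllegalArgumentException("code not in list: " + value);
        }
        this.value = value;
    }

    final String value() {
        return value;
    }
}

// Member published with the standard.
final class StandardCurrencyCode extends CodeHead {
    private static final Set<String> VALUES = Set.of("EUR", "GBP", "USD");

    StandardCurrencyCode(String value) {
        super(value, VALUES);
    }
}

// Member added later: a different value set, the same structure.
final class ExtendedCurrencyCode extends CodeHead {
    private static final Set<String> VALUES = Set.of("EUR", "GBP", "USD", "ZZZ");

    ExtendedCurrencyCode(String value) {
        super(value, VALUES);
    }
}
```

Code written against CodeHead can handle either member without knowing which list was used, which is the appeal; the cost is that CodeHead has to be designed in from the start.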

William J. Kammerer
Novannet
Columbus, OH 43221-3859 . USA
+1 (614) 487-0320

----- Original Message ----- 
From: "Duane Nickull" <dnickull@adobe.com>
To: <jon.bosak@sun.com>
Cc: <ubl-dev@lists.oasis-open.org>
Sent: Monday, 21 February, 2005 07:36 PM
Subject: Re: [ubl-dev] Re: Code list extensibility and substitution groups


Jon:

Apologies - several of us couldn't resist taking a shot at CAM.  You are
right and we should follow ocCAM's Razor - "one should not increase,
beyond what is necessary, the number of entities required to explain
anything".  Seems fitting, doesn't it ;-)
http://pespmc1.vub.ac.be/OCCAMRAZ.html 

The code list issue is a serious one and I do have one question about
determinism in this context.  Does this primarily refer to the fact that
any extensions to the code lists themselves cannot change in structure,
only the enumerated sets themselves can change?  Or does it imply a more
sinister pre-requisite knowledge of the entire enumerated set of values
AND the structure and both may be subject to substitution?

I do not see how you can offer extensibility beyond that while still
preserving interoperability.  I think that looking at what developers
will have to do to access the code list values is important in order to
fully grok the complexity of the problem. My recommendation would be to
strictly define the logical data model and XML expression for the
structure of code lists, so that deterministic statements can be
evaluated to retrieve code list values and marshal them into objects
during the parsing process.  For example, you could define an XML
structure that will always give you a List object containing all the
values for codes.  The Java could be written like this:
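// Parsing the schema for enumerated values.
// Needs java.io.IOException, java.io.InputStream and java.util.List.
// currentElement appears to be a JDOM-style Element; CodeValueElement,
// DataCodeElementRef and AssemblyException are not defined here.
    public InputStream[] getDataElementStreams() throws Exception {
        List<?> codes =
            this.currentElement.getChildren(CodeValueElement.SOME_FINAL_TOKEN_HERE);
        InputStream[] ret = new InputStream[codes.size()];
        for (int i = 0; i < codes.size(); i++) {
            try {
                // Wrap each code element and expose its content as a stream.
                ret[i] = new DataCodeElementRef((Element) codes.get(i)).getInputStream();
            } catch (IOException e) {
                throw new AssemblyException("You wrecked UBL codes forever....", e);
            }
        }
        return ret;
    }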

This would allow a schema parser to interpret the entire substitute code
list as long as the structure rules were followed.  That is about as
deterministic as you can get IMO.
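Duane's snippet relies on classes not shown in the message. A self-contained stand-in for the same idea (a fixed structure always yields a List of code values) might use only the JDK's DOM parser to collect the xsd:enumeration values from a code list schema. This is a sketch, not UBL tooling, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads the value attribute of every xsd:enumeration in a schema document.
// Hypothetical helper for illustration, not part of any UBL tooling.
class EnumerationReader {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    static List<String> readCodes(String schemaText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed for getElementsByTagNameNS to match.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)));
        // Matches are returned in document order, so codes come back as listed.
        NodeList nodes = doc.getElementsByTagNameNS(XSD_NS, "enumeration");
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            codes.add(((Element) nodes.item(i)).getAttribute("value"));
        }
        return codes;
    }
}
```

Fed a schema whose restriction lists EUR and GBP as xsd:enumeration values, readCodes returns those two strings in document order; an alternative list that followed the same structure would be read by the same code.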

The GoC had some really compelling use cases for conditional validation
of code set values based on qualifiers.  The ability to support their
use case was not present in the current draft of W3C Schema; however, some
issues were fixable by defining a better object model before expressing
it in XML (although I wouldn't want to start yet another elements vs.
attributes holy war).
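W3C XML Schema 1.0 has no direct way to make the allowed values of one item depend on the value of another (a co-occurrence constraint), so conditional validation like the GoC cases would typically end up in application code. A minimal sketch, assuming the qualifier is a simple string; the qualifier names and codes are hypothetical and not taken from the GoC use cases.

```java
import java.util.Map;
import java.util.Set;

// Conditional validation: which codes are allowed depends on a qualifier
// carried alongside the code. Qualifier names and code values are
// hypothetical, chosen only for illustration.
class QualifiedCodeValidator {
    private final Map<String, Set<String>> codesByQualifier;

    QualifiedCodeValidator(Map<String, Set<String>> codesByQualifier) {
        this.codesByQualifier = Map.copyOf(codesByQualifier);
    }

    // Unknown qualifiers (and missing values) are rejected rather than
    // silently accepted.
    boolean isValid(String qualifier, String code) {
        if (qualifier == null || code == null) {
            return false;
        }
        Set<String> allowed = codesByQualifier.get(qualifier);
        return allowed != null && allowed.contains(code);
    }
}
```

Rejecting unknown qualifiers keeps the check closed, in the sense that a value can still be called invalid.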

I did see some cases where there is ambiguity in the UBL code list
specification.  For instance, what is the difference between a code list
identifier, code list name identifier, code list URI and a code list
name text?  To me the URI is a specialized instance of an identifier; I
wonder why more than two are needed.

If you allow changes to the structure, you are doomed.  No one can
effectively process XML if the structure itself is compromised from
instance to instance - that is why we developed DTDs, schemas etc. in
the first place, isn't it?

My $0.02 CAD worth (despite William thinking the currency is doomed).

Duane
-- 
***********
Senior Standards Strategist - Adobe Systems, Inc. - http://www.adobe.com 
Vice Chair - UN/CEFACT Bureau Plenary - http://www.unece.org/cefact/ 
Adobe Enterprise Developer Resources - http://www.adobe.com/enterprise/developer/main.html
***********