
Subject: Re: [tag] Proposal: Providing Decentralized Extensiblity of Enumerative-Attribute Values

From: Stephen Green <stephen.green@documentengineeringservices.com>
To: dennis.hamilton@acm.org
Date: Sat, 14 Nov 2009 18:40:09 +0000

I think we have another problem with using either
QName or anyURI: from what I've discovered so far,
the two seem to be mutually incompatible.
However, normalizedString is compatible with both.
Both are subsets of normalizedString, I think, but
neither is a subset of the other as far as I know. I
need to check up on this though. If this is true, it
just seems most sensible to me to use the
superset of both, normalizedString, so neither is
precluded in any TAML profile.
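
As a rough illustration of the point above, here is a minimal Python
sketch. It is not a validator: the NCName pattern is an ASCII-only
approximation (the real production admits many more characters), the
function names and sample values are made up for this example, and it
does not attempt an anyURI check at all. It only shows that a
prefixed code and a URI-shaped code can both sit in normalizedString
while only one of them has the lexical shape of a QName.

```python
import re

# ASCII-only approximation of an XML NCName (assumption: the real
# production also admits many non-ASCII characters, omitted here).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
QNAME_RE = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def is_normalized_string(value: str) -> bool:
    """normalizedString values contain no tab, line feed or carriage return."""
    return not any(ch in value for ch in "\t\n\r")


def looks_like_qname(value: str) -> bool:
    """An optional 'prefix:' followed by a local part, both NCName-shaped."""
    return QNAME_RE.fullmatch(value) is not None


# Hypothetical code values, invented for illustration only.
samples = ["x:custom-code", "http://example.org/codes#A1", "code 42"]

for s in samples:
    print(f"{s!r}: normalizedString={is_normalized_string(s)}, "
          f"QName-like={looks_like_qname(s)}")
```

All three samples pass the normalizedString check, but only the
first one is QName-shaped, which is the sense in which the looser
type precludes neither style of value.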

To me, by the way, restricting the datatypes is a
bit like tightening the bolts on a car wheel - you
want them tight enough that the wheel doesn't come
off but not so tight that they can't be removed later.
I tend to start with loose datatypes then tighten them
but only to the point they are tight enough and not
further. In most cases this means first trying string,
then normalizedString, then sticking with that if no
more tightening makes good sense in all use cases.
If all use cases are OK with a tighter datatype, fine,
but then you have to choose between token, anyURI,
QName, etc., checking that the datatype works for
all known and anticipated use cases. Even then the
use cases might not be known in advance so going
beyond normalizedString seems risky and I tend to
try to be a little risk-averse when we don't foresee
all the use cases in advance.
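
The tightening process described above might be sketched like this
in Python. The ladder used here (string, normalizedString, token,
NCName) is an assumption chosen because each type is derived from the
one before it; the NCName check is an ASCII-only approximation, and
tightest_type and the sample values are hypothetical names made up
for the example.

```python
import re


def is_string(v: str) -> bool:
    return True


def is_normalized_string(v: str) -> bool:
    # No tab, line feed or carriage return.
    return not any(c in v for c in "\t\n\r")


def is_token(v: str) -> bool:
    # normalizedString with no leading/trailing space and no double spaces.
    return is_normalized_string(v) and v == v.strip(" ") and "  " not in v


def is_ncname(v: str) -> bool:
    # ASCII-only approximation of NCName (an assumption of this sketch).
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.\-]*", v) is not None


# Ordered from loosest to tightest.
LADDER = [
    ("string", is_string),
    ("normalizedString", is_normalized_string),
    ("token", is_token),
    ("NCName", is_ncname),
]


def tightest_type(values):
    """Tighten step by step; stop at the first type that rejects a value."""
    chosen = "string"
    for name, accepts in LADDER:
        if not all(accepts(v) for v in values):
            break
        chosen = name
    return chosen


print(tightest_type(["pass", "fail"]))
print(tightest_type(["pass", "not applicable"]))
print(tightest_type(["pass", "http://example.org/codes#A1"]))
```

The catch, as noted above, is that the result is only as good as the
sample of use cases fed in: one unforeseen value with a space or a
slash in it pushes the answer back down the ladder.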

Best regards
---
Stephen D Green




2009/11/14 Dennis E. Hamilton <dennis.hamilton@acm.org>:
> I'm concerned we are confusing syntactic well-formedness with semantic
> validity.  XML Schema is entirely syntactic, although there is some
> semantic-ness around the presumed sense of data types.  There is no
> practical way beyond trivial cases to ensure that syntactic-wellformedness
> is sufficient for semantic validity.
>
> QName is much more restrictive and specific than normalized attribute
> values.  And the XML Schema support for the QName datatype has the
> exactly-correct semantics.  I don't understand how this allows "almost any
> arbitrary string" and at the same time there is concern that it constrains
> the custom values.  QName restrains the values to being NCNames for local
> names of a namespace, but anybody can originate an unambiguous, unique
> namespace.  And that namespace definition can provide a mapping to any
> arbitrary external code list as part of its definition.
>
> We should stand back and ask ourselves the important architectural question:
> Do we intend to permit custom extension of enumerative value sets?  If we
> don't, we should require that only the values we define be syntactically
> acceptable for a given set.  (I think we should also obligate ourselves to
> use NCName as the XML Schema base type to keep ourselves out of trouble.)
>
> If we do intend to permit custom extensions, it behooves us to ensure that
> we have, from the beginning, a provision that allows unambiguous, unique
> introduction of custom values using decentralized authority.  We are in a
> position to require that these be implementation-defined for conformant use
> (i.e., there must be public documentation of the namespace used and the
> values that are introduced for the particular attribute case).
>
> The clean choices seem to be either no permissible extensions (because there
> is no safe mechanism provided and we want to retain the possibility of
> introducing one later) or handling the extension mechanism now.  I'm for
> now, so that the community can evolve additional applications and
> enumerative values without having to wait for revisions to a TAG
> specification.  It also provides benign ways to add to the standardized set
> in the future.  This is the best opportunity we will ever have for putting a
> stake in the ground here.
>
> I'd be satisfied in either choice, although my preference is to address
> extensibility in TAML 1.0.  I can't imagine that extensions won't be made
> and/or asked for especially if there is diverse adoption of the TA Model.
>
> I'm fairly confident that we can't use TAG for OIC work without the prospect
> of custom extensions for the peculiar demands around testing
> document-processing applications for how they honor a standardized format in
> interoperable ways.  So we'd have to use the model but not the TAML.  (We
> also use Relax NG and other schema models including, heaven forefend, OWL.)
>
> I also envision applications of my own, around implementation
> specifications, that would benefit from the TAG model.  It would be nice to
> defer everything to the TA Model and TAML, but I am not constrained to that.
>
>  - Dennis
>
> PS: I got a 502 the first time I accessed the developerWorks URL, but I got
> through later.  I don't think there is a contradiction here, although I think
> Kiel means to be addressing this case in the context of developing or using
> a standard-defined schema.
>
> Note that the union case fails if more than one organization independently
> introduces and uses the "x:" prefix and schema-union technique.  The use of
> namespaces for disambiguation (the whole point in XML) is an
> already-recognized practice.  QName is simply a generalized version of
> Kiel's example to deal with decentralized customization of enumerations with
> namespaces as the disambiguating authority.
>
> If we have the requirement that XML Schema validation must be sufficient,
> then I think we should not allow anything but restriction to our predefined
> terms.  In that case, I would use NCName as the base type.  Of course, QName
> is a built-in XML Schema datatype too.
>
> -----Original Message-----
> From: stephengreenubl@gmail.com [mailto:stephengreenubl@gmail.com] On Behalf
> Of Stephen Green
> http://lists.oasis-open.org/archives/tag/200911/msg00029.html
> Sent: Friday, November 13, 2009 12:40
> To: dennis.hamilton@acm.org
> Cc: TAG TC List
> Subject: Re: [tag] Proposal: Providing Decentralized Extensiblity of
> Enumerative-Attribute Values
>
> There are many principles relevant here. Having spent many
> years closely monitoring the UBL TC and Codelist Representation
> TC deliberations on this and discussing the same within UBL TC
> I have found that the kinds of conclusions in papers such as this
> one
> http://www.ibm.com/developerworks/xml/library/x-extenum/index.html
> have much merit. The problem is one of how to apply the architecture
> to which we have already subscribed by electing to use XML Schema.
> The schema has to be able to discern certain things, especially it
> really MUST be able to validate the existing, built-in codes. Do we
> really expect to be extending these codes? Even if we do, we need to
> ensure the schema knows the difference between a custom code and
> a mistake. QName alone would allow virtually any string to be valid.
> I'm not sure that price is one worth paying. Plus QName would perhaps
> restrict the custom code values: That might not be such an issue unless
> such codes are outside of the control of the customizer, as with an
> externally defined codelist whose code values might not all be valid as
> QNames.
> [ ... ]
>
>
