ubl-cmsc message

Subject: [ubl-cmsc] Specialization Architecture (was FW: Another item for issuetracking list.)

From: "Burcham, Bill" <Bill_Burcham@stercomm.com>
To:
Date: Tue, 16 Apr 2002 11:16:10 -0500

Eduardo wrote:

> Bill,
>
> at the risk of appearing to support a different model, or of appearing to
> be attacking Paella (which at this point I'm not ready to do, one way or
> another), you forgot to include possibility (d) in your message, namely
> the possibility that while the proposal may satisfy (a), (b) and (c) entirely,
> it has undesirable side effects (which is what I read in Mette's
> undeconstructed critique)
>
> Eduardo

Good point, Eduardo. To the extent possible, I hope we can cover (d) by improving the use-cases (a) or their realization (b). If a side-effect is important, I think we should make an attempt to express it in a use-case (and XSLT/XSD etc.). As for attacking the proposal, I'd say the more attacks the better, provided we capture and reapply what we learn to all our proposals.

Having had some more time to think about the substance of Mette's original issue, I'll take a crack at a deconstruction:

Mette wrote:
>     For example, you have a type foo, which has no content, and is
>     abstract. Then you declare fooOne, which extends foo with the
>     content (a, b, c). Then you declare fooTwo, which extends foo with
>     the content (d|e|f). This means that you can encounter either one or
>     the other of two completely different content models. Since the base
>     type does not declare any content, you have just told your
>     application that you can not be bothered telling it beforehand what
>     content it should process, and that it will have to expect any
>     content, be it fooOne, or fooTwo, or some user defined extension
>     which it does not know about and may be written two years into the
>     future.

(In what follows I attempt to identify the Requirements, Definitions and Design Decisions on which we have consensus.)

The description above leaves out the "implementation" types. The proposal includes two kinds of UBL types: abstract base types and derived (contentful) types. The trick is that the contentful types (the so-called "implementation" types) are expressed solely in terms of abstract base types. I'll make the drivers for this design decision explicit:

Requirement 1: It must be possible for a specialization of a UBL type to add an element between two elements.

(Requirement 2): It must be possible for a specialization of a UBL type to remove a required element.

We have demonstrated the solution to R1. We need to add a use-case for R2. Digging a little deeper, the notion of a "required" element as discussed in NDRSC is a bit complicated. Expanding R2:

Requirement 2: There are three notions of a required element.

Definition 2.1 Instances that are valid w.r.t. a type containing a required element must contain that element. This is the straightforward notion of XSD validation: an instance document is either valid or not with respect to the optionality specification of an element's declaration.

Definition 2.2 A derived type may not take away required elements. XSD derivation enforces a "replaceability" rule: a derived type must be usable in place of its base type. This means that if a specialization is related to its base via XSD derivation, then there is no direct way for the derived type to eliminate a required element of the base type.

Requirement 2.3 UBL specializations may need to take away required elements. We came up with a notion of specializations "taking away required elements". In this case, Definition 2.1 is maintained, but the specialization (a brand new type) changes the optionality of the element, so that document instances may be valid with respect to the specialization even though the element is missing.

Requirement 2.4 A specialization may specialize either base UBL or a specialization of base UBL. Any specialization architecture must be recursively applicable. The purpose of this transitivity rule is to explicitly make "in scope" the problem of extending an extension.
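As a rough illustration of how Definition 2.2 and Requirements 2.3 and 2.4 interact, here is a small Python sketch. It is not part of the Paella materials. It models a content model as a list of (element name, required) pairs, which is a big simplification of XSD, and it looks only at optionality. The type and element names (BASE_ADDRESS, MY_ADDRESS, Street, City and so on) are made up for the example.

```python
def dropped_requirements(base, specialization):
    """Names required by `base` that `specialization` makes optional or omits."""
    still_required = {name for name, required in specialization if required}
    return [name for name, required in base if required and name not in still_required]

# Hypothetical content models: (element name, required?) in document order.
BASE_ADDRESS = [("Street", True), ("City", True), ("PostalCode", False)]
MY_ADDRESS = [("Street", True), ("City", False), ("PostalCode", False)]
MY_BRANCH_ADDRESS = [("Street", True), ("Floor", True), ("City", False)]

# A non-empty result means the specialization "takes away" a required element
# (Requirement 2.3), so by Definition 2.2 it cannot be an XSD derivation of the base.
print(dropped_requirements(BASE_ADDRESS, MY_ADDRESS))

# Requirement 2.4: the same question can be asked of a specialization of a specialization.
print(dropped_requirements(MY_ADDRESS, MY_BRANCH_ADDRESS))
```

The first call reports City, since MY_ADDRESS relaxes it; the second reports nothing, since MY_BRANCH_ADDRESS keeps everything MY_ADDRESS requires.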

We've also got a design decision from NDRSC that should probably be stated as a requirement here:

Design Decision 0.1: UBL uses XSD as the normative form specification of constraints on conformant instance documents. It follows that XSD schema validation is the process by which documents are determined to be valid w.r.t. the UBL specification.

Design Decision 0.2: Specializations of the UBL specification use XSD to specify constraints on conformant instance documents (i.e. conformant w.r.t. the specialization). It follows that XSD schema validation is the process by which documents are determined to be valid w.r.t. a specialization of the UBL specification.

And one more from CMSC a couple weeks ago:

Design Decision 0.3: An XSD type is defined within a namespace. Each namespace has a unique name. XSD leaves open the possibility that two schema modules may specify different definitions for a particular namespace. For example:

In one schema module, "mine.xsd", namespace "foo" contains complex type "Address" having two elements in its content model; in another schema module, "yours.xsd", namespace "foo" contains complex type "Address" having only one element in its content model.

The UBL architecture prohibits this sort of namespace "aliasing". Namespaces are immutable. The UBL specification and conformant specializations never redefine namespaces. In the example above, schema module "yours.xsd" would have to define a new namespace for its new Address type.
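The "mine.xsd" / "yours.xsd" situation can be detected mechanically. The following Python sketch is only an illustration under assumptions: the two schema strings are invented stand-ins for those modules (the element names Street and City are hypothetical), and the check compares nothing more than the element names declared inside each named complex type. It is not a full statement of Design Decision 0.3, which rules out any redefinition of a namespace.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical stand-ins for the "mine.xsd" and "yours.xsd" modules.
MINE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="foo">
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

YOURS = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="foo">
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

def type_definitions(schema_text):
    """Map (targetNamespace, type name) to the tuple of element names it declares."""
    root = ET.fromstring(schema_text)
    namespace = root.get("targetNamespace")
    definitions = {}
    for complex_type in root.findall(XSD + "complexType"):
        names = tuple(e.get("name") for e in complex_type.iter(XSD + "element"))
        definitions[(namespace, complex_type.get("name"))] = names
    return definitions

def aliased_types(*schema_texts):
    """Return the (namespace, type) pairs that are defined in more than one way."""
    seen, conflicts = {}, set()
    for text in schema_texts:
        for key, names in type_definitions(text).items():
            # setdefault records the first definition seen and returns it afterwards.
            if seen.setdefault(key, names) != names:
                conflicts.add(key)
    return conflicts

print(aliased_types(MINE, YOURS))  # {('foo', 'Address')}
```

If "yours.xsd" declares its Address in a namespace of its own, as the rule requires, the conflict set comes back empty.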

See the minutes for NDRSC and CMSC for full discussion of the aforementioned design decisions.

The Paella proposal seeks to satisfy requirements 1 and 2 without running afoul of Design Decisions 0.1-0.3. It does this by breaking what would be monolithic (UBL) types into two pieces: interface and implementation. The two pieces together comprise the complete realization of the UBL "core component". But that could properly be viewed as so much philosophical b.s. -- my point in laying out assumptions here is to draw the target on the wall so to speak so we can agree on what we are shooting at.

Mette wrote:
>     This approach, along with any, anyAttribute and anyType, is just a
>     way of chickening out of adding any useful content. For the sake of
>     useful application processing, as much as possible should be in a
>     base content model, and extensions may be added later on, but then
>     the extension writer will have to be aware that both sender and
>     receiver will have to be aware of the content, and that it is done
>     on a voluntary basis.

I'd like some examples of useful application processing that is not possible in the Paella model. I've shown 7 classes of useful application processing that are possible with the model. They are labeled (see xpaths.txt and use-cases.xsl):

1. Inheritance Selection -- select, in a base or specialized instance document, content of an element defined in the base (and inherited in the specializations)
2. Extension Selection -- select, in a specialized instance document, content of an element added to the base
3. Polymorphic Selection -- select, in a base or specialized instance document, an element defined in the base and specialized in various ways
4. Tunneling Reuse Selection (forgive me:) -- see xpaths.txt
5. Global Polymorphic Selection -- see xpaths.txt
6. Global Extension Selection -- ditto
7. Selection on Type -- ditto
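To give a feel for a few of the kinds of selection listed above, here is a minimal Python sketch using the standard library's ElementTree. It does not reproduce xpaths.txt or use-cases.xsl. The instance documents, the element names (Order, Address, Street, Floor, City), the "my" prefix and its namespace URI are all invented for illustration, and it assumes that a specialized instance marks its type with xsi:type, which may not match how the Paella examples are actually written.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical instances; every element, prefix and type name here is made up.
BASE_DOC = """<Order>
  <Address><Street>1 Main St</Street><City>Dallas</City></Address>
</Order>"""

SPECIALIZED_DOC = """<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                            xmlns:my="urn:example:my-UBL">
  <Address xsi:type="my:AddressType">
    <Street>2 Elm St</Street><Floor>3</Floor><City>Austin</City>
  </Address>
</Order>"""

def inheritance_selection(doc):
    """Content of an element defined in the base; works on either document."""
    return [e.text for e in ET.fromstring(doc).findall("./Address/City")]

def extension_selection(doc):
    """Content of an element that only the specialization adds."""
    return [e.text for e in ET.fromstring(doc).findall("./Address/Floor")]

def selection_on_type(doc, type_name):
    """Address elements whose xsi:type matches `type_name`.

    Simplification: the attribute value is compared as a raw string; a real
    implementation would resolve the prefix to its namespace first.
    """
    return [e for e in ET.fromstring(doc).iter("Address") if e.get(XSI_TYPE) == type_name]

print(inheritance_selection(BASE_DOC), inheritance_selection(SPECIALIZED_DOC))
print(extension_selection(BASE_DOC), extension_selection(SPECIALIZED_DOC))
print(len(selection_on_type(SPECIALIZED_DOC, "my:AddressType")))
```

The same path finds City in both documents, while Floor and the type-based selection match only the specialized one.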

I believe that Mette's example (foo extended with (a, b, c) in fooOne and with (d|e|f) in fooTwo) is adequately captured by case 2: Extension Selection.

I believe Mette's comments regarding polymorphism echo a widespread aversion to the whole abstract base types approach (as exemplified in Paella) because on the surface it seems to leave the specification too open. We had some discussion of this in CMSC and it went something like this:

What about this candidate requirement:

Candidate Requirement 3: It should be possible to enforce in a specialization that document instances conformant to the specialization use a particular specialization of a type T. For example, we'd like to enforce that specialization namespace "my-UBL" define a new Address type and that document instances conforming to my-UBL must use the new Address definition -- never the one in "base" UBL.

There are three possible approaches to this:

(1) The specialization defines a new schema module that imports the UBL namespace. In a new namespace, a new version of type T is defined.
(2) The specialization defines a new namespace and copies all the UBL definitions into it (cut & paste!), but alters the definition for type T.
(3) The specialization uses the "redefine" import variant to accomplish the same thing as (2) but more succinctly.

Approaches (2) and (3) violate Design Decision 0.3.

Approach (1) fails to meet Candidate Requirement 3 since a conformant instance document may contain either the T defined in the UBL namespace, or the T defined in the specialized namespace. XSD type substitutability cannot be subverted. Candidate Requirement 3 attempts to foreclose on XSD type substitutability.

So AFAIK there is no solution to Candidate Requirement 3 that also satisfies the other requirements and design decisions. As a result, Paella does not satisfy Candidate Requirement 3. By far the more common case in practice is (the opposite of Candidate Requirement 3):

Requirement 4: Type substitutability: It must be possible (convenient) to define types derived from existing types. Derived types must be usable anywhere base types are usable. This is an outcome of the decision to use XSD.

It should be clear by now that I view the specialization architecture as independent of the context methodology. Not only is it valuable to be able to specify specializations in XSD -- it is actually an outcome of the decision to "use" XSD as our normative form specification language (Design Decisions 0.1, 0.2). The specialization architecture is a platform upon which context methodology may be applied.

Now there's a broad target. Fire at will!

Regards,

Bill