Subject: Re: [ubl-comment] Methodology Paper Comments

From: Tim McGrath <tmcgrath@portcomm.com.au>
To: "Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com>
Date: Wed, 17 Jul 2002 10:28:51 +0800

Many thanks for this valuable input. Your objective assessment is very encouraging and enlightening to those of us who are tightly involved in this. It is good to think we are on the right track!

My specific comments are inline with your notes.

Miller, Robert (GXS) wrote:

Comments from Bob Miller on:
Position Paper: Library Content Methodology

Gentle people,

Overall, I found this position paper to be well formulated. I have been encouraged by the relatively formal approach taken within this team. This paper draws upon respected and proven information design principles (Relational Theory, Model Normalization, Object Classes). In a few places, this document seems to overlook its own principles, most particularly when accepting without careful examination some parallel work. An example is in sections 2.3.12.1 Applying Context to UBL and 3.3.12 Context:

IMO, the discussion in these sections should have pointed out that Contexts are properties of a class. ShippingContact and BillingContact simply constrain a Context property. Instead, I read in 3.3.2 "In many vocabularies, context is suggested by the component's name." And that's also what I see in the example from UBL vocabulary, two BIEs whose class is Contact, but whose context is "suggested by the component's name." "Suggested" doesn't cut the mustard! In fact, ShippingContact and BillingContact should be represented by subclasses of Contact, each of which constrains a Context property of the parent class.

Whilst the methods of applying context to data vocabularies are still evolving, I think we are forced to take some of this work 'on faith', but I would agree we need to put more integrity into this area. I also believe you are correct that context is a property of a class. That is why it appears in the name (at least sometimes :-! ), when we apply the tripartite object+property+rep term naming convention. Presumably we could use a whole set of property qualifiers to denote all contexts. However, UBL as a group is still working on the idea that context will be defined as separate metadata: specific properties using the 8 contexts given by ebXML. This means that the name does not have to denote any context; the metadata will. It just so happens that we are required to put properties into the naming convention. As yet we are loosely applying both ideas and this may cause us duplication problems. For example, to use your favourite example, ShippingContact may be defined with a business process context of 'Goods Delivery', BillingContact as something other than 'Payment'. I would be interested in your or anyone else's preferences or opinions on the best way to deal with this.
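
To make the duplication risk concrete, here is a minimal Python sketch. It is an illustration only, not anything UBL has defined. The `Bie` record, the `IMPLIED_BUSINESS_PROCESS` table, the `business_process` key and the 'Order Fulfilment' value are all hypothetical names chosen for the example. The check flags a component whose name suggests one business process context while its separate metadata states another.

```python
from dataclasses import dataclass, field

# Hypothetical table: the business process context a name qualifier "suggests".
IMPLIED_BUSINESS_PROCESS = {
    "Shipping": "Goods Delivery",
    "Billing": "Payment",
}

@dataclass
class Bie:
    qualifier: str        # property qualifier used in the name, e.g. "Shipping"
    object_class: str     # e.g. "Contact"
    context: dict = field(default_factory=dict)  # context held as separate metadata

    @property
    def name(self) -> str:
        return f"{self.qualifier}{self.object_class}"

def name_conflicts_with_metadata(bie: Bie) -> bool:
    """True when the name implies one context and the metadata states another."""
    implied = IMPLIED_BUSINESS_PROCESS.get(bie.qualifier)
    stated = bie.context.get("business_process")
    return implied is not None and stated is not None and implied != stated

shipping = Bie("Shipping", "Contact", {"business_process": "Goods Delivery"})
billing = Bie("Billing", "Contact", {"business_process": "Order Fulfilment"})
```

With these two instances, `name_conflicts_with_metadata` reports a conflict for `billing` only, which is the kind of inconsistency that arises when both conventions are applied loosely.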

Miller, Robert (GXS) wrote:

Perhaps the root source of my concern with the example of "Contact" is really found in this position paper's discussion and table 2 of "Type" in section 2.3.10. The discussion observes that "data types are just another form of entity/object class/aggregate BIE." But then, it fixes a 'basic type' at too high a level (see for guidance XSD basic and derived types and note the properties these types establish and inherit). And it suggests that a couple of layers of refinement are sufficient.

In my analysis of X12 vocabulary, I have found that (most) individual code list values identify 'semantic primitives'. They effectively point at a set of defining metadata, and they have no associated instance value (have no value property). Such primitives may of course appear in multiple code lists. And these code lists in turn are typically associated with entities which do carry associated instance values (have a value property). Bottom line is, if the semantic entity appears in a code list, and that list is associated with an entity that does have a property value, then that semantic primitive is a property of the entity with which it is associated. In X12, there are some 'basic business data (type) elements' like amount. In usage, they are sometimes associated with a code list. At other times, they appear without such association, but are embellished in the segment definition by a 'semantic note' that 'fixes' the value of one or more properties of the basic business data (type) element 'amount'. From a semantic viewpoint, TotalDollarAmount and Amount context="TL" currency="US" are identical.

Some codes define content (good) and others define meta-data (to be avoided).

For example, ISO 3166-1 defines a set of valid Country Codes (i.e. a content code). So, if I wanted to unambiguously specify that the country of destination was Australia I would use the ISO 3166-1 code ‘AU’.

However, EDIFACT 2005 (Date or time or period function code qualifier) defines the function of a date, i.e. it defines meta-data. These are the codes that tell you what kind of thing you are dealing with. So, if I wanted to use a generic date field within an Order, I could qualify it by accompanying the date with a code of ‘4’ meaning that the date is the ‘date when an order is issued’. This is an alternative to defining the order date object explicitly. The same principle applies to the many qualifier and function codes used throughout EDI messages. This concept goes to extremes with things like EDIFACT 1131 (Code List Identification code), defining meta-meta-data, which is an interoperability nightmare.
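
The two styles can be contrasted in a small Python sketch. This is an illustration under assumptions, not a UBL or EDIFACT definition: `QualifiedDate`, `OrderIssueDate`, `FUNCTION_CODES` and `describe` are hypothetical names, and the only code value used is the ‘4’ mentioned above.

```python
from dataclasses import dataclass
from datetime import date

# Style 1: a generic date plus a qualifier code (the meaning travels as a code).
@dataclass
class QualifiedDate:
    function_code: str   # e.g. "4", the order issue date in the text above
    value: date

# Style 2: the meaning is carried by the component definition itself.
@dataclass
class OrderIssueDate:    # hypothetical explicit component
    value: date

# Hypothetical lookup that every receiver needs in order to interpret style 1.
FUNCTION_CODES = {"4": "date when an order is issued"}

def describe(d) -> str:
    if isinstance(d, QualifiedDate):
        # A code missing from the receiver's list cannot be interpreted at all.
        return FUNCTION_CODES.get(d.function_code, "unknown date function")
    if isinstance(d, OrderIssueDate):
        return "date when an order is issued"
    raise TypeError("unsupported date component")
```

In the first style the receiving application has to hold and maintain the code list to recover the meaning; in the second the vocabulary itself states it.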

None of these meta-data codes represent semantic information entities; they provide clues to properties or perhaps contexts for Core Components. They are not Core Component codes as required by the CCTS. As you say, they attempt to convey semantic meaning via coded references rather than in the Library itself and are unnecessary and possibly dangerous in a well-formed data vocabulary. Their use defers the responsibility for data interoperability to the individual implementation application.

I am not sure TotalDollarAmount and Amount context='TL' currency='US' are semantically equivalent. Only for an American!

Miller, Robert (GXS) wrote:

In designing a business document at a syntax-neutral level, if there is a need to express a total dollar amount, it is advantageous to express that need as an amount with specific property constraints on context and currency. Then, a syntax specific schema generator can use this information along with a set of grammar rules to generate an appropriate schema for instance representation. For example, a generator would likely have a rule that property constraints exceeding some (target syntax) threshold minimum set of choices results in generation of a choice of entities, each of which has a fixed property value. A property constraint that exceeds the threshold results in generation of a set of entities that allow/require the property value to be explicit in the data instance.

I confess, I am not sure what you are saying here. Can you give an example?
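
One possible reading of the generator rule, offered only as a guess and not as Bob's actual design, is sketched below in Python. It assumes that a property constrained to a small set of values produces one entity per value with the property fixed, and that a larger set produces a single entity whose instances must state the property. The function name, the dictionary layout and the generated element names are all hypothetical.

```python
from typing import Dict, List

def generate_entities(base: str, prop: str, allowed: List[str], threshold: int) -> List[Dict]:
    """Return illustrative element declarations for `base` constrained on `prop`."""
    if len(allowed) <= threshold:
        # Few choices: one entity per value, each with the property value fixed.
        return [{"element": f"{value}{base}", "fixed": {prop: value}} for value in allowed]
    # Many choices: one entity whose instances must carry the property explicitly.
    return [{"element": base, "required_attributes": [prop]}]
```

Under this reading, `generate_entities("Amount", "currency", ["USD"], 3)` yields a single fixed-currency entity, while a list of four currencies with the same threshold yields a generic `Amount` with a required `currency` attribute.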

Miller, Robert (GXS) wrote:

In Section 2.3.14 Assembling Document Definitions I find perhaps a more serious conflict with the foundation this paper lays. But before getting into that, let me suggest that the term 'document' as used earlier in this paper likely is not the same as 'document' as used in this section. I think one or the other usage is inappropriate. I vote document for section 2.3.14, and something else for 2.3.3.

Yes, I can see your concern. We are using document in the XML sense (3.3.14) and in the more abstract sense of the set of components used in the exchange of data between applications (3.3.3). I guess we are fixed with the XML document, so do we have a suggestion for the second use? Something like 'message', 'business exchange message', or some such?

Miller, Robert (GXS) wrote:

I take some issue with the statement "An hierarchical, top-down and nested tree structure is still the most practical way to define any document's structure." I firmly believe that the most practical way to define any document structure is to "define one or more hierarchical views of the data to be represented in the document from a relational definition of the data." I think in principle that is what you meant to say, but I assert it is not what you said. If you don't like my definition, change the word 'design' to 'represent' in your definition and you at least won't raise my eyebrows. There is of course the nasty 'HL hierarchical loop' issue to address. Perhaps that disappears in a wave of 'context' applied to the logical model. I hope so.

Once again I agree that we used slack phrasing here. I accept your more accurate interpretation.

Miller, Robert (GXS) wrote:

IMO, the faults I find in this paper are neither in the foundation it lays, nor in the recommendation it makes. It's just a little of the detail I feel could use some cleanup.

Cheers,
Bob Miller

Our intention is to collate all comments and republish the paper around July 30th. I look forward to your input on the revised document as well.

-- 
regards
tim mcgrath
fremantle  western australia 6160
phone: +618 93352228  fax: +618 93352142
