NDR Questions / Suggestions

FYI: I am performing this review as part of an effort to reach out to the ISO TC204 intelligent transportation community. If you are interested in my full report to this group, it is available at http://trevilon.com/download/Example.zip. Please be aware that the report is a formal submittal (discussion paper) to ISO TC204 WG1 and some of its content could become part of an ISO standard; therefore, it should not be considered a contribution to UBL at the present time, just a paper of interest from another group.

1. Why do we define an actual Core Component Parameters Schema when it is just an empty file? Couldn't we just reference the namespace without the file existing? If we define the file, shouldn't we define the elements used?

KH> The file exists to demonstrate that the namespace exists, but there is no point in providing a formal schema definition of elements since <appinfo> cannot be constrained by a validator. I thought the file contained some sort of explanation in comments, but it doesn't.

I would suggest that, at a minimum, we explain in the file's comments why the file exists. I would also like to see some sort of definition (formal or just in comments) of the elements that this namespace supposedly defines. For example, the UBL-CommonAggregateComponents-2.1.xsd file includes documentation fields such as "<ccts:ComponentType>"; therefore, this file should contain a definition of ComponentType. I accept that an automated processor will not validate the <annotation> content, but a human reader of the file should still be able to follow the logic; otherwise, why bother qualifying these elements with the "ccts" namespace? However, since the definitions would primarily serve the human reader, they would not necessarily have to be formal schema declarations.
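As a rough illustration of this suggestion, here is a minimal sketch of a self-documenting version of the file. It assumes the ccts namespace URI that I believe the UBL 2.1 schemas use (urn:un:unece:uncefact:documentation:2); the documentation wording is my own, and only the terms mentioned in this note are shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Core Component Parameters (illustrative sketch, not the UBL file).

  Why this file exists: the UBL schemas place ccts:* elements inside
  xsd:annotation to carry CCTS metadata for each component. This file
  declares that namespace and documents those elements for human
  readers. Validators do not check annotation content, so the
  declarations below are informative rather than enforced.
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:un:unece:uncefact:documentation:2"
            elementFormDefault="qualified">

  <xsd:element name="ComponentType" type="xsd:token">
    <xsd:annotation>
      <xsd:documentation>The kind of CCTS component described, e.g. ABIE, BBIE or ASBIE.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>

  <xsd:element name="PropertyTermQualifier" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>A word or words that qualify the Property Term.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>

  <xsd:element name="PropertyTerm" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The Property Term of the BBIE or ASBIE.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>

  <xsd:element name="AssociatedObjectClass" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>For an ASBIE, the Object Class of the associated ABIE.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>

</xsd:schema>
```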

2. CTN2: This NDR rule states: "A UBL xsd:complexType name based on a CCTS BBIE Property MUST be the CCTS Dictionary Entry Name shared Property Term and its qualifiers and the Representation Term of the BBIE with the separators removed and with the "Type" suffix appended after the Representation Term." I do not understand the phrase "the CCTS Dictionary Entry Name shared Property Term." Shouldn't this read "the CCTS Dictionary Entry Property Term"? Is there some special meaning behind "Name shared" that I am missing?

KH> It might just be a grammar thing ... there is the "Dictionary Entry Name" (DEN) and there is the "Property Term" and the property term is inside of the DEN, so in that case it might be shared.

I'll concede that the wording might be technically correct and that I might be unaware of some nuanced meaning, but if people who are reasonably familiar with standards have difficulty understanding the statement, even after studying it, it seems to me that we should try to clarify the wording.
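To make the discussion concrete, here is how I read CTN2 when applied to a single BBIE, with the derivation spelled out in comments. This is a sketch of my understanding rather than an authoritative interpretation of the NDR; it assumes a BBIE whose Dictionary Entry Name is "Invoice. Issue Date. Date" and an enclosing schema that declares the xsd and udt prefixes.

```xml
<!--
  BBIE Dictionary Entry Name: "Invoice. Issue Date. Date"
    Object Class Term:   Invoice (not part of the type name)
    Property Term:       Issue Date (no qualifiers in this case)
    Representation Term: Date
  Step 1. Property Term and Representation Term, separators removed: IssueDate, Date
  Step 2. The duplicated term "Date" is removed (CTN8):              IssueDate
  Step 3. The "Type" suffix is appended:                              IssueDateType
-->
<xsd:complexType name="IssueDateType">
  <xsd:simpleContent>
    <xsd:extension base="udt:DateType"/>
  </xsd:simpleContent>
</xsd:complexType>
```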

3. ELN3: BBIE names include a special rule (CTN8) that removes duplicate terms between the Property Term and the Representation Term. Shouldn't ASBIEs have a similar rule that removes duplicate terms between the Property Term and the Associated Object Class Term? For example, if I have an association with the Property Term "Target Area" that links to an Object Class called "Area Location", shouldn't the ASBIE be called "TargetAreaLocation" rather than "TargetAreaAreaLocation"?

KH> No, because the elimination of duplicates happens on entire terms and not on words found in the terms. I believe there are a couple of examples where there are consecutive duplicated words because they are there as portions of adjacent longer terms.

I glanced through the UBL 2.1 CAC file and did not find any duplicates. However, I did find entries such as AdditionalQualifyingParty, which is an ASBIE. Its definition indicates:

    <ccts:PropertyTermQualifier>Additional</ccts:PropertyTermQualifier>
    <ccts:PropertyTerm>Qualifying Party</ccts:PropertyTerm>
    <ccts:AssociatedObjectClass>Qualifying Party</ccts:AssociatedObjectClass>

Thus, it appears that such a rule is being applied; otherwise, as I understand ELN3, the element name would be AdditionalQualifyingPartyQualifyingParty. I just cannot find the rule written down in the NDR.
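Set side by side as element declarations, the two cases look like this under KH's whole-term reading of duplicate elimination. The first declaration is meant to mirror the CAC entry discussed above; the second, including AreaLocationType, is hypothetical and taken from my Target Area example. Both assume an enclosing schema that declares the xsd prefix.

```xml
<!-- Property Term Qualifier: Additional
     Property Term:           Qualifying Party
     Associated Object Class: Qualifying Party
     The entire term "Qualifying Party" repeats, so it appears only once. -->
<xsd:element name="AdditionalQualifyingParty" type="QualifyingPartyType"/>

<!-- Hypothetical example from question 3:
     Property Term:           Target Area
     Associated Object Class: Area Location
     Only the word "Area" repeats, not an entire term, so nothing is removed. -->
<xsd:element name="TargetAreaAreaLocation" type="AreaLocationType"/>
```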

4. How does the UBL modeling approach deal with specializations coupled with an association to the generalized class? For example, see http://trevilon.com/download/IncidentListRequest.png. A request for a list of incidents includes the targetArea of the request. The area is defined to be an abstract AreaLocation that can be implemented as either a CircleAreaLocation or a RectangleAreaLocation. It would seem that the appropriate XML construct would be either a substitution group (coupled with another layer of abstraction if the class is used in a named association) or perhaps a choice; however, neither of these approaches appears to be allowed by the UBL NDR.

KH> Because there are no "OR" groups (choice models) in UBL, a specialization is typically modeled by a sequence of optional values, and the user chooses one of the optional values as the specialization they need. If they choose more than one, then the business rules would dictate which has priority. Each option (specialization) would include the generalized class as an ASBIE.
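To make KH's description concrete, this is how I understand such a model would look. All names are hypothetical, taken from my IncidentListRequest example, and the fragment assumes an enclosing xsd:schema whose default namespace is its target namespace, with the global elements CircleAreaLocation, RectangleAreaLocation, AreaLocation and RadiusMeasure declared elsewhere.

```xml
<!-- TargetArea: no xsd:choice, so every specialization is an optional child.
     Nothing in the schema stops an instance from containing both, or neither;
     business rules must decide which takes priority. -->
<xsd:complexType name="TargetAreaType">
  <xsd:sequence>
    <xsd:element ref="CircleAreaLocation" minOccurs="0"/>
    <xsd:element ref="RectangleAreaLocation" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>

<!-- Each specialization carries the generalized class as an ASBIE,
     followed by its own properties. -->
<xsd:complexType name="CircleAreaLocationType">
  <xsd:sequence>
    <xsd:element ref="AreaLocation"/>
    <xsd:element ref="RadiusMeasure"/>
  </xsd:sequence>
</xsd:complexType>
```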

I see the following pros and cons:

Criterion	Substitution group	Choice	Optional elements
Allows use of the appropriate type (i.e., use Circle or Rectangle)	Y	Y	Y
Allows restriction to a single type (i.e., use Circle or Rectangle, but not both)	Y	Y	N
Allows use of multiple types (i.e., include both Circle and Rectangle in a single instance document)	Y (via multiplicity)	Y (via multiplicity)	Y
Requires use of types from the original version (i.e., if a future revision of the standard adds a Polygon specialization, can the schema be updated in a way that preserves forward compatibility by requiring a Circle or Rectangle representation in addition to an optional Polygon representation?)	N (Polygon could not be added to the AreaLocation substitution group without potentially breaking the desired structure in other locations of the schema set)	Y (Polygon would be added as an optional element outside the original choice, while a new document type could include Polygon as part of its original choice structure)	N

Thus, it seems to me that the best approach would be to use a choice. While I concede that this is not a strong enough argument to justify changing existing structures, is there any reason that this approach should not be adopted by new standards efforts?
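A sketch of the choice-based design follows, shown as two successive versions of the same hypothetical TargetAreaType; they would not coexist in one schema. Names are hypothetical, taken from my IncidentListRequest example, and the fragment assumes the referenced global elements (CircleAreaLocation, RectangleAreaLocation, PolygonAreaLocation) are declared elsewhere in the schema's target namespace, which is also its default namespace.

```xml
<!-- Version 1: exactly one of the original specializations is required. -->
<xsd:complexType name="TargetAreaType">
  <xsd:choice>
    <xsd:element ref="CircleAreaLocation"/>
    <xsd:element ref="RectangleAreaLocation"/>
  </xsd:choice>
</xsd:complexType>

<!-- A later revision (replacing the definition above) keeps the original
     choice intact and adds Polygon as an optional element outside it, so a
     Circle or Rectangle representation remains mandatory alongside the
     optional Polygon. -->
<xsd:complexType name="TargetAreaType">
  <xsd:sequence>
    <xsd:choice>
      <xsd:element ref="CircleAreaLocation"/>
      <xsd:element ref="RectangleAreaLocation"/>
    </xsd:choice>
    <xsd:element ref="PolygonAreaLocation" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>
```

Because PolygonAreaLocation sits outside the choice, the revised type still rejects an instance that carries only a polygon, which is the property described in the last row of the table.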

5. Why do we include complexType definitions in the Common Basic Components? In other words, why don't we just declare the elements to be of the base UDT type?

Is this just a historical artifact from the old way of range checking (i.e., when that information was stored in the schema)? Based on my current understanding and a quick skim through the UBL 2.1 CBC file, the CBC file appears to include a complexType definition for each BBIE that simply maps that complexType to an unqualified data type without any customization. Thus, there seems to be little benefit in defining these types and a significant drawback (i.e., they add one more level of abstraction, which means one more level for anyone studying the model to understand). Am I missing some benefit?

KH> I don't perceive the additional complexity or drawback with the added layer of redirection. In fact I think the level of abstraction is important in distinguishing the role played by each declaration.

I do not see any benefit in saying that IdentificationID is of type IdentificationIDType so that we can then say that IdentificationIDType is an extension of udt:IdentifierType without any added information. Why not simply say that IdentificationID is of type udt:IdentifierType and decrease the size of the CBC file by 80% or more? I am not against the bulk if it serves a purpose, but I believe that added levels of abstraction and additional bulk create a perception that a standard is complex, which then becomes a hindrance to getting people to adopt it. Perhaps I am missing a hidden benefit, but I have not been able to identify one.
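The two alternatives, side by side, as I understand the pattern in the CBC file. They are alternatives rather than declarations that could coexist in one schema, and the enclosing schema is assumed to declare the xsd and udt prefixes.

```xml
<!-- Current pattern: a BBIE-specific type that extends the UDT without adding anything. -->
<xsd:element name="IdentificationID" type="IdentificationIDType"/>
<xsd:complexType name="IdentificationIDType">
  <xsd:simpleContent>
    <xsd:extension base="udt:IdentifierType"/>
  </xsd:simpleContent>
</xsd:complexType>

<!-- Suggested alternative: declare the element directly against the unqualified data type. -->
<xsd:element name="IdentificationID" type="udt:IdentifierType"/>
```

Instance documents look the same under either form; only the schema differs.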

6. What logic is used to unambiguously handle interpretations of codes by independent communities? For example, the standard may define a certain code list (e.g., PackingTypeCodeList). This is useful because it helps the international community standardize the list. But suppose Community A and Community B each decide to add to this list, and each establishes the code ZZA to represent its new packing type, but Community A's ZZA packing type is different from Community B's. It seems to me that the best approach would be to require every instance that uses a code list other than the standard code list to use the listName (or similar) attribute to specify exactly which list is being used, but I do not see any mechanism to require this. Am I correct in saying that a document containing the ZZA code without any list identifier would pass both validation steps without any warnings being issued? If so, doesn't this present a potential interoperability problem?
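To illustrate the problem, here are two instance fragments. The element name cbc:PackagingTypeCode is an assumption for illustration, as are the list and agency names; listName is the attribute mentioned above, and listAgencyName is, as I understand it, another list-identification attribute carried by UBL code elements. The fragments assume the cbc prefix is bound on an enclosing element.

```xml
<!-- Without list identification: a receiver cannot tell whether this ZZA
     is Community A's packing type or Community B's. -->
<cbc:PackagingTypeCode>ZZA</cbc:PackagingTypeCode>

<!-- With list identification: the same code is unambiguous.
     The list and agency names below are hypothetical. -->
<cbc:PackagingTypeCode listName="Community A Packing Type Extensions"
                       listAgencyName="Community A">ZZA</cbc:PackagingTypeCode>
```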

7. It is unclear to me how you expect extensions to be handled. I noticed that the Extension schemas that you provided had the "any" statement defined in the Common Signature Components file. While this seems to work for the standardized schema, the only way I could figure out how to extend this further was to add my extensions to the Signature file, which seemed counter-intuitive.

After studying this further, I designed my example so that the "any" statement is contained in the ExtensionContentDataType file. I also changed its processing from "skip" to "lax". Thus, to add extensions, the user just imports the relevant namespaces from the customization schemas (and optionally assigns them xmlns prefixes). With "lax" processing, the XML validator will ignore any unrecognized extension while checking any known extension.
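A minimal sketch of the modified ExtensionContentDataType module described above. The target namespace is the one I understand UBL 2.1 uses for extension components; the two imported extension namespaces, their prefixes and their schema locations are hypothetical placeholders for customization schemas. Importing more than one is how several extensions could be checked in the same validation pass.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:inc="urn:example:its:extension:incident"
            xmlns:area="urn:example:its:extension:area"
            targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
            elementFormDefault="qualified">

  <!-- Customization schemas are imported here (hypothetical namespaces and files).
       Their global element declarations become known to the validator. -->
  <xsd:import namespace="urn:example:its:extension:incident"
              schemaLocation="ITS-IncidentExtension.xsd"/>
  <xsd:import namespace="urn:example:its:extension:area"
              schemaLocation="ITS-AreaExtension.xsd"/>

  <xsd:complexType name="ExtensionContentType">
    <xsd:sequence>
      <!-- processContents="lax": an element whose declaration is known (for example,
           from an import above) is validated; an unrecognized element is accepted
           without being checked. With "skip", neither would ever be checked. -->
      <xsd:any namespace="##other" processContents="lax"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```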

Am I misunderstanding something, or is this a change that we should consider for UBL 2.1 PRD2?

KH> The limitation of W3C Schema allows only *one* extension to be validated at a time. If you have your own extension, you duplicate the XSD directory, replace the ExtensionContent module to point to a *different* extension ... but only one at a time. While it is an excellent guideline not to abuse the known declarations, it is overkill to force them to be correct. If the user is doing any validation, they will end up having to be correct just to use the schema fragments. Therefore, nothing is gained by adding anything more stringent than "any".

I do not understand KH's response. Wouldn't we force them to be correct if we replicated the XSD directory and added our extension there? Wouldn't that require yet another round of validation (actually two more, to do both structure and range validation)? I don't understand what advantage this would have over the "lax" method.

Regards,

Ken Vaughn

Trevilon Corporation

12906 Pinecrest Rd

Herndon, VA 20171

+1-703-390-1053

+1-571-331-5670 cell
