ubl message

Subject: Processing instructions (was: [ubl] Minutes of Atlantic UBL TC call 5 April 2006)
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: ubl@lists.oasis-open.org
Date: Mon, 10 Apr 2006 18:03:08 -0400
At 2006-04-10 13:33 -0700, jon.bosak@sun.com wrote:
>MINUTES OF ATLANTIC UBL TC MEETING
>15:00 - 17:00 UTC WEDNESDAY 5 APRIL 2006
>...
>    JB: Other NDR issues?
>
>    PB: Version is covered by ABIE instead of attribute, and that's
>    OK.  But we have a requirement from uk/se/dk for a place to
>    indicate which application generated an instance for debugging
>    purposes.  We were told in Ottawa to use a PI.  GKH says we
>    shouldn't formalize a PI into a standard, but this is an
>    instruction for processors.
>
>    JB: There doesn't seem to be another good way to do this.

I was unaware from the earlier discussion this was for debugging purposes:

At 2006-04-04 14:46 +0200, Peter Borresen wrote:
>To be able to track how the instance was generated the following 
>process-instruction SHOULD be added to each document instances:
>
><?InstanceInfo
>         Creator="<person or application (inklusiv full verison 
> attributes) that has generated the document"
>         Created="<date and time the document was send>"
>?>

The above example is attributing persistent information about the 
instance to the document, not transient processing information.  This 
example isn't (in my mind) a processing directive ... it is 
additional information about the instance.

Consider the standardized processing instruction for stylesheet association:

   http://www.w3.org/1999/06/REC-xml-stylesheet-19990629

A processing instruction is an annotation ... it isn't (shouldn't be) 
a source of additional information.  The information *in* the 
document does not change based on the presence of this standardized 
processing instruction.  Nor does the information change with the 
absence of one, or with the removal of one that had been present.  Nor can 
anyone associate the information in the PI with the information in 
the document ... the PI is solely for a processing application in the 
interpretation of the document.

The PI is a processing directive (hence the name: processing 
instruction).  The information in the example above isn't directing 
anything, nor instructing anything about the document.

Furthermore, the presence of and syntax used in processing 
instructions cannot be constrained by XML document modeling technologies.

Encoding the author and time stamp of the document in the PI feels 
too much like using an arbitrary and unvalidatable mechanism to add 
information items to the document.

However ... that's just my opinion, and if it is decided to include 
such information in UBL using processing instructions, I then have 
comments about the above processing instructions themselves as follows.

Note that a processing instruction does not have real attributes, 
regardless of the syntax used within the processing 
instruction.  There are only two pieces of information:  the name 
token at the start (called the PI target) and the rest of the string 
following the white-space that follows the target.  A downstream 
application is obliged to parse the information out of that 
unstructured string itself:  nothing validates that the 
pseudo-attributes are quoted correctly, and the XML processor does 
not hand back the individual values.
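
To illustrate the point, here is a minimal Python sketch using the 
standard library's xml.dom.minidom.  The sample instance and its 
values ("ExampleApp 1.0", the timestamp, the Invoice element) are 
made up for illustration; only the PI target comes from the proposal.  
It shows that the parser hands over exactly two things, the target 
and one undivided string:

```python
from xml.dom import minidom

# Hypothetical instance; the values are invented for illustration only.
DOC = (
    '<?InstanceInfo Creator="ExampleApp 1.0" Created="2006-04-10T18:03:08"?>'
    '<Invoice/>'
)

def processing_instructions(xml_text):
    """Return (target, data) pairs for PIs that are children of the document node."""
    dom = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]

pis = processing_instructions(DOC)
for target, data in pis:
    # 'data' is a single string; the parser has not split it into attributes.
    print(target, "->", repr(data))
```

Anything that looks like an attribute inside the data string is still 
left for the application to pick apart.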

When I've designed processing instructions, I've tried to determine 
what information is standalone, and what information is tied 
together.  Analyzing the W3C standardized stylesheet association 
processing instruction, there are several pieces of information (the 
mandatory href and type, plus the optional title, media, charset and 
alternate) that are all tied together and related.  Since they are 
all tied together and related, they are all in a single processing 
instruction.  This burdens processing applications with parsing the 
PI string to find those pieces of information, and using name/value 
pairs as pseudo-attributes is a meaningful way to express this 
association.
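
As a rough sketch of the burden this places on an application, the 
following Python splits a PI string into name/value pairs.  The 
function name and the regular expression are my own assumptions, not 
anything defined by XML or by the stylesheet association 
recommendation; a complete implementation would also have to deal 
with character and entity references inside the values.

```python
import re

# name="value" or name='value'.  This is only a convention layered on the
# PI string; the XML parser neither recognizes nor validates it.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def parse_pseudo_attributes(pi_data):
    """Return a dict of the pseudo-attributes found in a PI data string."""
    result = {}
    for match in PSEUDO_ATTR.finditer(pi_data):
        name, double_quoted, single_quoted = match.groups()
        result[name] = double_quoted if double_quoted is not None else single_quoted
    return result

attrs = parse_pseudo_attributes('href="invoice.xsl" type="text/xsl"')
unquoted = parse_pseudo_attributes('href=invoice.xsl')  # bad quoting: nothing matches
```

Note that the badly quoted string is silently ignored rather than 
reported, which is exactly the lack of validation at issue.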

But the important point is that not only is one not obliged to use 
pseudo-attribute syntax, I suggest that for singleton values it is 
inappropriate to use pseudo-attribute syntax.

Instead of:

<?InstanceInfo
         Creator="<person or application (inklusiv full verison 
attributes) that has generated the document"
         Created="<date and time the document was send>"
?>

I would rather suggest two separate processing instructions, either 
of which still has meaning if the other one is missing, and there is 
no burden on processing applications to obtain the information out of 
the string value (no quotes, no parsing, just the PI value is the data value):

<?UBL-creator person-or-application-as-rest-of-string?>
<?UBL-created date-time-as-rest-of-string?>

Alternatively, I could live with the following where a single PI 
target identifies all information targeted for UBL processors and it 
is easy to extract and use the initial space-delimited name token in 
the processing instruction string to determine what the rest of the 
string represents.

<?UBL creator person-or-application-as-rest-of-string?>
<?UBL created date-time-as-rest-of-string?>
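
For comparison, a minimal Python sketch of how little work either 
singleton form needs.  The sample document mixes the two styles only 
to show both branches, and the values and the ubl_annotations name 
are hypothetical:

```python
from xml.dom import minidom

# Hypothetical instance mixing both suggested styles, for illustration only.
DOC = (
    "<?UBL-creator ExampleApp 1.0?>"
    "<?UBL created 2006-04-10T18:03:08-04:00?>"
    "<Invoice/>"
)

def ubl_annotations(xml_text):
    """Collect singleton UBL PI values from the top level of a document."""
    found = {}
    for node in minidom.parseString(xml_text).childNodes:
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target.startswith("UBL-"):
            # Separate targets: the whole PI string is the value.
            found[node.target[len("UBL-"):]] = node.data
        elif node.target == "UBL":
            # Single target: the first name token says what the rest represents.
            parts = node.data.split(None, 1)
            if parts:
                found[parts[0]] = parts[1] if len(parts) > 1 else ""
    return found

annotations = ubl_annotations(DOC)
```

No quoting rules are involved in either branch; the only parsing is 
one split on white-space in the single-target form.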

Lastly, though this isn't something that can be taken advantage of in 
XSLT, to be complete according to the XML recommendation, I believe 
any agreement on a processing instruction target name should include 
an agreement on an associated SYSTEM and possibly PUBLIC identifier 
for the NOTATION associated with the target.  The target is, 
according to the spec, documentary (like a namespace prefix), but it 
has been given weight in XML APIs because, I believe, the interface 
designers missed this association between the PI target and the 
formal identifiers.

This is defined in XML section 4.7:

    http://www.w3.org/TR/2004/REC-xml-20040204/#Notations

So, in DTD speak we would then need something like:

   <!NOTATION UBL SYSTEM "urn:oasis:names:specification:ubl:processing">

In W3C Schema speak it would be:

   <xsd:notation name="UBL"
                 system="urn:oasis:names:specification:ubl:processing"/>

If we had separate PI targets for each, then it would be:

   <xsd:notation name="UBL-creator"
                 system="urn:oasis:names:specification:ubl:processing:creator"/>
   <xsd:notation name="UBL-created"
                 system="urn:oasis:names:specification:ubl:processing:created"/>

But seeing that just reinforces to me that these two pieces of 
information requested still don't feel like processing directives to 
me ... they still feel like information items ... and I don't think 
they belong in processing instructions.

XML says it all:  A processing instruction allows a document to 
contain instructions for applications.  A processing instruction 
target identifies the application to which the directive is 
addressed.  For stylesheet association, "xml-stylesheet" is an 
appropriate PI target.  The target "InstanceInfo", or even 
"UBL-creator" and "UBL-created", are not appropriate PI targets.

I hope this helps.

. . . . . . . . . . . . . Ken

--
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
References:
- Minutes of Atlantic UBL TC call 5 April 2006
  - From: jon.bosak@sun.com