bdxr message

Subject: Re: [bdxr] Agenda for BDXR TC meeting 27 April 2016

From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: Kenneth Bengtsson <kbengtsson@efact.pe>, "bdxr@lists.oasis-open.org" <bdxr@lists.oasis-open.org>
Date: Tue, 26 Apr 2016 14:54:53 -0400

For discussion tomorrow I have this observation to make regarding the anticipated comments.


At 2016-04-26 17:00 +0000, Kenneth Bengtsson wrote:

Some anticipated comments for the SMP 1.0 CSPRD 02 (yet to be published):

- Section 3.3, why must the encoding of XML content be treated as case sensitive?

- Section 2.4.6, case sensitivity of Document Identifiers;
- Section 2.4.7, case sensitivity of Participant Identifiers.

I think the precedent is set in the XML specification where IDREF, IDREFS, ENTITY and ENTITIES attribute content values are case-sensitive. The values are name tokens and the name tokens must be matched exactly.

In SGML days it was confusing because it was up to the SGML Declaration (implicit or explicit) to dictate whether name values were or were not case sensitive. On the XML committee it was discussed that this was a level of complexity that took one away from keeping XML simple. Processing and simple assumptions are just easier in XML if one never expects case-insensitive values. It may put a burden on the creator of XML, but the burden of the implementer and the receiver of XML was considered more important than that of the creator in this regard. And in many ways it is just easier for everyone if one never considers case sensitivity and the data is prima-facie what it is without having to interpret it.

We wouldn't want one user to say "hey, I thought this was case sensitive" while the trading partner disputes it saying "no, I meant it to be case insensitive this time around". If we stick to case sensitivity in lock-step with XML then the same principle is being applied to both the markup and the data.

I realize this is contrary to sections 2.4.6 and 2.4.7, but I hadn't considered the question until posed in the agenda so I hadn't thought about it. An identifier resolution service can choose to be case insensitive and a user's case-sensitive value will work. But it doesn't go the other way around: if an identifier resolution service (which may already exist) is case sensitive and the user thinks the identifier is case insensitive, incorrect values won't be resolved.
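
To illustrate that asymmetry, here is a minimal Python sketch. The registry contents, the identifier strings and the function names are all hypothetical and chosen only for illustration; none of them come from the SMP specification or from any real resolution service.

```python
# Hypothetical registry: one participant identifier mapped to a made-up endpoint.
REGISTERED = {"urn:example:Participant:ABC123": "https://smp.example.org/abc123"}


def resolve_case_sensitive(identifier):
    """Exact match: the identifier is taken prima facie, no processing."""
    return REGISTERED.get(identifier)


def resolve_case_insensitive(identifier):
    """Relaxed match: both sides are case-folded before comparing."""
    folded = {key.casefold(): value for key, value in REGISTERED.items()}
    return folded.get(identifier.casefold())


# A user who treats the identifier as case sensitive sends it exactly as registered.
exact = "urn:example:Participant:ABC123"
# A user who believes it is case insensitive may send a differently cased value.
sloppy = "urn:example:participant:abc123"

# The exact value resolves under both kinds of service.
print(resolve_case_sensitive(exact))
print(resolve_case_insensitive(exact))

# The differently cased value resolves only when the service happens to be relaxed.
print(resolve_case_sensitive(sloppy))
print(resolve_case_insensitive(sloppy))
```

A user who always supplies the exact value is safe with either kind of service, which is the argument for specifying the identifiers as case sensitive.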

Thinking about it further, for consideration to add to the SMP, the DOCTYPE external identifier, that is explicit in SGML using "PUBLIC" and is implicit in XML as adjacent to the SYSTEM identifier, matches with case sensitivity but also has a normalization process:


  https://www.w3.org/TR/2008/REC-xml-20081126/#sec-external-ent
  "Before a match is attempted, all strings of white space in the
   public identifier MUST be normalized to single space characters
   (#x20), and leading and trailing white space MUST be removed."
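
As a rough sketch of what that normalization amounts to, the following Python function collapses white space the way the quoted sentence describes. The function name and the sample public identifier are hypothetical; only the set of white space characters (space, tab, carriage return, line feed) is taken from the XML Recommendation.

```python
import re

# XML white space is limited to #x20, #x9, #xD and #xA.
_XML_WHITE_SPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_public_id(public_id):
    """Collapse each run of XML white space to one space, then trim both ends."""
    return _XML_WHITE_SPACE.sub(" ", public_id).strip(" ")


# Hypothetical public identifier written with irregular white space.
raw = "  -//Example//DTD\n\t Sample  1.0//EN \r\n"
print(repr(normalize_public_id(raw)))

# Normalization touches white space only; letter case is still significant.
same_spacing = normalize_public_id("-//Example//DTD Sample 1.0//EN")
other_case = normalize_public_id("-//example//dtd sample 1.0//en")
print(normalize_public_id(raw) == same_spacing)
print(normalize_public_id(raw) == other_case)
```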

But, then, going back to the identifier resolution service, such may deal with space normalization differently. So it might be risky to presume the normalization.

Perhaps we could rely solely on a principle that specifying information shall be constrained but interpreting information may be relaxed. We accept all information as prima-facie what was meant by the author of the values, without normalization and without case sensitivity, and it is up to the services accepting the authored values to assess those values as being conformant, equivalent or even acceptable. We take a hands-off approach stating that zero processing is attempted or implied on any content values from SMP's perspective.

As for 3.3, I'm going to be contrary again because the last sentence in 3.3 states the pseudo-attribute is case sensitive but the XML specification states that it SHOULD (their emphasis) be case insensitive:


  https://www.w3.org/TR/2008/REC-xml-20081126/#charencoding
  "XML processors SHOULD match character encoding names in a
  case-insensitive way"

This happens to be the only value in XML that is case-insensitive. So "UTF-8" is equivalent to "utf-8" in the XML declaration encoding pseudo attribute:


  <?xml version="1.0" encoding="utf-8"?>

... but the XML declaration is a directive to the XML processor and not a directive to the user, so it is an outlier from other considerations related to information exchange that are case sensitive.
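
One way to observe this with a real processor is the small Python sketch below, which assumes the standard library's xml.etree.ElementTree (backed by expat). The helper name and the sample documents are made up for illustration; the behaviour shown is that of this one processor, not a statement about every XML processor.

```python
import xml.etree.ElementTree as ET


def parse_with_encoding_name(name):
    """Parse a UTF-8 encoded document whose declaration spells the encoding as given."""
    document = f'<?xml version="1.0" encoding="{name}"?><doc>café</doc>'
    return ET.fromstring(document.encode("utf-8")).text


# The encoding name is matched without regard to case by this processor.
for name in ("UTF-8", "utf-8", "Utf-8"):
    print(name, parse_with_encoding_name(name))

# Element names, by contrast, are matched exactly.
root = ET.fromstring("<Doc><item/></Doc>")
print(root.find("item") is not None)
print(root.find("ITEM") is not None)
```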


These are interesting discussion points.

Chat with everyone tomorrow!

. . . . . . . Ken


--
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/o/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman@CraneSoftwrights.com |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |



References:
- Agenda for BDXR TC meeting 27 April 2016
  - From: Kenneth Bengtsson <kbengtsson@efact.pe>