

Subject: Re: [bdxr] Agenda for BDXR TC meeting 27 April 2016

For discussion tomorrow I have this observation to make regarding the anticipated comments.

At 2016-04-26 17:00 +0000, Kenneth Bengtsson wrote:
> Some anticipated comments for the SMP 1.0 CSPRD 02 (yet to be published):
>
> - Section 3.3, why must the encoding of XML content be treated as case sensitive?
> - Section 2.4.6, case sensitivity of Document Identifiers;
> - Section 2.4.7, case sensitivity of Participant Identifiers.

I think the precedent is set in the XML specification, where IDREF, IDREFS, ENTITY and ENTITIES attribute values are case-sensitive. The values are names, and names must be matched exactly.
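As an illustration only, exact matching of reference values can be sketched in Python. The document, the element names and the "id"/"idref" attribute names are hypothetical, and ElementTree does not read attribute types from a DTD, so the sketch simply treats those two attributes as if they were declared ID and IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" and "idref" are treated as ID/IDREF by this
# sketch only; ElementTree itself does not know DTD attribute types.
DOC = '<doc><item id="abc"/><ref idref="abc"/><ref idref="ABC"/></doc>'


def unresolved_idrefs(xml_text):
    """Return idref values that have no exactly matching id value."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [ref.get("idref") for ref in root.iter("ref")
            if ref.get("idref") not in ids]


print(unresolved_idrefs(DOC))  # ['ABC']
```

The reference "ABC" is reported as unresolved because it differs from "abc" only in case, which is the exact-match behaviour described above.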

In SGML days it was confusing because it was up to the SGML Declaration (implicit or explicit) to dictate whether or not name values were case sensitive. The XML committee discussed this as a level of complexity that worked against keeping XML simple. Processing and simple assumptions are just easier in XML if one never expects case-insensitive values. It may put a burden on the creator of XML, but the burden on the implementer and the receiver of XML was considered more important than that on the creator in this regard. And in many ways it is just easier for everyone if one never has to consider case folding and the data is prima facie what it is, without having to interpret it.

We wouldn't want one user to say "hey, I thought this was case sensitive" while the trading partner disputes it saying "no, I meant it to be case insensitive this time around". If we stick to case sensitivity in lock-step with XML then the same principle is being applied to both the markup and the data.

I realize this is contrary to sections 2.4.6 and 2.4.7, but I hadn't considered the question until it was posed in the agenda. An identifier resolution service can choose to be case insensitive, and a user's case-sensitive value will still work. But it doesn't go the other way around: if an identifier resolution service (which may already exist) is case sensitive and the user thinks the identifier is case insensitive, values that differ only in case won't be resolved.
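The asymmetry can be sketched as follows. This is a minimal illustration, not an SMP or resolver API: the registry, the identifier value, the endpoint URL and both resolver functions are invented for the example.

```python
# Hypothetical registry: the identifier value and endpoint are invented.
REGISTRY = {"urn:example:Participant-0001": "https://smp.example.org/p1"}


def resolve_case_sensitive(identifier, registry=REGISTRY):
    """Resolve only on an exact, character-for-character match."""
    return registry.get(identifier)


def resolve_case_insensitive(identifier, registry=REGISTRY):
    """Resolve after case folding both the stored keys and the lookup value."""
    folded = {key.casefold(): value for key, value in registry.items()}
    return folded.get(identifier.casefold())


exact = "urn:example:Participant-0001"     # value as authored
recased = "URN:EXAMPLE:PARTICIPANT-0001"   # same letters, different case

for name, resolver in [("case-sensitive", resolve_case_sensitive),
                       ("case-insensitive", resolve_case_insensitive)]:
    print(name, resolver(exact) is not None, resolver(recased) is not None)
```

A user who always supplies the exact value is resolved by either kind of service; a user who assumes case insensitivity is resolved only by the relaxed one.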

Thinking about it further, and for consideration as an addition to the SMP: the DOCTYPE public identifier, which is explicit in SGML using "PUBLIC" and in XML appears adjacent to the system identifier, is matched case-sensitively but also has a normalization process:

  "Before a match is attempted, all strings of white space in the
   public identifier MUST be normalized to single space characters
   (#x20), and leading and trailing white space MUST be removed."

But then, going back to the identifier resolution service, such a service may handle white-space normalization differently, so it might be risky to presume the normalization.
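For reference, the public identifier normalization quoted above can be sketched in a few lines of Python. The function name and the sample identifiers are illustrative; white space is taken to be the four XML white-space characters (#x20, #x9, #xD, #xA).

```python
import re

# Runs of XML white space: space, tab, carriage return, line feed.
_XML_WHITE_SPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_public_id(public_id):
    """Collapse white-space runs to one #x20 and trim both ends."""
    return _XML_WHITE_SPACE.sub(" ", public_id).strip(" ")


def public_ids_match(left, right):
    """Compare after normalization; the comparison stays case-sensitive."""
    return normalize_public_id(left) == normalize_public_id(right)


# Hypothetical identifiers, used only to exercise the function.
print(public_ids_match("  -//Example//DTD  Sample\n V1.0//EN ",
                       "-//Example//DTD Sample V1.0//EN"))   # True
print(public_ids_match("-//Example//DTD Sample V1.0//EN",
                       "-//example//dtd sample v1.0//en"))   # False
```

White-space differences disappear before matching, but a difference in case still prevents a match.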

Perhaps we could rely solely on a principle that specifying information shall be constrained but interpreting information may be relaxed. We accept all information as prima facie what was meant by the author of the values, without normalization and without case folding, and it is up to the services accepting the authored values to assess those values as being conformant, equivalent or even acceptable. We take a hands-off approach, stating that zero processing is attempted or implied on any content values from SMP's perspective.

As for 3.3, I'm going to be contrary again because the last sentence in 3.3 states the pseudo-attribute is case sensitive but the XML specification states that it SHOULD (their emphasis) be case insensitive:

  "XML processors SHOULD match character encoding names in a
  case-insensitive way"

This happens to be the only value in XML that is case-insensitive. So "UTF-8" is equivalent to "utf-8" in the XML declaration encoding pseudo-attribute:

  <?xml version="1.0" encoding="utf-8"?>

... but the XML declaration is a directive to the XML processor and not a directive to the user, so it is an outlier from other considerations related to information exchange that are case sensitive.
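A small Python check shows this behaviour in one parser. It is a sketch under stated assumptions: it uses the standard library's ElementTree parser, and the sample document and helper name are invented for the example.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical document body, encoded as UTF-8 bytes.
BODY = "<note>caf\u00e9</note>".encode("utf-8")


def parse_with_encoding_label(label):
    """Parse BODY behind an XML declaration that uses the given label."""
    declaration = '<?xml version="1.0" encoding="%s"?>' % label
    return ET.fromstring(declaration.encode("ascii") + BODY).text


# Encoding names are matched without regard to case ...
same_codec = codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name
same_text = (parse_with_encoding_label("UTF-8")
             == parse_with_encoding_label("utf-8"))

# ... whereas element names in the markup remain case-sensitive.
root = ET.fromstring("<Note/>")
names_differ = root.tag != "note"

print(same_codec, same_text, names_differ)
```

Both spellings of the encoding name yield the same parsed text, while the element name "Note" is still distinct from "note".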

These are interesting discussion points.

Chat with everyone tomorrow!

. . . . . . . Ken

Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/o/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman@CraneSoftwrights.com |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |
