
Subject: Re: [ubl] Differences between xsd:token and xsd:normalizedString

Just a reminder of the original reason for allowing multiple spaces. When this was
under discussion for UBL 1.0, I was working on finance application software
and was very much aware that the position of characters within accounting
codes (cost codes and the like) was sometimes very important, and that positioning
was typically achieved with runs of multiple spaces between the characters.
I don't know whether the same applies these days, but I guess the legacy is still
there in accounting systems and their data (mainly a mainframe thing, I think).
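
To make the concern concrete, here is a small Python sketch of a position-sensitive code. The column layout and the sample value are invented for illustration and are not taken from any real accounting system; the point is only that collapsing runs of spaces (as xsd:token processing does) shifts every later field.

```python
import re

# Hypothetical fixed-width cost code (an assumption for this example):
# columns 0-1 hold a department, columns 2-4 are space padding,
# columns 5-8 hold an account number.
code = "AB   1200"


def account_field(value: str) -> str:
    """Read the account number purely by position (columns 5 to 8)."""
    return value[5:9]


# Collapse runs of spaces to a single space and trim both ends,
# which is what happens to a value typed as xsd:token.
collapsed = re.sub(r" +", " ", code).strip(" ")

print(repr(account_field(code)))       # field read from the original value
print(repr(account_field(collapsed)))  # field read after collapsing spaces
```

With the original value the positional read returns the full account number; after collapsing, the same read picks up only the tail of it.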
Best regards
Stephen D Green
Document Engineering Services Ltd

2009/6/15 G. Ken Holman <gkholman@cranesoftwrights.com>
Hi all,

In preparation for a technical discussion in tonight's Pacific call, I have some citations here regarding W3C Schema type definitions:

 - a string can have any set of valid XML characters

 - a normalized string cannot have carriage returns, line feeds or tabs
 - a normalized string can have any number of space characters, including
  contiguous sequences of space characters

 - a token cannot have carriage returns, line feeds or tabs
 - a token can have any number of singleton space characters between other
  characters, but no leading or trailing spaces and no contiguous
  sequences of more than one space character
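
As a rough illustration of the citations above, the following Python sketch mimics the "replace" and "collapse" whitespace processing that W3C Schema associates with xsd:normalizedString and xsd:token respectively. The function names are made up for this example and do not belong to any library; a real validator should be used for actual schema checking.

```python
import re


def ws_replace(value: str) -> str:
    """Mimic whiteSpace="replace": each tab, LF and CR becomes one space."""
    return re.sub(r"[\t\n\r]", " ", value)


def ws_collapse(value: str) -> str:
    """Mimic whiteSpace="collapse": replace, squeeze space runs, trim ends."""
    return re.sub(r" +", " ", ws_replace(value)).strip(" ")


def is_normalized_string(value: str) -> bool:
    """True if the value is unchanged by "replace" processing."""
    return ws_replace(value) == value


def is_token(value: str) -> bool:
    """True if the value is unchanged by "collapse" processing."""
    return ws_collapse(value) == value


for sample in ["AB 12", "AB   12", "AB\t12", " AB 12"]:
    print(repr(sample), is_normalized_string(sample), is_token(sample))
```

A value with a run of spaces, such as "AB   12", passes the normalized-string check but not the token check, which is the distinction at issue for codes and identifiers.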

So ... I wondered if "token" should really have been called "tokens", because the semantics of a token value could be seen as the set of singleton-space-separated tokens in a string:  the string has been tokenized (reduced to tokens).  All along I've been trusting the name to imply a single token when in fact the value can contain more than one.  But then again, the W3C Schema spec is itself confusing here:  at the start it claims "token represents tokenized strings", while it also states explicitly that the value space of token contains singleton spaces.  Which is correct?  There is a mail list where I can ask this, so I did last night, and I got a brief response this morning:


Semantically, I think we are still where we want to be with UBL because even though most identifiers with spaces will have only one space, the entire value is the identifier.  Same with codes that users might decide will have spaces in them (who are we to restrict existing business practices?).  The value space of our values is not a set of space-separated tokens but a singleton value that has multiple spaces.  And we don't know that our users won't have sequences of spaces.  But we are asking our users not to use carriage returns, line feeds or tabs.  Which seems reasonable to me.

Given the answer I got this morning, it seems to me that indeed "token" really is, semantically, "tokens" ... that is, a collection of non-white-space token values expressed as a space-separated string.  Certainly when our users are expressing a singleton code or identifier value containing spaces, this is just a normalized string and not a tokenized string according to the published W3C definitions cited above; it isn't a set of space-separated values even if the expression of that set happens to be the same sequence of characters.

So for the discussion tonight, the choice in UBL 2.0 to use xsd:normalizedString instead of xsd:token appears to me to have been the right one because of the cardinality of syntactic values implied by the W3C definitions:  xsd:normalizedString is a singleton whereas xsd:token with embedded spaces is not.

. . . . . . . . . . . . Ken

XSLT/XQuery/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/o/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
