
Subject: RE: [wss] White space (was RE: [wss] Proposal for a new attribute of UsernameToken)

From: "Reid, Irving" <irving.reid@hp.com>
To: "wss" <wss@lists.oasis-open.org>
Date: Wed, 29 Oct 2003 11:28:08 -0500

> From: Eric Gravengaard [mailto:eric@reactivity.com]
> 
> I think I agree with Irving. But I think that we should be 
> clear about that in this spec then since we are being very 
> clear about how to use the nonce ("the octet sequence of its 
> decoded value").
> 
> I propose the following change:
> 
> Change:
> 
> "hashed using the octet sequence of its UTF8 encoding as 
> specified in the contents of the element". 
> 
> To
> 
> "hashed using the octet sequence of its UTF-8 encoding as 
> specified in the contents of the element including any whitespace"
> 
> The schema derives the <Created> element from xsd:string so I 
> don't believe that it is otherwise specified what a leading 
> space would mean and I believe that we should leave no doubts 
> in the minds of those implementing our spec.
> 
> -Eric

My point is that the meaning of white space *is* specified by XML:


a) No distinction is made between white space and any other character data in element content. That is, *every character* is significant. The one caveat is that conforming parsers MUST normalize line endings to the linefeed character U+000A before passing character data to the application.

b) White space inside attribute values is normalized: whatever the input looks like, conforming parsers are required to i) turn all other white space characters into spaces (U+0020) and ii) for attributes declared with a type other than CDATA, strip leading and trailing spaces and compress runs of spaces to a single space, before passing the attribute value to the application.

c) In DTDs (I'm not sure about XML Schema or other proposals) it's possible to specify that whitespace *between* elements is insignificant. In that case, for example, given input as below:

<fee>
  <foo>bar</foo>
  <baz>fumble</baz>
</fee>

a conforming, *validating* parser would report the newlines and spaces between <fee> and <foo>, </foo> and <baz>, or </baz> and </fee> to the application as white space in element content, which the application is then free to ignore.
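
As a rough illustration of the points above, here is a small Python sketch using the standard library's xml.etree.ElementTree, which sits on a non-validating parser. The "note" attribute and the sample values are made up for the example; they are not part of any WSS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the "note" attribute exists only for this example.
# The Python literal contains a real tab, real newlines and a real CR LF pair.
doc = '<Created note="a\tb\nc"> 2003-10-29T11:28:08Z\r\n</Created>'
elem = ET.fromstring(doc)

# (a) Element content: the leading space survives, and the CR LF line
#     ending reaches the application as a single LF.
created_text = elem.text
print(repr(created_text))   # ' 2003-10-29T11:28:08Z\n'

# (b) Attribute value: the tab and the newline each arrive as a space.
note_value = elem.get("note")
print(repr(note_value))     # 'a b c'

# (c) No DTD is involved and the parser does not validate, so the white
#     space between the child elements is handed over as ordinary text.
fee = ET.fromstring("<fee>\n  <foo>bar</foo>\n  <baz>fumble</baz>\n</fee>")
between = [fee.text, fee[0].tail, fee[1].tail]
print(between)              # ['\n  ', '\n  ', '\n']
```

Other parsers and APIs may surface the white space between elements differently (for example through a separate "ignorable white space" callback), so treat this as one data point rather than a statement about every toolkit.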



It is important that everybody implementing XML processors understand that they MUST NOT arbitrarily reformat XML documents. Don't pretty-print, don't compress runs of white space, don't convert tabs to spaces or vice versa.
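
To make the consequence for the UsernameToken digest concrete, here is a minimal Python sketch. It assumes the digest is Base64(SHA-1(nonce + created + password)), with the nonce taken as its decoded octets and the other two values as their UTF-8 octets; the function name and the sample values are hypothetical.

```python
import base64
import hashlib


def password_digest(nonce: bytes, created: str, password: str) -> str:
    """Hash the decoded nonce plus the UTF-8 octets of created and password.

    Assumed formula: Base64(SHA-1(nonce + created + password)).
    The created string is used exactly as given, white space included.
    """
    octets = nonce + created.encode("utf-8") + password.encode("utf-8")
    return base64.b64encode(hashlib.sha1(octets).digest()).decode("ascii")


# Hypothetical values, for illustration only.
nonce = b"\x00\x01\x02\x03"
password = "secret"

# The sender's <Created> element content happened to start with a space.
as_sent = password_digest(nonce, " 2003-10-29T11:28:08Z", password)

# An intermediary or receiver that "helpfully" trims the value computes
# a digest over different octets.
trimmed = password_digest(nonce, "2003-10-29T11:28:08Z", password)

print(as_sent == trimmed)   # False: one space is enough to break the match
```

A single stripped or added space changes the input octets, so the two sides can only agree if both hash the element content exactly as the parser delivered it.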

The detailed proposal I wrote up for the SAML spec on how to compare strings is at http://lists.oasis-open.org/archives/security-services/200202/msg00018.html; it has references to some of the related W3C and Unicode specs.

 - irving -

Follow-Ups:
- Re: [wss] White space (was RE: [wss] Proposal for a new attribute of UsernameToken)
  - From: merlin <merlin@baltimore.ie>