topicmaps-comment message

Subject: [xtm-wg] XTM Whitespace Handling


Quoting (just as an example) the documentation from org.apache.xml.serialize:

>   [...]
>   For elements that are not specified as whitespace preserving,
>   the serializer will potentially break long text lines at space
>   boundaries, indent lines, and serialize elements on separate
>   lines. Line terminators will be regarded as spaces, and
>   spaces at beginning of line will be stripped.

A Java class like this is typically used to serialize a parsed XTM
document back into XML syntax. It's only one example; almost every XML
application I've seen has some functionality similar to this.

The reason I bring this up is that I'm concerned that we've underspecified
whitespace handling. It's an area that seems to bite everybody. We have 
PCDATA in two places -- <baseNameString> and <resourceData> -- and we've 
declared it all significant in the prose of the spec (I believe), yet not
done anything in the DTD to recommend to XML applications that it be 
preserved. The XML feature that does this is the "xml:space" attribute,
defaulted on an element to "preserve". I've had to modify local copies of
the XTM DTD in order to keep the apache serializer from altering the 
whitespace in my documents.

The big question really is:  is whitespace significant in XTM documents? 
Do the base names

  "Niagara Falls" 
  "  Niagara Falls"
  "Niagara
Falls"
  "Niagara Falls "
  "Niagara<tab>Falls"

all match? If we don't actually alter anything in the XTM Specification,
we ought to at least give application developers a clue on how we think
this should be handled. I seem to remember us discussing this at one
point, but can't remember the outcome. And I'm certain a public airing
of this issue would benefit other XTM developers.
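
To make the question concrete, here is a minimal sketch comparing the five
base names above under two possible policies. The normalize() helper is
hypothetical and is not taken from the XTM Specification; it stands for
one policy an application might choose (trim both ends, collapse each run
of whitespace to a single space). "<tab>" is written as a literal tab.

```python
# The five candidate base names from the message; "<tab>" there
# stands for a literal tab character.
CANDIDATES = [
    "Niagara Falls",
    "  Niagara Falls",
    "Niagara\nFalls",
    "Niagara Falls ",
    "Niagara\tFalls",
]


def normalize(name):
    """Hypothetical policy: trim the ends and collapse each run of
    whitespace (spaces, tabs, line breaks) to a single space."""
    return " ".join(name.split())


# Policy 1: all whitespace is significant, so compare the raw strings.
exact_forms = set(CANDIDATES)

# Policy 2: compare the whitespace-normalized strings.
normalized_forms = {normalize(name) for name in CANDIDATES}

print(len(exact_forms))       # 5 distinct names
print(len(normalized_forms))  # 1 distinct name
```

If all whitespace is significant, these are five different base names;
under the normalizing policy they are one. Which answer is intended is
exactly what needs to be stated.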

My recommendation (which I'm certainly open to discussing) would be to
add 

   xml:space  (default|preserve)  'preserve'

to those elements for which we explicitly state that whitespace *is*
significant. XML parsers will pass all whitespace, but after that,
XML applications can do what they want. Since we might expect that
XTM documents go through various processing stages, shouldn't we
do something about this?
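
As a rough illustration of what the declaration would give downstream
code, the sketch below parses a cut-down document whose internal DTD
subset carries the proposed default for <baseNameString>. The document
and the read_base_name() helper are hypothetical, and Python's
xml.etree.ElementTree is used only as a convenient non-validating
parser; the real XTM DTD would carry the declaration in its own
attribute lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down document: only the ATTLIST line reflects the
# proposal; the real XTM DTD is far larger.
DOC = """<!DOCTYPE baseName [
  <!ATTLIST baseNameString xml:space (default|preserve) 'preserve'>
]>
<baseName><baseNameString>  Niagara
Falls </baseNameString></baseName>"""

# ElementTree reports xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def read_base_name(doc):
    """Return (xml:space value, raw character data) for the base name."""
    root = ET.fromstring(doc)
    elem = root.find("baseNameString")
    return elem.get(XML_SPACE), elem.text


space, text = read_base_name(DOC)
print(space)       # the defaulted attribute value supplied by the parser
print(repr(text))  # the character data, whitespace untouched
```

The parser supplies the defaulted attribute even though the instance
never writes it, and hands over the character data unchanged. Whether a
later stage such as a serializer honors the hint is still up to that
application.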

And if I'm totally wrong about this, could somebody give *me* the clue?

Murray

...........................................................................
Murray Altheim                                 <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, Inc., MS MPK17-102, 1601 Willow Rd., Menlo Park, CA 94025

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu
