OASIS Mailing List Archives

 



oic message



Subject: Whitespace in ODF


Another interesting point to test/discuss at a plugfest might be the whitespace handling in ODF:
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#White-space_Characters

AFAIK this whitespace handling was introduced so that the XML can be indented without altering the layout of the document as the user sees it.

The processing described by the reference above can be summarized like this:
  1. The complete text of a paragraph (or heading) and its descendant ODF elements is extracted, and every TAB, LF and CR is replaced by a SPACE.
  2. All leading/trailing spaces are removed.
  3. Within the text, runs of multiple spaces are replaced by a single space.
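
To make the three steps above concrete, here is a minimal Python sketch. It assumes the text of the paragraph has already been extracted and concatenated into one string; the function name collapse_whitespace is mine and not taken from the spec or from any ODF library.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Apply the three summarized steps to already-extracted paragraph text."""
    # Step 1: TAB, LF and CR each become a SPACE.
    text = re.sub(r"[\t\n\r]", " ", text)
    # Step 2: drop leading and trailing spaces.
    text = text.strip(" ")
    # Step 3: collapse every run of spaces into a single space.
    return re.sub(r" {2,}", " ", text)


# Indented XML content collapses to what the user is supposed to see.
print(repr(collapse_whitespace("\n    Hello\t\tworld \r\n")))  # 'Hello world'
```

Note that this only handles literal characters; <text:s/> and <text:tab/> elements are not covered here.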

Interestingly, many applications treat a TAB (<text:tab/>) or SPACE (<text:s/>) element as a regular text character at which the removal of leading/trailing spaces stops. Unfortunately, this leads to different loaded content in different applications.
Another problematic scenario occurs when there are spaces on both sides of an element boundary (e.g. a span) within the text. Often both spaces are kept, which is against the spec. Some may argue that it is not clearly specified where the single remaining space ends up, but it is obviously an issue not to drop one of the two. From a programmer's perspective, the likely approach is to keep the first space and ignore all following ones, e.g. during parsing.
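
As a rough illustration of the "keep the first space, ignore all following" idea, here is a Python sketch that walks a paragraph in document order and collapses spaces across element boundaries. The helper name collapsed_runs and the sample paragraph are made up for this mail. It is also my own assumption that elements without character data (such as <text:s/> or <text:tab/>) are simply passed over without changing the state; that is exactly the point where applications differ.

```python
import xml.etree.ElementTree as ET


def collapsed_runs(paragraph):
    """Return [(local_tag, text)] runs with whitespace collapsed across elements."""
    runs = []
    prev_space = True  # acts as if a space precedes the paragraph: leading spaces are dropped

    def emit(owner, chunk):
        nonlocal prev_space
        if not chunk:
            return
        out = []
        for ch in chunk:
            if ch in "\t\n\r":
                ch = " "
            if ch == " ":
                if prev_space:
                    continue  # a space was already kept, ignore this one
                prev_space = True
            else:
                prev_space = False
            out.append(ch)
        if out:
            runs.append((owner.tag.split("}")[-1], "".join(out)))

    def walk(elem):
        emit(elem, elem.text)
        for child in elem:
            walk(child)
            emit(elem, child.tail)  # text after a child belongs to the parent

    walk(paragraph)
    # After collapsing, at most one trailing space is left; remove it.
    if runs and runs[-1][1].endswith(" "):
        tag, text = runs.pop()
        if text[:-1]:
            runs.append((tag, text[:-1]))
    return runs


xml = ('<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
       '  one <text:span> two</text:span>\n</text:p>')
runs = collapsed_runs(ET.fromstring(xml))
print(runs)  # [('p', 'one '), ('span', 'two')]
```

Here the space before the span is kept and the one inside the span is dropped, so the text reads "one two" with a single space between the words.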

I attached a very rudimentary test document to this mail.

Regards,
Svante

Attachment: WhitespaceTest.odt
Description: application/vnd.oasis.opendocument.text


