Subject: [OASIS Issue Tracker] Commented: (OFFICE-2207) Whitespace processing [N 1309]

    [ http://tools.oasis-open.org/issues/browse/OFFICE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20211#action_20211 ] 

Dennis Hamilton commented on OFFICE-2207:

After going around a fair amount in discussion of this, I offered the following proposal for consideration in an off-list discussion (on Monday, 2010-07-26):

The text is not the exact text used, but the rationale remains the same:

Here is my proposal for a complete replacement for section 1.6 of ODF 1.0 (and thereby IS 26300 and ODF 1.1).  A replacement can be made in ODF 1.2 as well:

[1] "1.6 White-Space Processing"

[2] "ODF processing of whitespace characters is in conformance with the provisions of [XML 1.0].

[3] "In addition, ODF processors shall ignore all element children ([RNG] section 5, Data Model) of ODF-defined elements that are strings consisting entirely of whitespace characters and that do not satisfy a pattern of the ODF schema definition for the element.  

[4] "Any special treatment of additional occurrences of whitespace characters depends on the specific definitions of individual ODF elements, attributes, and datatypes. See, in particular, section 5.1.1." 


1. There is no reason to use "EOL" or introduce the notion.  The definition of whitespace characters and the rules for collapsing CR-LF sequences are already dictated by the XML processing at [2].

2. The processing of whitespace characters specified by [XML 1.0] happens first, being performed in effect by an XML processor that delivers its results to the ODF processor as an XML application.

3. The specific rule for ignoring all-whitespace element children (in the data model) when they do not satisfy the pattern for the element is a requirement of ODF as an application of XML.  It is the RNG Schema definitions for those elements that determine whether element children that are strings are accepted or not.  (There is an edge case where an element instance whose pattern requires non-whitespace text, but which contains only whitespace, is now seen as empty.  Whether that instance is accepted then depends on whether the element is allowed to be empty.) 

4. I replaced the "for example" phrase because it creates confusion with examples and non-normative text.  There might be a preferable restatement.  There are numerous special treatments for valid white-space occurrences that remain after [2-3] are applied.  For example: in white-space-separated lists, in the W3C Schema definitions for anyURI and base64Binary, in attributes whose values are formulas that include string constants, and in the special case where consecutive white-space sequences are treated as occurrences of single space characters in the HTML-like processing of text streams for layout in ODF documents.  This statement [4] allows for all of those situations.  
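
A minimal sketch of the rule in [3], offered as an illustration and not as part of the proposal.  It assumes Python's xml.etree.ElementTree, where string children appear as .text and .tail, and it uses a hypothetical MIXED_CONTENT set as a stand-in for what the RNG schema would actually say about which elements accept string children.  The sample markup is the one from the issue description.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# The four whitespace characters defined by XML 1.0.
XML_WS = " \t\r\n"

# Hypothetical stand-in for the schema: elements whose pattern accepts
# string children. A real processor would derive this from the RNG schema.
MIXED_CONTENT = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}span"}


def is_all_whitespace(s):
    """True for a string child made up only of XML whitespace characters."""
    return s is not None and s.strip(XML_WS) == ""


def drop_ignorable_whitespace(elem):
    """Remove all-whitespace string children where the pattern has no text."""
    if elem.tag not in MIXED_CONTENT:
        if is_all_whitespace(elem.text):
            elem.text = None
        for child in elem:
            # A child's tail is a string child of elem, not of the child.
            if is_all_whitespace(child.tail):
                child.tail = None
    for child in elem:
        drop_ignorable_whitespace(child)


doc = (
    f'<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">\n'
    '  <text:p><text:span>Hello</text:span> <text:span>world</text:span></text:p>\n'
    '</office:text>'
)
root = ET.fromstring(doc)
drop_ignorable_whitespace(root)

# The indentation inside office:text is gone; the space between the two
# spans survives because text:p accepts string children.
print("".join(root.itertext()))  # Hello world
```

Under these assumptions the space between the two spans is kept, which matches what the issue description says existing processors already do.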

 - Dennis

PS: As far as Svante and I know, only 5.1.1 has a special rule for elements that deliver text to the layout of the document.  

Technically, I think it might have been better to describe this as part of a text layout definition rather than treating it as something that happens in the consumption/interpretation of the document.  This could also define what happens to white space that trails introduction of a (soft) line break, for example, along with the treatment of &nbsp; and other character entities and special ODF elements that affect layout.  It also avoids statements about elements that deliver text that is not necessarily delivered into the layout stream, such as material in the change-tracking history.  (I note that the showing of tracked changes in the layout can reveal white space introduced adjacent to where white space has been removed.) 
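
The HTML-like collapsing mentioned in item 4 can be sketched as follows.  This is a rough illustration only: it treats the paragraph as one text stream and replaces each run of whitespace characters with a single space, including runs that span element boundaries.  The function name layout_text is hypothetical, and the sketch does not attempt the full rules of section 5.1.1 (for example, it does nothing special for leading or trailing white space or for special ODF elements that affect layout).

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# One or more consecutive XML whitespace characters.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def layout_text(paragraph):
    """Return the paragraph's text stream with whitespace runs collapsed."""
    # Concatenate all string content in document order, so that a run
    # split across sibling elements is still seen as one run.
    raw = "".join(paragraph.itertext())
    return _WS_RUN.sub(" ", raw)


p = ET.fromstring(
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:span>Hello  </text:span>\n  <text:span>world</text:span>'
    '</text:p>'
)
print(layout_text(p))  # Hello world
```

Collapsing on the joined stream rather than per element is a design choice of the sketch: it keeps the result independent of where the markup happens to split a run of white space.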

> Whitespace processing [N 1309]
> ------------------------------
>                 Key: OFFICE-2207
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2207
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: General
>    Affects Versions: ODF 1.0, ODF 1.0 (second edition)
>            Reporter: Robert Weir 
>            Assignee: Svante Schubert 
>             Fix For: ODF 1.0 Errata CD 5
> Submitter ID
>     GB-26300-34
> Nature of defect
>     Technical
> Document
>     ISO/IEC 26300:2006
> Clause
>     1.6
> Page
>     34
> Description of issue
> It is stated that "In conformance with the W3C XML specification [XML1.0], optional white-space characters that are contained in elements that have element content (in other words that must contain elements only but not text) are ignored".
>     * It is not clear what "optional white-space characters" are (the term is not defined in XML 1.0), or how the described behaviour conforms to XML 1.0.
>     * Does the phrase "elements that have element content" mean elements that have only element content? This cannot make sense, as whitespace is itself text content.
>     * Consider the markup <text:p><text:span>Hello</text:span> <text:span>world</text:span></text:p>. If processed according to the text above, the space between the words here would be ignored, yet no known ODF processor actually respects this provision.
> Proposal
> Reform the text to answer the above queries and modify the stated processing behaviour to accord with the existing corpus of documents and processors.


