
office message



Subject: [OASIS Issue Tracker] Issue Comment Edited: (OFFICE-2207) Whitespace processing [N 1309]



    [ http://tools.oasis-open.org/issues/browse/OFFICE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=19365#action_19365 ] 

Dennis Hamilton edited comment on OFFICE-2207 at 6/5/10 1:13 PM:
-----------------------------------------------------------------

In reviewing the Errata CD04-rev02 resolution of this, and looking at the original defect text, I think we may be confusing the issue by saying too much about what [XML 1.0] is thought to say, rather than simply referencing what it is that [XML 1.0] says precisely.  

I'm not sure where this should be handled, but I want to record these observations here so they are not lost in the context of Errata discussion:

 1. I think the distinction is between when white-space characters appear in element-content and when white-space characters appear in the PCDATA of mixed content.  (In [XML 1.0], element-content is element-only content.)

 2. In [XML 1.0], it is presumed that all character data that occurs in the content of the root element, directly or indirectly, is character data of the XML document.  

 3. How white-space characters are handled in element-content is determined by the application of [XML 1.0].  The xml:space="preserve" attribute applies, but the attribute value is a recommendation here.  

  3.1 We can simply say that white-space characters encountered as character data in the immediate content of elements having element-content shall be ignored and any setting of xml:space has no effect for those particular occurrences.  

  3.2 I do not see anywhere that [XML 1.0] says such white-space characters are to be ignored.  Instead it is specified (section 2.10) that a validating processor inform the application which white-space characters are of this kind.  If we want them ignored, we need to make it our rule. [Update 2010-06-05T17:07Z: It is more involved than this.  A non-validating XML processor does not make the distinction because it cannot: it has no way of determining whether the white space it encounters occurs in element content or is part of the PCDATA in mixed content.  That simply affirms that the rules described here are not rules of [XML 1.0].]

4. With regard to how mixed-content white-space characters are handled by default, it may be better simply to refer to section 5.1 (or 5.1.1) of ODF 1.0.  It is also valuable to assert that xml:space does not override the specified behavior in any case.

5. Since there is no change being made in the [XML 1.0] rules for treatment of white-space in attribute values, and for elimination of carriage-return characters, it is not necessary to say anything about that.  

   5.1 The statement that "conforming to the XML specification, all possible line-ending cases are ..." can be misleading.  It is preferable to simply observe that the only way that a carriage-return character, #xD, can occur as character data is via a character reference (such as &#xD;), and that should probably be in a note that refers to [XML 1.0] for that fact.  This should be before the statement that makes reference to section 5.1 of ODF 1.0. [Update 2010-06-05T17:07Z: clarified the sentence about carriage-return character occurrences.]

  5.2 Since the treatment of attribute values is mandatory, section 3.3.3 of [XML 1.0] might be referenced, but probably in a note that serves as a reminder, rather than appearing as normative content.

6. We should make sure this is cleaned up in ODF 1.2 regardless of handling this in ODF 1.0 Errata CD05 or later.
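
The [XML 1.0] behaviors referred to in items 3.2, 5.1 and 5.2 can be observed with a small experiment.  The following is only a sketch, assuming Python's standard xml.etree.ElementTree module (a non-validating, expat-based parser); the element and attribute names (list, item, p, note) are hypothetical and are not ODF markup.

```python
import xml.etree.ElementTree as ET

# Item 3.2: a non-validating parser hands over white space between child
# elements as ordinary character data; without a DTD it cannot tell
# element content from mixed content.
wrapper = ET.fromstring("<list>\n  <item>one</item>\n</list>")
leading_ws = wrapper.text        # white space before <item>
trailing_ws = wrapper[0].tail    # white space after </item>

# Item 5.1: a literal CR LF pair is normalized to LF before the application
# sees it, so a carriage return survives only as a character reference.
para = ET.fromstring("<p>a\r\nb&#xD;c</p>")
para_text = para.text

# Item 5.2: literal tab and line-end characters in an attribute value are
# each replaced by a space (attribute-value normalization, section 3.3.3).
elem = ET.fromstring('<p note="x\ty\r\nz"/>')
note_value = elem.get("note")

print(repr(leading_ws), repr(trailing_ws))
print(repr(para_text))
print(repr(note_value))
```

Since the parser reports the inter-element white space unchanged, any rule that such white space is ignored has to be applied by the consuming application, which is the point of item 3.1.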

      was (Author: orcmid):
    In reviewing the Errata CD04-rev02 resolution of this, and looking at the original defect text, I think we may be confusing the issue by saying too much about what [XML 1.0] is thought to say, rather than simply referencing what it is that [XML 1.0] says precisely.  

I'm not sure where this should be handled, but I want to record these observations here so they are not lost in the context of Errata discussion:

 1. I think the distinction is between when white-space characters appear in element-content and when white-space characters appear in the PCDATA of mixed content.  (In [XML 1.0], element-content is element-only content.)

 2. In [XML 1.0], it is presumed that all character data that occurs in the content of the root element, directly or indirectly, is character data of the XML document.  

 3. How white-space characters are handled in element-content is determined by the application of [XML 1.0].  The xml:space="preserve" attribute applies, but the attribute value is a recommendation here.  

  3.1 We can simply say that white-space characters encountered as character data in the immediate content of elements having element-content shall be ignored and any setting of xml:space has no effect for those particular occurrences.  

  3.2 I do not see anywhere that [XML 1.0] says such white-space characters are to be ignored.  Instead it is specified that the application be informed which white-space characters are of this kind.  If we want them ignored, we need to make it our rule.

4. With regard to how mixed-content white-space characters are handled by default, it may be better simply to refer to section 5.1 (or 5.1.1).  It is also valuable to assert that xml:space does not override the specified behavior in any case.

5. Since there is no change being made in the [XML 1.0] rules for treatment of white-space in attribute values, and for elimination of carriage-return characters, it is not necessary to say anything about that.  

   5.1 The statement that "conforming to the XML specification, all possible line-ending cases are ..." can be misleading.  To simply say that the only way that a carriage-return character, #xd, can occur as character data is via a character entity.  This should be before the statement that makes reference to section 5.1 of ODF 1.0.

  5.2 Since the treatment of attribute values is mandatory, section 3.3.3 of [XML 1.0] might be referenced, but probably in a note that serves as a reminder, rather than appearing as normative content.

6. We should make sure this is cleaned up in ODF 1.2 regardless of handling this in ODF 1.0 Errata CD05 or later.
  
> Whitespace processing [N 1309]
> ------------------------------
>
>                 Key: OFFICE-2207
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2207
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: General
>    Affects Versions: ODF 1.0, ODF 1.0 (second edition)
>            Reporter: Robert Weir 
>            Assignee: Svante Schubert 
>             Fix For: ODF 1.0 Errata CD 5
>
>
> Submitter ID
>     GB-26300-34
> Nature of defect
>     Technical
> Document
>     ISO/IEC 26300:2006
> Clause
>     1.6
> Page
>     34
> Description of issue
> It is stated that "In conformance with the W3C XML specification [XML1.0], optional white-space characters that are contained in elements that have element content (in other words that must contain elements only but not text) are ignored".
>     * It is not clear what "optional white-space characters" are (the term is not defined in XML 1.0), or how the described behaviour conforms to XML 1.0.
>     * Does the phrase "elements that have element content" mean elements that have only element content? This cannot make sense, as whitespace is itself text content.
>     * Consider the markup <text:p><text:span>Hello</text:span> <text:span>world</text:span></text:p>. If processed according to the text above, the space between the words here would be ignored, yet no known ODF processor actually respects this provision.
> Proposal
> Reform the text to answer the above queries and modify the stated processing behaviour to accord with the existing corpus of documents and processors.
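
The markup in the defect report can be checked against a parser directly.  This is a minimal sketch, again assuming Python's xml.etree.ElementTree; the namespace declaration is added here so that the fragment parses on its own and is not part of the markup quoted in the report.

```python
import xml.etree.ElementTree as ET

# ODF text namespace, declared so the text: prefix resolves.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

doc = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    "<text:span>Hello</text:span> <text:span>world</text:span>"
    "</text:p>"
)

p = ET.fromstring(doc)
first_span, second_span = list(p)

# The space between the two spans is reported as character data
# following the first span (its "tail" in ElementTree terms).
inter_span = first_span.tail

# Concatenating all character data in document order keeps that space.
rendered = "".join(p.itertext())

print(repr(inter_span))
print(repr(rendered))
```

The parser itself does not drop the space between the two spans; whether it is kept or ignored is decided by the rule the ODF specification states for its consumers.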

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

