Subject: Re: [office] white-space processing proposal

From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: Dave Pawson <dave.pawson@gmail.com>
Date: Mon, 18 Sep 2006 12:52:03 +0200

Dave Pawson wrote:

Thank you very much for your feedback. I've integrated that into the 
following revised proposal. Some more comments are below:

Change

"If the paragraph element or any of its child elements contains white-space 
characters, they are collapsed, in other words they are processed in the same 
way that [HTML4] processes them. The following [UNICODE] characters are 
normalized to a SPACE character:"

to

"If the paragraph element or any of its child elements contains white-space 
characters, they are collapsed. Leading white-space characters at the 
paragraph start as well as trailing white-space characters at the paragraph 
end are ignored. In detail, the following conversions take place:

The following [UNICODE] characters are normalized to a SPACE character:"

After the paragraph starting

"In addition, these characters are ignored if the preceding character is a 
white-space character."

add

"White-space characters at the start or end of the paragraph are ignored, 
regardless of whether they are contained in the paragraph element itself, or 
in a child element in which white-space characters are collapsed as described 
above.

These white-space processing rules shall enable authors to use white-space 
characters to improve the readability of the XML source of an OpenDocument 
document in the same way as they can use them in HTML."
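
For illustration only, the collapsing described in the proposed text can be 
sketched in Python as below. This is a minimal sketch, not normative text. The 
function name is hypothetical, and the set of characters normalized to SPACE 
(TAB, LF, CR, SPACE) is an assumption, since the character list itself is not 
quoted in this message. The input is taken to be the concatenated character 
content of the paragraph and its child elements.

```python
import re

# Assumption: the characters normalized to SPACE are TAB (U+0009),
# LINE FEED (U+000A), CARRIAGE RETURN (U+000D) and SPACE (U+0020).
_WS_RUN = re.compile(r"[\t\n\r ]+")


def collapse_paragraph_text(text: str) -> str:
    """Collapse white-space in the character content of one paragraph."""
    # Normalize each white-space character to SPACE and ignore any that
    # follow another white-space character: a whole run becomes one SPACE.
    collapsed = _WS_RUN.sub(" ", text)
    # Ignore white-space at the paragraph start and the paragraph end.
    return collapsed.strip(" ")


print(repr(collapse_paragraph_text("  Hello\n\t world \r\n")))
```

Characters outside the assumed set, such as NO-BREAK SPACE (U+00A0), are left 
untouched by this sketch.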

> On 18/09/06, Michael Brauer - Sun Germany - ham02 - Hamburg
> 
>> "If the paragraph element or any of its child elements contains 
>> white-space
>> characters, they are collapsed. Leading white-space characters at the
>> paragraph start as well as trailing white-space characters at the 
>> paragraph
>> end are ignored. The following [UNICODE] characters are normalized to 
>> a SPACE
>> character:"
> 
> 
> 1. Under what conditions does this happen, is it only when a document
> is displayed?

It happens at least when the document is displayed. We make no assumption 
about the data models that ODF applications use internally, so we also make 
no assumption about what happens where.

> 2. Is this visual presentation only?

See above.

> 3. Is this whitespace processing permanent, i.e. is the source file 
> modified?

This depends on the application. None of the word processors I know keep the 
XML source or operate on an XML model. They generate the XML source from 
scratch whenever a document is saved, so they may even insert new white-space 
characters to make the XML source look nice.

> (If so, can we state that ODF is an xml application?   see
> http://www.w3.org/TR/xml11/#sec-white-space )

I think you mean "XML processor". If so: no, ODF is not an XML processor. It 
is an application (see http://www.w3.org/TR/xml11/#sec-intro ).

> 4. Definition of collapse please?
> Could use http://www.w3.org/TR/xml11/#AVNormalize if that is what is meant,
> or do you mean removed?
> 5. Definition of normalize (suggest 
> http://www.w3.org/TR/xml11/#AVNormalize )

The terms "collapse" and "normalize" are not used as formally defined terms 
here, but as plain English words only. A description of what actually happens 
follows them in the text.

> 
> 
> 
>> These white-space processing rules shall enable authors to use 
>> white-space
>> characters to improve the readability of the XML source of an 
>> OpenDocument
>> document in the same way as they can use them in [HTML4]."
> 
> 
> Is the reference to  the HTML specs necessary/helpful?

Yes, I think so. A reference to HTML makes it easier to understand what the 
rules are, and allows authors to re-use their experience with HTML. What we 
could do is write "HTML" instead of "[HTML4]".

> Is there any conflict with the HTML4 that could cause a dispute?

I don't think so, but if we write "HTML" instead of "[HTML4]" we should be on 
the safe side.
> 
> Why is this only applicable to a paragraph element, and not to list 
> content,
> table cells etc? I.e. all CDATA content.

List items and table cells contain paragraphs, so the rules apply there as well.
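
As a rough illustration of that point, the Python sketch below finds every 
paragraph element regardless of its container and collapses its white-space. 
The sample document, the function name and the collapsed character set (TAB, 
LF, CR, SPACE) are assumptions made for the example. The namespace URIs are 
the OpenDocument text and table namespaces.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Hypothetical fragment: one paragraph in a list item, one in a table cell.
SAMPLE = f"""<root xmlns:text="{TEXT_NS}" xmlns:table="{TABLE_NS}">
  <text:list><text:list-item>
    <text:p>  list   item </text:p>
  </text:list-item></text:list>
  <table:table><table:table-row><table:table-cell>
    <text:p>cell
      text</text:p>
  </table:table-cell></table:table-row></table:table>
</root>"""


def paragraph_texts(xml_source: str) -> list[str]:
    """Return the collapsed text of every text:p, whatever contains it."""
    root = ET.fromstring(xml_source)
    texts = []
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        # Character content of the paragraph and all of its child elements.
        raw = "".join(p.itertext())
        # Assumed character set: TAB, LF, CR and SPACE.
        texts.append(re.sub(r"[\t\n\r ]+", " ", raw).strip(" "))
    return texts


print(paragraph_texts(SAMPLE))
```

The white-space between the container elements is never looked at, because 
only the content of paragraph elements is collected.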

> 
> 
> regards
> 
> 

Michael

Follow-Ups:
- Re: [office] white-space processing proposal
  - From: "Dave Pawson" <dave.pawson@gmail.com>

References:
- white-space processing proposal
  - From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
- Re: [office] white-space processing proposal
  - From: "Dave Pawson" <dave.pawson@gmail.com>