office message


Subject: Re: [office] Fwd: ODF spec question (white-space processing)


Hello,

Sorry I'm a bit late with this, but I had some trouble with the email
list.

>David Faure wrote:
>> In 5.1.1 (page 84) it specifies that extra white space characters are 
>> ignored.
>> I read this to be about 
>> - more than one literal consecutive whitespace
>> - any literal whitespace following a text:c or text:tab element.
>> 
>> OOo adds a case that I don't agree with:
>> - any whitespace after an opening text:p tag.
>> 
>> So  <text:p>         foo</text:p>
>> will have only one word and zero spaces in Writer.
>> I expect it to have 1 space and one word.

Me too.

Michael Brauer wrote:
>The correct interpretation is to ignore white space characters at the 
>beginning of the paragraph, as OOo does. The explanation for this is in 
>section 5.1.1, first paragraph
>
>"If the paragraph element or any of its child elements contains white-space 
>characters, they are collapsed, in other words they are processed in the same 
>way that [HTML4] processes them."
>
>HTML ignores white space characters behind the start element tag, 

Actually, HTML does no such thing. Neither the HTML spec nor actual
HTML browsers remove existing whitespace behind a start element tag.
By extension, neither does the OpenDocument spec. So I would think
Thomas'/David's interpretation of the spec is correct.
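
To make the two readings concrete, here is a minimal sketch in Python.
It only handles literal white space in a text node; <text:s>,
<text:c> and <text:tab> are left out. The function names are made up
for illustration and are not taken from the spec or from any
implementation.

```python
import re

# XML white-space characters: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Reading A: collapse each run of white space to a single space."""
    return _WS_RUN.sub(" ", text)


def collapse_and_strip_leading(text: str) -> str:
    """Reading B: collapse runs, then also drop white space at the
    start of the paragraph (the behaviour described for OOo)."""
    return collapse(text).lstrip(" ")


sample = "         foo"  # content of <text:p>         foo</text:p>
print(repr(collapse(sample)))                    # ' foo'
print(repr(collapse_and_strip_leading(sample)))  # 'foo'
```

Reading A gives one space and one word, which is what Thomas and David
expect; reading B gives the zero-space result seen in Writer.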


The HTML 4.01 spec (which is the one referenced from the OpenDocument
spec) describes the handling of white space in section 9.1 ("White
space"). It defines what white space is, that it separates words, and
that such words should be laid out according to the conventions of
the particular language. This does achieve white space compression,
but by defining that the LAYOUT should only look at the words, not at
the white space.

Additionally, HTML optionally (!) allows whitespace just after a start
tag or just before an end tag to be ignored FOR LAYOUT. (Apparently,
this is a legacy thing from older HTML versions.) If we really wish to
be compatible with this behaviour, we should extend the OpenDocument
spec to include a formatting property that determines whether such
whitespace is taken into account by the layout. (It would naturally
apply to <text:s> as well as to lexical white space.)
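
As a rough sketch of what such a property could mean, assuming a
made-up flag name (nothing like it exists in the spec today): the
content keeps its white space, and only the text handed to the layout
differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    # Paragraph content with white space already collapsed, never stripped.
    content: str
    # Hypothetical formatting property, invented for this illustration.
    ignore_edge_space_in_layout: bool = False


def layout_text(p: Paragraph) -> str:
    """Return the text the layout should consider; p.content is untouched."""
    if p.ignore_edge_space_in_layout:
        return p.content.strip(" ")
    return p.content


p = Paragraph(" foo", ignore_edge_space_in_layout=True)
print(repr(layout_text(p)))  # 'foo'
print(repr(p.content))       # ' foo'
```

An editing program would still load and save " foo" unchanged either
way; only the rendering would differ.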

As far as I can see, neither HTML nor the OpenDocument spec allows
simply removing such whitespace from the content. As such, the OOo
behaviour does not conform to the spec.


On the spec itself:
>"If the paragraph element or any of its child elements contains white-space 
>characters, they are collapsed, in other words they are processed in the same 
>way that [HTML4] processes them."

I think this paragraph should be changed. It is confusing, because the
OpenDocument spec describes what the content of the file looks like,
while the referenced HTML4 spec describes white space in terms of
layout. That doesn't really make sense, particularly given that the
OpenDocument spec cleanly separates layout from content pretty much
everywhere else.

(HTML 4 gets away with it because HTML is mainly used as a display
language, and as such it may define things in terms of their layout.
OpenDocument must be suitable for editing programs and thus needs to
be a lot more specific about its content. After all, that was one of
the original reasons for creating a new format at all, rather than
just piling on top of HTML or FO.)

Note that even if the HTML 4 spec did mandate dropping whitespace
after the start of elements, this sentence would still be a problem:
the first part describes one behaviour (collapsing), while the 'in
other words' part is supposed to describe the same thing but, under
that assumption, simply wouldn't. There would be no way for a spec
reader to determine the correct approach.


Sincerely,
Daniel

