Subject: Re: [office] Fwd: ODF spec question (white-space processing)
Hello,

Sorry I'm a bit late with this, but I had some trouble with the email list.

>David Faure wrote:
>> In 5.1.1 (page 84) it specifies that extra white space characters are
>> ignored.
>> I read this to be about
>> - more then one literal consecutive whitespace
>> - any literal whitespace following a text:c or text:tab element.
>>
>> OOo adds a case that I don't agree with:
>> - any whitespace after an opening text:p tag.
>>
>> So <text:p> foo</text:p>
>> will have only one word and zero spaces in Writer.
>> I expect it to have 1 space and one word.

Me too.

Michael Brauer wrote:
>The correct interpretation is to ignore white space characters at the
>beginning of the paragraph, as OOo does. The explanation for this is in
>section 5.1.1, first paragraph
>
>"If the paragraph element or any of its child elements contains white-space
>characters, they are collapsed, in other words they are processed in the same
>way that [HTML4] processes them."
>
>HTML ignores white space characters behind the start element tag,

Actually, HTML does no such thing. Neither the HTML spec nor actual HTML browsers remove existing whitespace behind a start element tag. By extension, neither does the OpenDocument spec. So I would think Thomas'/David's interpretation of the spec is correct.

The HTML 4.01 spec (which is the one referenced from the OpenDocument spec) describes the handling of white space in section 9.1 ("White space"). It defines what white space is, that it separates words, and that such words should be laid out according to the conventions of the particular language. This does achieve white space compression, but only by defining that the LAYOUT should look at the words, not at the white space.

Additionally, HTML optionally (!) allows whitespace just after a start tag or just before an end tag to be ignored FOR LAYOUT. (Apparently, this is a legacy thing from older HTML versions.)
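To make the two readings concrete, here is a minimal Python sketch. It is only an illustration, not taken from the spec or from OOo: the function names are made up, and it assumes that "white space" means space, tab, carriage return and line feed. The first function collapses runs of white space but keeps a single leading space (the reading Thomas/David and I expect); the second additionally drops white space at the start of the paragraph (the behaviour described for OOo).

```python
import re

# Assumption: white space is space, tab, carriage return and line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_keep_leading(text: str) -> str:
    """Collapse every run of white space to one space, including at the start."""
    return _WS_RUN.sub(" ", text)


def collapse_strip_leading(text: str) -> str:
    """Collapse runs as above, then drop white space at the paragraph start."""
    return _WS_RUN.sub(" ", text).lstrip(" ")


if __name__ == "__main__":
    paragraph = " foo"  # content of <text:p> foo</text:p>
    print(repr(collapse_keep_leading(paragraph)))   # one space, one word
    print(repr(collapse_strip_leading(paragraph)))  # one word, zero spaces
```

The only difference between the two is the final strip, which is exactly the step that I cannot find in either spec.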
If we really wish to be compatible with this behaviour, we should extend the OpenDocument spec to include a formatting property that determines whether such whitespace is taken into account by the layout. (It would naturally apply to <text:s> as well as to lexical white space.) As far as I can see, neither HTML nor the OpenDocument spec allows simply removing such whitespace from the content. As such, the OOo behaviour does not conform to the spec.

On the spec itself:

>"If the paragraph element or any of its child elements contains white-space
>characters, they are collapsed, in other words they are processed in the same
>way that [HTML4] processes them."

I think this paragraph should be changed. It is confusing, because the OpenDocument spec describes what the content of the file looks like, while the referenced HTML4 spec describes white space in terms of layout. That doesn't really make sense, particularly given that the OpenDocument spec cleanly separates layout from content pretty much everywhere else. (HTML 4 gets away with it because HTML is mainly used as a display language, and as such may define things in terms of their layout. OpenDocument must be suitable for editing programs and thus needs to be a lot more specific about its content. After all, that was one of the original reasons for creating a new format at all, rather than just piling on top of HTML or FO.)

Note that even if the HTML 4 spec mandated dropping whitespace after the start of elements, this sentence would still be a problem: the first part describes one behaviour, while the 'in other words' part is supposed to describe the same thing but, under that assumption, simply wouldn't. There would be no way for a spec reader to determine the correct approach.

Sincerely,
Daniel