OASIS Mailing List Archives

office message


Subject: Re: [office] Fwd: ODF spec question (white-space processing)

Hi Bruce,

You wrote:
>> I am curious as to whether those same people would expect the
>> following to not have any whitespace, too:
>>   <text:p>
>>     <text:span>
>>         My paragraph text
>>     </text:span>
>>   </text:p>
>Based on today's discussion, wouldn't there be a single space (the 
>collapsed one from the span)? And so I take it you are pointing out an 
>inconsistency? Or maybe I'm just lost ...

No, you're not lost.

And, no, seriously, I don't want to re-open the issue. If the two
major implementations agree, it doesn't make sense for me to get in the
way.

The consensus, as I understand, is:
- Whitespace characters are being collapsed.
- If this leaves leading/trailing whitespace in a paragraph, that
whitespace is removed as well.
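
As a rough illustration, the two rules can be sketched in Python. This is only a sketch under assumptions: it treats the paragraph as one flat string (so it says nothing about nested elements such as text:span), it assumes "whitespace" means space, tab, carriage return and line feed, and the function names are made up for this example.

```python
import re

# Assumption: whitespace characters are space, tab, CR and LF.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text: str) -> str:
    """Rule 1: replace every run of whitespace with a single space."""
    return _WS_RUN.sub(" ", text)


def normalize_paragraph(text: str) -> str:
    """Rule 1, then rule 2: drop the leading/trailing space that is left."""
    return collapse_whitespace(text).strip(" ")


sample = "  some \t paragraph\n   text  "
collapsed = collapse_whitespace(sample)   # rule 1 only
normalized = normalize_paragraph(sample)  # rules 1 and 2
```

Keeping the two steps as separate functions makes it easy to see which rule is responsible for which part of the result.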

The last rule (explicitly!) treats the paragraph tags somewhat
differently from the rest. That's what I pointed out above. Since David
showed an example where an unsuspecting engineer made use of exactly
that second part, I was (and still am) curious what that same
engineer would expect for the extended example.

>So can someone recount why this (collapse vs. remove) is important?

I'll try:

IMHO, the only REALLY important thing is that the spec is clear and
unambiguous, regardless of the actual resolution. Apparently it
wasn't, since different people reading the same text came to different
conclusions. This will be resolved.

The next and not quite as important thing is: What are the actual
rules? The idea with OpenDocument has been to do it more or less like
HTML does it, which by and large ignores whitespace. In particular, in
HTML 'pretty printed' text looks the same as regular text.

  1) <p>Hello World!</p>
  2) <p>Hello
        World!</p>
  3) <p>
       Hello
       World!
     </p>

In HTML, these three should look pretty much the same.

(OOo has a mode to optionally use pretty printing, which is quite
useful for debugging purposes.)

Now, where is the problem?

The first problem is: The formal definition of what HTML does has
changed over HTML's history. 

The second problem is: HTML 4 (and XHTML 1) define this purely by
layout. So in HTML 4 those three example paragraphs actually have
different lengths (12, 15, and 18 characters, respectively). But they
LOOK the same.

HTML gets away with this because it's primarily an output language.
OpenDocument isn't. In OpenDocument, different lengths (but with
identical layout) would e.g. mean that you can place the cursor on all
those different positions. Which is why OpenDocument uses whitespace
compression. In other words, OD does at CONTENT level what HTML does
at LAYOUT level. HTML has multiple whitespaces, but shows them as
one/none. OpenDocument reduces multiple whitespaces to one (or none).
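
The layout-versus-content distinction can be made concrete with a small Python sketch. The three strings are assumed stand-ins for the contents of example paragraphs 1) to 3), chosen so that their lengths match the 12, 15 and 18 characters quoted above. Joining the split words with single spaces is only a crude approximation of what an HTML renderer displays.

```python
# Assumed paragraph contents for examples 1) to 3); the exact
# indentation is a guess that reproduces the quoted lengths.
paragraphs = [
    "Hello World!",
    "Hello\n   World!",
    "\n  Hello\n  World!\n",
]

# Content level: what an HTML 4 document actually contains.
content_lengths = [len(p) for p in paragraphs]

# Layout level (approximation): runs of whitespace show up as one
# space, and whitespace at the edges of the paragraph is not shown.
displayed = [" ".join(p.split()) for p in paragraphs]
```

Here `content_lengths` is `[12, 15, 18]`, while every entry of `displayed` is the same 12-character string. An editor working on the stored content would offer a different number of cursor positions for each paragraph, even though all three look alike.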

That's fine by me; it just makes it pretty much impossible to define
the one by referencing the other.

So, the third problem is: What does the current spec with its HTML 4
reference actually say? Well, today's meeting discussed what it should
say instead, which is the summary I posted above.

The only real difference between the updated definition and (my)
interpretation of the previous one: The new one emulates real-life
HTML behavior a little more closely: Example 3) would lead to " Hello
World! " under my previous understanding, and will yield "Hello
World!" with the updated definition.
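
The difference for example 3) can be sketched as follows. The input string is an assumed reconstruction of that paragraph's content (a line break after the opening tag and before the closing one), and the whitespace set (space, tab, CR, LF) is likewise an assumption.

```python
import re

# Assumption: whitespace characters are space, tab, CR and LF.
WS_RUN = re.compile(r"[ \t\r\n]+")

# Assumed content of example paragraph 3).
example_3 = "\n  Hello\n  World!\n"

# Previous understanding: collapse runs of whitespace, nothing more.
previous_result = WS_RUN.sub(" ", example_3)

# Updated definition: collapse, then remove leading/trailing whitespace
# of the paragraph.
updated_result = previous_result.strip(" ")
```

With this input, `previous_result` keeps one space at each end (14 characters) and `updated_result` has none (12 characters), matching what HTML displays for all three examples.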

Well... I hope that clears up any confusion. Admittedly, I'm not quite so
sure after re-reading my post... :-P

