Subject: Re: [office] Fwd: ODF spec question (white-space processing)
Hi Bruce,

You wrote:
>> I am curious as to whether those same people would expect the
>> following to not have any whitespace, too:
>>
>> <text:p>
>> <text:span>
>> My paragraph text
>> </text:span>
>> </text:p>
>
>Based on today's discussion, wouldn't there be a single space (the
>collapsed one from the span)? And so I take it you are pointing out an
>inconsistency? Or maybe I'm just lost ...

No, you're not lost. And, no, seriously, I don't want to re-open the issue. If the two major implementations agree, it doesn't make sense for me to get in the way.

The consensus, as I understand it, is:
- Whitespace characters are collapsed.
- If this leaves leading/trailing whitespace in a paragraph, that whitespace is removed as well.

The last rule (explicitly!) treats the paragraph tags somewhat differently from the rest. That's what I pointed out above. Since David showed an example where an unsuspecting engineer made use of exactly that second part, I was (and still am) curious what that same engineer would expect for the extended example.

>So can someone recount why this (collapse vs. remove) is important?

I'll try:

IMHO, the only REALLY important thing is that the spec is clear and unambiguous, regardless of the actual resolution. Apparently it wasn't, since different people reading the same text came to different conclusions. This will be resolved.

The next, and not quite as important, thing is: What are the actual rules? The idea with OpenDocument has been to do it more or less like HTML does it, which by and large ignores whitespace. In particular, in HTML 'pretty printed' text looks the same as regular text. Example:

1) <p>Hello World!</p>
2) <p>Hello World!</p>
3) <p> Hello World! </p>

In HTML, these three should look pretty much the same. (OOo has a mode to optionally use pretty printing, which is quite useful for debugging purposes.)

Now, where is the problem?

The first problem is: The formal definition of what HTML does has changed over HTML's history.
The second problem is: HTML 4 (and XHTML 1) define this purely by layout. So in HTML 4 those three example paragraphs actually have different lengths (12, 15, and 18 characters, respectively). But they LOOK the same. HTML gets away with this because it's primarily an output language. OpenDocument isn't. In OpenDocument, different lengths (but with identical layout) would mean, e.g., that you can place the cursor on all those different positions. Which is why OpenDocument uses whitespace compression. In other words, OD does at CONTENT level what HTML does at LAYOUT level. HTML has multiple whitespaces, but shows them as one/none. OpenDocument reduces multiple whitespaces to one (or none). That's fine by me, it just makes it pretty much impossible to define the one by referencing the other.

So, the third problem is: What does the current spec with its HTML 4 reference actually say? Well, today's meeting discussed what it should say instead, which is the summary I posted above. The only real difference between the updated definition and (my) interpretation of the previous one: the new one emulates real-life HTML behavior a little more closely. Example 3) would lead to " Hello World! " under my previous understanding, and will yield "Hello World!" with the updated definition.

Well... I hope that clears up any confusion. Admittedly, I'm not quite so sure after re-reading my post... :-P

Sincerely,
Daniel
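
For illustration only, the two readings of example 3) can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not what either implementation actually does: it treats a paragraph's content as one plain string (no spans or other child elements), takes "whitespace" to mean space, tab, CR and LF, and the function names and the pretty-printed sample string are hypothetical.

```python
import re

# Assumption: whitespace means space, tab, carriage return and line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Previous reading: reduce every run of whitespace to a single space."""
    return _WS_RUN.sub(" ", text)


def collapse_and_trim(text: str) -> str:
    """Updated reading: collapse, then drop the space left at the
    start and end of the paragraph."""
    return collapse(text).strip(" ")


# Hypothetical pretty-printed paragraph content (18 characters long).
pretty_printed = "\n  Hello\n  World!\n"

print(repr(collapse(pretty_printed)))           # ' Hello World! '
print(repr(collapse_and_trim(pretty_printed)))  # 'Hello World!'
```

The only difference between the two functions is the final trim, which is exactly the part of the rule that treats the paragraph boundaries differently from everything else.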