office-formula message

Subject: RE: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
From: Doug Mahugh <Doug.Mahugh@microsoft.com>
To: "David A. Wheeler" <dwheeler@dwheeler.com>, "robert_weir@us.ibm.com" <robert_weir@us.ibm.com>
Date: Fri, 12 Jun 2009 15:55:17 +0000
The current draft of OpenFormula describes what I think is a very powerful and useful concept: it lets ODF support great interoperability across vendors AND lets customers bring their legacy spreadsheets forward if they want to standardize on ODF within their organization. David Wheeler's explanation in response to my earlier question shows that he and others have already put a lot of great work into this idea, and I don't think we want to lose that.

In Section 2 (Conformance), the current draft says: "... this specification discusses what is required for a document to assert that it is a portable document. A portable document shall only depend on the capabilities defined in this specification, and shall not depend on undefined or implementation-defined behavior."

In an earlier mail to the list, Eric Patterson proposed strengthening this language a bit to say:

"A spreadsheet document (as opposed to a spreadsheet application) is defined as "portable" when it only depends on the capabilities defined in this specification, and does not depend on undefined or implementation-defined behavior."
and adding:
"Applications may provide users with assistance (in the form of warning messages or other features) for the creation and editing of portable documents."

If OpenFormula switches to mandating a certain behavior in the cases where current implementations differ, then a user will not be able to re-save an existing legacy spreadsheet as an ODF 1.2 file and expect the results to be the same as they were before within the application that originally created the file. Some may argue that users should not want to re-save spreadsheets in a way that preserves the implementation-specific behaviors they had before, but all of our feedback from real customers tells us that this is exactly what they want. If we remove this support from OpenFormula, I think it will limit its adoption and usefulness. To answer one of Rob's questions earlier in the discussion, that's a good reason to continue to allow implementation-defined behavior.

At the same time, for newly created spreadsheets, customers can use whatever features or assistance their application provides to make sure they avoid using non-portable features in their documents. I can imagine an implementer creating a "strictly portable" mode where it simply becomes an error to write =2^3^2 rather than =2^(3^2). But the decisions on exactly how to help users create portable documents depend very much on the application's UI design and philosophy and are out of scope for a file format spec. If applications do a good job of this, then over time the world's collection of spreadsheet files will become more and more portable.
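
To make the =2^3^2 example concrete: the formula is ambiguous because the two possible groupings give different numbers. A minimal sketch in Python, used here only for illustration; the function names are hypothetical and are not taken from any spreadsheet application or from the OpenFormula draft:

```python
# Illustration only: the two ways an application might group =2^3^2.
# Function names are hypothetical, not part of any spreadsheet product.

def power_left_assoc(a, b, c):
    """Evaluate a^b^c grouped as (a^b)^c."""
    return (a ** b) ** c


def power_right_assoc(a, b, c):
    """Evaluate a^b^c grouped as a^(b^c)."""
    return a ** (b ** c)


print(power_left_assoc(2, 3, 2))   # (2^3)^2 = 8^2 = 64
print(power_right_assoc(2, 3, 2))  # 2^(3^2) = 2^9 = 512
```

Writing the parentheses explicitly, as in =2^(3^2), removes the dependence on which grouping an application picks.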

FYI, I'll be traveling to The Hague over the weekend for the ODF plugfest, so I may be unresponsive on email for a couple of days. Looking forward to seeing the other TC members who will be there.

Regards,
Doug

-----Original Message-----
From: David A. Wheeler [mailto:dwheeler@dwheeler.com]
Sent: Friday, June 12, 2009 6:47 AM
To: robert_weir@us.ibm.com
Cc: office-formula@lists.oasis-open.org
Subject: Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

robert_weir@us.ibm.com wrote:
 > Then I wonder if we truly need to have all of these items be
 > implementation-defined?  Or to ask the question differently, would there
 > be tangible user benefit, in terms of increased interoperability, if some
 > of these items were fully specified, knowing that some implementations
 > would then need to change their code in order to conform, and that they
 > would need to deal (perhaps with version-conditional logic) with legacy
 > documents?

"Aye, there's the rub".  If the cost of forcing exact equivalence
exceeds the benefits, then I believe that we should NOT force
unnecessary equivalence, and that "implementation-defined" is what we
SHOULD say.   For years we've been trying to eliminate
"implementation-defined" areas, so I'd be surprised if we could now come
to some agreement on many of these, but by all means let's talk.

There are many areas where we DO force specific interpretations, even
though spreadsheets differ, because they DO impact interoperability.
For example, several spreadsheet implementations' ordinary string
operations start counting positions at 0; others start counting at 1.
Neither is the "right answer", but failing to agree on a convention
would make nearly all the string operations non-interoperable.  So we
settled on using 1 as the starting number.  And so on.
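
A small sketch of why the starting number matters, written in Python
purely for illustration.  The helper names below are hypothetical; they
only model "extract N characters starting at position P" under each
convention and are not definitions from the specification.

```python
# Illustration only: the same (start, length) arguments read under the
# two conventions.  Helper names are hypothetical.

def mid_one_based(text, start, length):
    """Positions count from 1, the convention the TC settled on."""
    return text[start - 1:start - 1 + length]


def mid_zero_based(text, start, length):
    """Positions count from 0, as some implementations do natively."""
    return text[start:start + length]


print(mid_one_based("spreadsheet", 1, 6))   # spread
print(mid_zero_based("spreadsheet", 1, 6))  # preads
```

The same stored arguments yield different text, which is why a single
convention had to be fixed for string operations to interoperate.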

But there are diminishing returns; at some point, it's better to give up
and leave some things implementation-defined.

 > However, in other areas, like what SUM() does with an empty argument list,

Here I think there is legitimate disagreement.  Some believe that this
should be Error.  On the other hand, it's perfectly reasonable to argue
that "0" should be the result; it's even mathematically clean.  I see no
benefit to pressing this issue; this construct simply doesn't happen in
normal spreadsheets.  Why do we want implementors to make changes to
their implementations if there would be no improvement in
interoperability?  I think here, the costs to change clearly exceed any
benefit to interoperability.
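
For illustration only, the two positions can be sketched in Python.
The function name and the empty_is_error flag are hypothetical, and the
use of ValueError to stand in for a spreadsheet Error is an assumption
of this sketch, not something taken from any implementation.

```python
# Illustration only: two defensible results for SUM() with no arguments.
# sum_args and empty_is_error are hypothetical names.

def sum_args(args, empty_is_error=False):
    """Sum a list of numbers under one of the two policies."""
    if not args and empty_is_error:
        # Policy A: an empty argument list is treated as an Error.
        raise ValueError("SUM() called with no arguments")
    # Policy B: the empty sum is 0, the additive identity.
    return sum(args)


print(sum_args([]))         # 0
print(sum_args([1, 2, 3]))  # 6
```

Both policies agree on every non-empty argument list, so the choice
only shows up for a construct that normal spreadsheets do not contain.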

 > I'd like to see us come up with a good reason why it is a good thing to
 > have a feature be implementation-defined.  Saying "mathematicians
 > disagree" or "different implementations do different things" doesn't sound
 > like a particularly good reason.

Why not?  If there's no "right" answer, and disagreement does not
significantly impact interoperability, then there's no benefit to trying
to find a right answer.  It all comes down to cost vs. benefit.

 > That's the analysis I'd like to see:  What is the user benefit if we
 > eliminated these differences versus what would be the downside.
 >
 > The nice thing here is that any choice is defensible.  It is not like any
 > of them are wrong.  We're not redefining the Gregorian calendar or
 > anything.  So I'd just pick one, based on the majority (or plurality if
 > that is the case) behavior. Or do whatever Excel does here, if that makes
 > Doug happy.  I'd certainly have no hesitancy to add an "if" statement to
 > the Symphony code if it were necessary for us to accommodate this.

 > What do people think?  Is this an area that is worth cleaning up rather
 > than trying to standardize a snapshot of the legacy application mess?

I don't see it as a mess.  There are few examples, as you noted.

By the way, I skimmed through your spreadsheet.  "=3>=TRUE()" is NOT
necessarily an operation mixing types.  TRUE() may be a Number (it _IS_
a Number on OpenOffice.org, Lotus, Quattro Pro, and many others); when
Logical isn't a distinct type, they're the same.
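
A minimal Python sketch of the case where Logical is not a distinct
type.  The names TRUE and greater_or_equal are hypothetical stand-ins
that model an application in which TRUE() is the Number 1; they are not
code from any of the applications named above.

```python
# Illustration only: an application where Logical is just a Number.
# TRUE and greater_or_equal are hypothetical stand-ins.

def TRUE():
    """TRUE() modelled as the Number 1."""
    return 1


def greater_or_equal(left, right):
    """The >= operator, returning 1 for true and 0 for false."""
    return 1 if left >= right else 0


# =3>=TRUE() becomes an ordinary Number-to-Number comparison: 3 >= 1.
print(greater_or_equal(3, TRUE()))  # 1
```

Under this model no type mixing occurs at all, since both operands are
Numbers.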


--- David A. Wheeler

