office-formula message

Subject: Summary of 2009-10-13 teleconference

Here's my informal summary of the 2009-10-13 teleconference.

Most of the time was spent discussing runtime character processing issues (LEFT and friends, LEFTB and friends, CHAR/CODE, UNICHAR/UNICODE, and so on).  Several issues complicate this, which is why it has taken so much time (our last 2 meetings focused on this too).  One issue is that some implementations (such as Excel) store characters at run-time in a locally-set character set and character encoding, which makes it hard to specify exactly what you CAN do (e.g., a character may be legal in Unicode but have no equivalent in the locally-set character set/encoding).  Yet some people depend on these settings.  We can't just eliminate LEFT, etc.; according to one survey, LEFT() etc. are some of the most common functions.
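
To make the encoding problem concrete, here is a minimal Python sketch.  It was not discussed on the call; it is only an illustration.  It assumes ISO-8859-1 as a stand-in for a "locally-set" character set, and the helper name representable() is hypothetical.

```python
def representable(text: str, charset: str) -> bool:
    """Return True if every character in `text` can be encoded in `charset`.

    `charset` stands in for an implementation's locally-set character
    set/encoding (an assumption made for illustration only).
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


euro = "\u20ac"  # EURO SIGN: a legal Unicode code point

print(representable(euro, "utf-8"))        # UTF-8 can encode any code point
print(representable(euro, "iso-8859-1"))   # ISO-8859-1 has no euro sign
print(representable("abc", "iso-8859-1"))  # plain ASCII fits in both
```

A string containing the euro sign is fine under a Unicode-capable setting but cannot be stored under the ISO-8859-1 setting, which is the kind of limit a specification would have to allow for.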

Here is what I *believe* we agreed on:
* The "Text" (String) type will simply be defined as something that can contain 0 or more "characters".  We will separately discuss what "character" means.  I personally think we should recommend, but not require, that implementations support a character set and encoding that permit any legal Unicode code point to be a character, but we didn't discuss that today.
* We will need to add a discussion noting that implementations may have a specific character set and character encoding as a setting, and that this may limit which characters may be included in strings.  Do we need to have a way to STORE this information in an OpenDocument file? If so, how?
* LEFT(), RIGHT(), MID(), and LEN() thus have very simple meanings: they operate on characters.  An implementation that uses the UTF-8 encoding, and treats each Unicode code point as a character, would thus have LEN() return the number of CODE POINTS, not BYTES.  Which is, I think, how it should be.
* The *B functions such as LEFTB, etc., are deprecated.
* CHAR() and CODE() are *not* deprecated... they stay in.  Instead, they "normalize" to ASCII values, regardless of the internal representation.  I believe this only affects those who use a particular Arabic encoding that uses 0...127 for non-ASCII characters, and only when using those functions.  This seems to be the only way to get an internationally-exchangeable interpretation of those functions.  Those very rare affected spreadsheets can switch to UNICHAR() and UNICODE().
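
As an illustration of the "operate on characters" rule above, here is a rough Python sketch (mine, not agreed text).  It assumes each Unicode code point counts as one character, which is how Python's str type indexes.  The function names are hypothetical stand-ins for LEN, LEFT, RIGHT, and MID, with MID taking a 1-based start position; len_bytes() is included only to show the contrast with a byte count.

```python
def len_chars(text: str) -> int:
    """LEN-like: count characters (here, Unicode code points)."""
    return len(text)


def len_bytes(text: str, encoding: str = "utf-8") -> int:
    """Byte count in a given encoding, shown only for contrast."""
    return len(text.encode(encoding))


def left(text: str, n: int = 1) -> str:
    """LEFT-like: the first n characters."""
    return text[:n]


def right(text: str, n: int = 1) -> str:
    """RIGHT-like: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text: str, start: int, n: int) -> str:
    """MID-like: n characters beginning at 1-based position `start`."""
    return text[start - 1:start - 1 + n]


sample = "na\u00efve \u20ac5"  # i-diaeresis and euro sign are multi-byte in UTF-8

print(len_chars(sample))   # 8 code points
print(len_bytes(sample))   # 11 bytes in UTF-8
print(left(sample, 3))
print(right(sample, 2))
print(mid(sample, 3, 3))
```

The character count and the byte count differ as soon as the text leaves ASCII, and slicing by characters never splits a multi-byte sequence.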

Regarding the financial function comments still unhandled, in the next two weeks (1) Rob will make a last-ditch attempt to get a financial expert, and (2) Microsoft will comment on the ones relating to what Excel does.  But we can't wait forever; we will do what we can, and then allow for public comments on anything else.

Someone said: "If I didn't break it, I don't have to fix it".  We all laughed; we've spent many weeks trying to work out internationally-appropriate resolutions given legacy decisions.

Someone had a survey of "functions most in use".  The sample wasn't scientific, but it would probably be illuminating; Wheeler asked that the results be posted to the list.  (In particular, we'd like "Small" to include the common functions, or at least discuss any that are not.)

Please reply to all with corrections; thanks.

--- David A. Wheeler 
