
Subject: [OASIS Issue Tracker] Created: (OFFICE-2681) Should openformula evaluators be *required* to support BMP or all Unicode/10646 characters?

From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: office@lists.oasis-open.org
Date: Mon, 10 May 2010 12:08:24 -0400 (EDT)

Should openformula evaluators be *required* to support BMP or all Unicode/10646 characters?
-------------------------------------------------------------------------------------------

                 Key: OFFICE-2681
                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2681
             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
          Issue Type: Bug
          Components: OpenFormula
            Reporter: David Wheeler 
            Assignee: David Wheeler 


Part 2 (OpenFormula) section 3.2 "Text:" says:
"A text value (also called a string value) is a sequence of zero or more characters.  
Evaluators should accept [UNICODE] strings, but shall accept strings of ASCII (Unicode U+0020 through U+007F, inclusive) characters."

Some commenters on the open comment list believe there should be a stronger requirement:
 http://lists.oasis-open.org/archives/office-comment/201005/msg00002.html

Certainly from a *user* point of view a stronger requirement would be nice.

Two basic questions:
1. Should the required character set supported by the evaluator at run-time be increased,
    and if so, to what (BMP or all Unicode/10646)?
2. Under what conditions should it be increased?
    (All implementations? Only those of medium group or up?
     Maybe require BMP in medium group, and all characters in large group?)

This is related to:
 http://tools.oasis-open.org/issues/browse/OFFICE-2663

I'd like implementors to briefly respond with comments to *THIS* JIRA comment,
noting what they can support and if there are major "gotchas".
For example, can everyone support evaluating BMP or all Unicode characters
at formula runtime *regardless* of the user's locale setting
(I'm concerned this may be an issue for Excel)?
Can everyone handle arbitrary characters, or is anyone limited to BMP
(our 16-bit-char friends can end up with this problem)?
If anyone is limited, is this a stumbling block?

I have a particular concern for the implementations that use 16-bit-chars internally.
If you're given a character that is not in the BMP, what do FIND, LEFT, etc. do?
Do they simply presume (incorrectly) that all chars are in the BMP, and thus you can
cut out half a character?  Or do they count "correctly" to the right character?
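
To make the concern concrete, here is a minimal sketch (in Python, purely as an illustration; it does not describe any actual implementation) of what can happen if LEFT counts 16-bit code units instead of characters. The helper names (utf16_units, naive_left_units, codepoint_left) and the sample string are hypothetical.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def naive_left_units(text, n):
    """Hypothetical naive LEFT: keep the first n 16-bit code units."""
    return utf16_units(text)[:n]


def ends_with_lone_high_surrogate(units):
    """True if the last code unit is a high surrogate missing its partner."""
    return bool(units) and 0xD800 <= units[-1] <= 0xDBFF


def codepoint_left(text, n):
    """LEFT that counts whole characters (Python str indexes by code point)."""
    return text[:n]


# 'a', then U+1D11E (outside the BMP, so two UTF-16 code units), then 'b'.
sample = "a\U0001D11Eb"

naive = naive_left_units(sample, 2)
print([hex(u) for u in naive])               # the second unit is a high surrogate
print(ends_with_lone_high_surrogate(naive))  # the character was cut in half
print(codepoint_left(sample, 2) == "a\U0001D11E")
```

The same mismatch would affect FIND: a position counted in code units differs from one counted in characters once any non-BMP character precedes the match.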

Systems that use UTF-8 internally presumably do this correctly, since they
have to "count" to get to the right characters anyway, but I'd like to know if that's
NOT true for anyone.
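
As a rough sketch of the "counting" meant here (again in Python, and assuming well-formed UTF-8 input; the function name utf8_left is hypothetical), a LEFT over UTF-8 bytes can skip continuation bytes so that it always cuts on a character boundary:

```python
def utf8_left(data: bytes, n: int) -> bytes:
    """Return the first n characters of well-formed UTF-8 data, as bytes."""
    count = 0
    for i, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue the previous character;
        # every other byte starts a new one.
        if (byte & 0xC0) != 0x80:
            if count == n:
                return data[:i]  # cut just before character n+1 begins
            count += 1
    return data  # fewer than n+1 characters: return everything


# 'a' (1 byte), U+1D11E (4 bytes), 'b' (1 byte)
encoded = "a\U0001D11Eb".encode("utf-8")
print(utf8_left(encoded, 2).decode("utf-8") == "a\U0001D11E")
```

Because the cut position is found by walking lead bytes, a non-BMP character is never split, which is the behaviour presumed above for UTF-8 based systems.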



Follow-Ups:
- [OASIS Issue Tracker] Issue Comment Edited: (OFFICE-2681) Should openformula evaluators be *required* to support BMP or all Unicode/10646 characters?
  - From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
- [OASIS Issue Tracker] Commented: (OFFICE-2681) Should openformula evaluators be *required* to support BMP or all Unicode/10646 characters?
  - From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>