Subject: [OASIS Issue Tracker] Created: (OFFICE-2681) Should OpenFormula evaluators be *required* to support BMP or all Unicode/10646 characters?

Should OpenFormula evaluators be *required* to support BMP or all Unicode/10646 characters?

                 Key: OFFICE-2681
                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2681
             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
          Issue Type: Bug
          Components: OpenFormula
            Reporter: David Wheeler 
            Assignee: David Wheeler 

Part 2 (OpenFormula) section 3.2 "Text:" says:
"A text value (also called a string value) is a sequence of zero or more characters.  
Evaluators should accept [UNICODE] strings, but shall accept strings of ASCII (Unicode U+0020 through U+007F, inclusive) characters."

Some commenters on the open comment list believe there should be a stronger requirement:

Certainly from a *user* point of view a stronger requirement would be nice.

Two basic questions:
1. Should the required character set supported by the evaluator at run-time be increased,
    and if so, to what (BMP or all Unicode/10646)?
2. Under what conditions should it be increased?
    (All implementations? Only those of medium group or up?
     Maybe require BMP in medium group, and all characters in large group?)

This is related to:

I'd like implementors to briefly respond with comments to *THIS* JIRA comment,
noting what they can support and if there are major "gotchas".
For example, can everyone support evaluating BMP or all Unicode characters
at formula runtime *regardless* of the user's locale setting
(I'm concerned this may be an issue for Excel)?
Can everyone handle arbitrary characters, or is anyone limited to BMP
(our 16-bit-char friends can end up with this problem)?
If anyone is limited, is this a stumbling block?

I have a particular concern for the implementations that use 16-bit-chars internally.
If you're given a character that is not in the BMP, what do FIND, LEFT, etc. do?
Do they simply presume (incorrectly) that all chars are in the BMP, and thus you can
cut a character in half?  Or do they count "correctly" to the right character?
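
To make the concern concrete, here is a minimal sketch (in Python, not taken from any
implementation) of what can happen when a LEFT-style function counts 16-bit code units
instead of characters. The helper names are hypothetical and chosen only for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def left_by_code_unit(text, count):
    """Hypothetical naive LEFT: keep the first `count` 16-bit code units."""
    return utf16_units(text)[:count]


def ends_with_lone_high_surrogate(units):
    """True if the last code unit is a high surrogate with no low surrogate after it."""
    return bool(units) and 0xD800 <= units[-1] <= 0xDBFF


# U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP, so UTF-16 needs two code units for it.
sample = "\U0001D11Eab"

naive = left_by_code_unit(sample, 1)   # cuts the surrogate pair in half
by_character = sample[:1]              # Python str slices by code point
```

Here the naive result holds only the high surrogate 0xD834, which is not a valid character
on its own, while counting by code point returns the whole character.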

Systems that use UTF-8 internally presumably do this correctly, since they
have to "count" to get to the right characters anyway, but I'd like to know if that's
NOT true for anyone.
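
As an illustration of the "counting" meant here, the sketch below (Python, with hypothetical
helper names, assuming well-formed UTF-8 input) walks the bytes and skips continuation bytes
to find where the Nth character starts, so a LEFT-style cut never lands inside a character.

```python
def byte_offset_of_code_point(data, index):
    """Return the byte offset at which code point number `index` starts.

    Assumes `data` is well-formed UTF-8. Continuation bytes have the bit
    pattern 10xxxxxx; every other byte starts a new code point.
    """
    seen = 0
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:      # not a continuation byte
            if seen == index:
                return offset
            seen += 1
    if seen == index:                  # index equals the number of code points
        return len(data)
    raise IndexError("code point index out of range")


def left_utf8(data, count):
    """Hypothetical LEFT over UTF-8 bytes that counts whole code points."""
    return data[:byte_offset_of_code_point(data, count)]


# "é" takes 2 bytes and U+1D11E takes 4 bytes in UTF-8.
encoded = "\u00e9\U0001D11Eab".encode("utf-8")
first_two = left_utf8(encoded, 2)
```

This counts code points only; whether a "character" should instead mean something larger
(for example a base letter plus combining marks) is a separate question.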
