
Subject: [OASIS Issue Tracker] Commented: (OFFICE-2672) NEEDS-DISCUSSION Public Comment: Text in OpenFormula - inadequate for international use

From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: office@lists.oasis-open.org
Date: Mon, 10 May 2010 12:29:24 -0400 (EDT)


    [ http://tools.oasis-open.org/issues/browse/OFFICE-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=19191#action_19191 ] 

David Wheeler  commented on OFFICE-2672:
----------------------------------------

Part 2 (OpenFormula) section 3.2 "Text:" currently says:
"A text value (also called a string value) is a sequence of zero or more characters.
Evaluators should accept [UNICODE] strings, but shall accept strings of ASCII (Unicode U+0020 through U+007F, inclusive) characters."

Certainly from a *user* point of view a stronger requirement would be nice.

Two basic questions:
1. Should the required character set supported by the evaluator at run-time be increased,
    and if so, to what (BMP or all Unicode/10646)?
2. Under what conditions should it be increased?
    (All implementations? Only those of medium group or up?
     Maybe require BMP in medium group, and all characters in large group?)
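
To make the three candidate levels concrete, here is a minimal sketch that reports the smallest repertoire a given text value needs. The function name smallest_repertoire and the labels "ascii", "bmp" and "all" are hypothetical, chosen only for illustration; the ASCII range is the one quoted from section 3.2 above.

```python
def smallest_repertoire(text: str) -> str:
    """Classify a text value by the smallest character set that covers it.

    Hypothetical helper for illustration only; the labels are not from the spec.
    """
    # Range that section 3.2 currently requires: U+0020 through U+007F.
    if all(0x20 <= ord(ch) <= 0x7F for ch in text):
        return "ascii"
    # Basic Multilingual Plane: code points up to U+FFFF.
    # Note: control characters below U+0020 also land here in this sketch.
    if all(ord(ch) <= 0xFFFF for ch in text):
        return "bmp"
    # Anything else needs the supplementary planes as well.
    return "all"


print(smallest_repertoire("SUM"))          # plain ASCII
print(smallest_repertoire("caf\u00e9"))    # needs the BMP
print(smallest_repertoire("\U0001D11E"))   # needs all of Unicode
```

An evaluator that only guarantees the first level would be free to reject the second and third strings.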

This is related to:
 http://tools.oasis-open.org/issues/browse/OFFICE-2663

I'd like implementors to briefly respond with comments to *THIS* JIRA comment,
noting what they can support and if there are major "gotchas".
For example, can everyone support evaluating BMP or all Unicode characters
at formula runtime *regardless* of the user's locale setting
(I'm concerned this may be an issue for Excel)?
Can everyone handle arbitrary characters, or is anyone limited to BMP
(our 16-bit-char friends can end up with this problem)?
If anyone is limited, is this a stumbling block?

I have a particular concern for the implementations that use 16-bit-chars internally.
If you're given a character that is not in the BMP, what do FIND, LEFT, etc. do?
Do they simply presume (incorrectly) that all chars are in the BMP, and thus you can
cut out half a character? Or do they count "correctly" to the right character?
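
The concern can be illustrated with a small sketch. It simulates a LEFT that counts 16-bit code units, as an implementation with 16-bit chars might, and compares it with one that counts whole characters. The function names are hypothetical and this is not any particular application's behaviour; the sample character U+1D11E is simply one that lies outside the BMP.

```python
def left_utf16_units(text: str, n: int) -> bytes:
    """Naive LEFT: keep the first n 16-bit code units (hypothetical helper)."""
    units = text.encode("utf-16-le")  # 2 bytes per code unit
    return units[: 2 * n]


def left_code_points(text: str, n: int) -> str:
    """LEFT that counts whole characters (Python str indexes by code point)."""
    return text[:n]


# U+1D11E is outside the BMP, so it occupies two 16-bit code units.
sample = "\U0001D11Eab"

naive = left_utf16_units(sample, 1)
try:
    naive.decode("utf-16-le")
    naive_is_valid = True
except UnicodeDecodeError:
    # The slice ends after a lone high surrogate: half a character.
    naive_is_valid = False

print(len(sample.encode("utf-16-le")) // 2)  # code units in the sample
print(len(sample))                           # characters in the sample
print(naive_is_valid)
print(left_code_points(sample, 1) == "\U0001D11E")
```

The same mismatch between code-unit positions and character positions would affect FIND, MID, LEN and similar functions.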

Systems that use UTF-8 internally presumably do this correctly, since they
have to "count" to get to the right characters anyway, but I'd like to know if that's
NOT true for anyone.
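
As a rough sketch of what that "counting" looks like, the function below walks UTF-8 bytes and treats every byte that is not a continuation byte (10xxxxxx) as the start of a new character. The name left_utf8 is hypothetical, and the sketch assumes well-formed UTF-8 input.

```python
def left_utf8(data: bytes, n: int) -> bytes:
    """Return the first n characters of well-formed UTF-8 data.

    Hypothetical helper: counts lead bytes and skips continuation bytes.
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # not a continuation byte: a character starts here
            if count == n:
                return data[:i]  # cut exactly on a character boundary
            count += 1
    return data  # fewer than n + 1 characters: return everything


encoded = "\U0001D11Eab".encode("utf-8")  # 4 bytes for U+1D11E, then "a", "b"
print(left_utf8(encoded, 1) == "\U0001D11E".encode("utf-8"))
print(left_utf8(encoded, 2).decode("utf-8") == "\U0001D11Ea")
```

Because the cut always falls on a lead byte, the result never contains part of a character, whether or not that character is in the BMP.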


> NEEDS-DISCUSSION Public Comment: Text in OpenFormula - inadequate for international use
> ---------------------------------------------------------------------------------------
>
>                 Key: OFFICE-2672
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2672
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: OpenFormula
>    Affects Versions: ODF 1.2 Part 2 CD 2
>            Reporter: Robert Weir 
>            Assignee: Andreas Guelzow 
>
> Copied from office-comment list
> Original author: Alex Brown <alexb@griffinbrown.co.uk> 
> Original date: 5 May 2010 10:45:54 -0000
> Original URL: http://lists.oasis-open.org/archives/office-comment/201005/msg00002.html


References:
- [OASIS Issue Tracker] Created: (OFFICE-2672) Public Comment: Text in OpenFormula - inadequate for international use
  - From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>