Subject: [OASIS Issue Tracker] Commented: (OFFICE-1935) Review 1.2 specification with respect to Unicode usage
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: office@lists.oasis-open.org
Date: Tue, 27 Jul 2010 12:51:13 -0400 (EDT)

    [ http://tools.oasis-open.org/issues/browse/OFFICE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20053#action_20053 ] 

Dennis Hamilton commented on OFFICE-1935:
-----------------------------------------

This looks like something that could be partly addressed in a Nomenclature/Terminology section covering character references in the text and the assumptions about how the XML documents use Unicode as the reference character set.  In particular, we use the U+nnnn notation and Unicode character names without explaining either convention.
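
To illustrate what the U+nnnn notation plus the Unicode name buys us, here is a minimal Python sketch. The helper name describe() and the choice of sample characters are mine, not anything taken from the specification; it relies only on the standard unicodedata module.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+nnnn NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Four characters that can look like a hyphen or dash in many fonts.
dash_like = ["\u002D", "\u2010", "\u2013", "\u2212"]

for ch in dash_like:
    print(describe(ch))
```

The glyphs may be hard to tell apart on screen, but the code point and name identify each one unambiguously, which is the reason for spelling them out in the text.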

Operations that involve text seem to be a rather different matter: (2-3) layout processing, (3-4) comparisons and sorting, (6) entered text where the encoding must be known, and (7) formula operations.
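
As a rough illustration of why comparisons, sorting, and case conversion need their own treatment, here is a small Python sketch. The word list is made up, and the code uses only built-in string operations; it does not attempt a real locale-dependent collation, which would need a collation library and a chosen locale.

```python
# Illustrative word list; the entries are made up for this sketch.
words = ["zebra", "apple", "Éclair", "Banana"]

# Plain code point order: uppercase ASCII sorts before lowercase,
# and U+00C9 sorts after every ASCII letter.
by_code_point = sorted(words)

# Case-insensitive order via full case folding. This is still not a
# locale-dependent collation: U+00E9 continues to sort after "z".
by_casefold = sorted(words, key=str.casefold)

print(by_code_point)  # ['Banana', 'apple', 'zebra', 'Éclair']
print(by_casefold)    # ['apple', 'Banana', 'zebra', 'Éclair']

# Case conversion is not always one character to one character:
# U+00DF LATIN SMALL LETTER SHARP S uppercases to two letters.
upper = "stra\u00dfe".upper()
print(upper, len("stra\u00dfe"), len(upper))  # STRASSE 6 7
```

Neither ordering matches what a French or German reader would expect from a dictionary, so wherever the text says "sort" it matters which of these (or a proper collation) is meant.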

Should we subdivide this issue and have tasks to verify those other specific cases?

> Review 1.2 specification with respect to Unicode usage
> ------------------------------------------------------
>
>                 Key: OFFICE-1935
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-1935
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Task
>          Components: Locale, Text
>    Affects Versions: ODF 1.2 CD 05
>            Reporter: Robert Weir 
>            Assignee: Robert Weir 
>             Fix For: ODF 1.2 CD 06
>
>
> We should review the ODF 1.2 specification, in particular for the following:
> 1) Are all character literals specifying their code points, e.g., '1' (U+0031)?  Remember, not every reader of the standard will be a native English speaker or even a native user of Latin-1 characters.  Since Unicode defines several characters that may look like a plus sign, or a dash, we need to be explicit.
> 2) Are we crystal clear on whitespace treatment?
> 3) Bidi?
> 4) Whenever we talk about sorting, are we clear on whether this is lexical or a locale-dependent collation order?
> 5) What Unicode version? 
> 6) For most of ODF we can deal with Unicode characters and strings of Unicode characters without discussing encodings.  For serialization we permit whatever XML permits and we don't need to deal with encoded characters.  However, there are some exceptions where we need to be more explicit.  One is passwords entered during encryption.  Since the encryption algorithms work at the bit level, both encoding and byte ordering need to be specified.
> 7) Any functions that deal with upper case/lower case conversions, such as in OpenFormula, need to make sure they are specified correctly with respect to Unicode.  
> 8) Anything else?
> Suggested search phrases are: character*, sort, search, collation, unicode, encod*, encrypt*, string (unless it is xsd:string), *space, dash, hyphen
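
On item (6), a minimal Python sketch of why the encoding and byte order of an entered password have to be pinned down. The sample password, the helper name digest_for(), and the use of SHA-256 are assumptions made purely for illustration; this is not a statement of what ODF 1.2 specifies for key derivation.

```python
import hashlib


def digest_for(password: str, encoding: str) -> str:
    """Hash the password after encoding it to bytes (hypothetical helper)."""
    data = password.encode(encoding)
    # SHA-256 is only a stand-in for whatever bit-level algorithm is used.
    return hashlib.sha256(data).hexdigest()


password = "p\u00e4ssword"  # contains U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

for encoding in ("utf-8", "utf-16-le", "utf-16-be"):
    # Same characters, different bytes, therefore a different digest.
    print(encoding, password.encode(encoding).hex(), digest_for(password, encoding))
```

The same string of Unicode characters produces three different byte sequences and so three different digests, which means two implementations that disagree on the encoding could not open each other's encrypted documents.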
