OASIS Mailing List Archives

office message



Subject: [OASIS Issue Tracker] Commented: (OFFICE-1895) RIGHTB and friends is incompletely specified



    [ http://tools.oasis-open.org/issues/browse/OFFICE-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12048#action_12048 ] 

Dennis Hamilton commented on OFFICE-1895:
-----------------------------------------

Andreas, I'm betting that the definition is muddled and that the specification is fence-sitting.

The question is really whether the count parameter counts bytes or characters (in the pure character sense of the RIGHT function).  If it counts bytes, we then have the interesting problem that the ending position may not fall at the end of a character encoding in whatever the underlying representation is, and the span of bytes may have incomplete character encodings at either end.  There's no way to prevent an unsafe result if a non-single-byte representation is involved.  Because unsafe character-sequence encodings can be constructed, functions and operations (such as display) that depend on safe character-sequence encodings must provide defenses against unsafe strings.

My money is on the count being in bytes, not characters, with the result disguised as a string.  Whether it is valid/safe as a string in terms of the implementation's character set encoding is a different question.    
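To make the byte-count reading concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names rightb and is_safe are hypothetical, UTF-8 and Latin-1 are chosen arbitrarily as example encodings, and nothing here is taken from the OpenFormula draft or from any implementation.

```python
def rightb(text: str, byte_count: int, encoding: str = "utf-8") -> bytes:
    """Hypothetical byte-counting RIGHT: the last byte_count bytes of the encoded text."""
    data = text.encode(encoding)
    if byte_count <= 0:
        return b""
    # Slicing past the start simply returns the whole byte string.
    return data[-byte_count:]


def is_safe(data: bytes, encoding: str = "utf-8") -> bool:
    """True if the bytes decode cleanly, i.e. they form complete character encodings."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


sample = "na\u00efve"  # "naive" with a diaeresis on the i; 6 bytes in UTF-8, 5 in Latin-1

cut_inside = rightb(sample, 3)             # starts in the middle of the two-byte character
cut_between = rightb(sample, 4)            # starts on a character boundary
single_byte = rightb(sample, 3, "latin-1")  # one byte per character, so any cut is safe

print(cut_inside, is_safe(cut_inside))
print(cut_between, is_safe(cut_between))
print(single_byte, is_safe(single_byte, "latin-1"))
```

The same count of 3 gives a valid three-character string under Latin-1 and an undecodable byte span under UTF-8, which is the interchange problem in miniature.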

My experience is that these devices are used when spreadsheet functions and macros are employed to manipulate the underlying representation, or other data, as arrays of bytes, taking advantage of the historically prevalent but entirely coincidental situation in which a single byte is enough to represent a single character.   (Look at the weaseling of ASC in the OpenFormula draft.)

This is not too different from the situation in Java and .NET, where strings may hold unsafe sequences of char (a 16-bit integer type holding one UTF-16 code unit), and they are definitely not strings of Unicode code points when handled char-wise.
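The char-wise hazard can be imitated outside Java or .NET. The following Python sketch models a string as a list of 16-bit UTF-16 code units and checks for unpaired surrogates. The function names utf16_units and has_unpaired_surrogate are hypothetical, and this is an approximation of the char-wise view, not the behavior of any particular runtime.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text, one integer per 16-bit unit."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def has_unpaired_surrogate(units: list[int]) -> bool:
    """True if the sequence contains a surrogate that is not part of a high/low pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: must be followed by a low one
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= unit <= 0xDFFF:  # low surrogate with no preceding high surrogate
            return True
        i += 1
    return False


# 'a' followed by U+1D11E (MUSICAL SYMBOL G CLEF), which needs two code units.
units = utf16_units("a\U0001D11E")

print([hex(u) for u in units], has_unpaired_surrogate(units))
print(has_unpaired_surrogate(units[-1:]))  # rightmost single unit: a lone low surrogate
print(has_unpaired_surrogate(units[:2]))   # leftmost two units: a lone high surrogate
```

Taking the rightmost or leftmost units by count, without regard to pairing, yields a sequence that is no longer a valid string of code points, just as a byte-counted cut can split a multi-byte character.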

QUESTIONS:

1. Is it really desirable to have these functions part of the standard set of OpenFormula functions?

2. If it is indeed desirable as a practical matter, what is a good way to allow these functions to be used safely in interchange settings, given that the underlying character-set encoding is not fixed and that the purpose of using them is to deal with data in an expected encoding disguised as (not exactly valid) strings?

3. I don't have a good answer, but I do think that a solution might require auxiliary functions by which a formula and any user-defined functions/macros/scripts can determine the number of octets, or of bits, in a byte and what the implementation-determined character-set encoding is.  (This reminds me of what is needed to understand the numerical representations of a C/C++ implementation using limits.h and other parameters that reveal implementation-determined characteristics.)

I will now shut my mouth and wait to find out what the OpenFormula crew had in mind.  I suppose, in the meantime, we could look at IS 29500 to see how anything similar has been dealt with.

> RIGHTB and friends is incompletely specified
> --------------------------------------------
>
>                 Key: OFFICE-1895
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-1895
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: OpenFormula
>    Affects Versions: ODF 1.2
>            Reporter: Andreas Guelzow 
>
> 6.6.6 RIGHTB
> Summary: Return a selected number of text characters from the right, using byte position.
> This description fails to indicate what happens in the likely situation that the selected bytes do not form a complete character sequence. Should that be an error or move to a correct position?
> Similarly for the other ...B functions.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

