Subject: Re: [office-formula] Summary of 2009-10-13 teleconference
Hi,

On Tuesday, 2009-10-13 11:39:08 -0400, David A. Wheeler wrote:

> * The "Text" (String) type will simply defined as something that can
>   contain 0 or more "characters". We will separately discuss what
>   "character" means. I personally think we should recommend, but not
>   require, that implementations support a character set and encoding
>   that permit any legal Unicode code point be a character, but we
>   didn't discuss that today.

I think we should say that implementations
- shall support the Unicode BMP and
- should support the entire Unicode character range.

This accommodates implementations that internally use only UCS-2
encoding, though I don't know whether any implementations are limited
in that way. I don't think there would be implementations of
OpenFormula that do not support the Unicode BMP.

> * We will need to add a discussion noting that implementations may
>   have a specific character set and character encoding as a setting,
>   and that this may limit which characters may be included in strings.

I don't see how that would affect the specification, other than that we
could note that some implementations are limited and that results may
therefore differ if an ODF/OpenFormula document is read by such an
implementation. I don't think that is the responsibility of the
specification though.

> Do we need to have a way to STORE this information in an OpenDocument
> file? If so, how?

I don't think so. Strings are stored in the encoding given by
<?xml encoding="...">

I don't see much benefit in storing the internal encoding of the
generating implementation. The only effect would be that readers are
required to convert from their internal Unicode encoding to that other
encoding for functions such as CODE() and CHAR(). Doing so would impose
a bunch of otherwise unnecessary conversion routines on
implementations, maybe even for encodings not registered with IANA.
However, we define those functions to map to ASCII for values
1<=N<=127 and to be implementation-dependent for values 128<=N<=255
anyway.

> * CHAR() and CODE() are *not* deprecated... they stay in. Instead,
>   they "normalize" to ASCII values, regardless of the internal
>   representation. I believe this only affects those who use a
>   particular Arabic encoding that uses 0...127 for non-ASCII
>   characters, and only when using those functions.

We still define 128..255 to be implementation-dependent, yes?

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO
statements of spreadsheets.
--Robert Weir on the OpenDocument formula subcommittee's list.
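
For illustration only, here is a minimal Python sketch of the CODE()/CHAR() behaviour discussed above: values 1..127 map to ASCII regardless of the internal representation, and 128..255 are left to the implementation. The names char_fn, code_fn and HIGH_RANGE_ENCODING are hypothetical, and the choice of ISO-8859-1 for the upper range is purely an assumption of this sketch, not something the discussion settles.

```python
# Hypothetical sketch of CODE() and CHAR() as discussed in this thread.
# Assumption: 128..255 is implementation-dependent; this sketch arbitrarily
# picks ISO-8859-1 (Latin-1) for that range.
HIGH_RANGE_ENCODING = "latin-1"


def char_fn(n: int) -> str:
    """Return the one-character string for code n, with 1 <= n <= 255."""
    if not 1 <= n <= 255:
        raise ValueError("CHAR() argument out of range in this sketch")
    if n <= 127:
        # 1..127 is always ASCII, whatever the internal encoding is.
        return bytes([n]).decode("ascii")
    # 128..255: implementation-dependent; Latin-1 is assumed here.
    return bytes([n]).decode(HIGH_RANGE_ENCODING)


def code_fn(text: str) -> int:
    """Return the code of the first character of text."""
    if not text:
        raise ValueError("CODE() needs a non-empty string in this sketch")
    first = text[0]
    if ord(first) <= 127:
        # ASCII characters report their ASCII value.
        return ord(first)
    try:
        # Non-ASCII characters go through the assumed 8-bit encoding.
        return first.encode(HIGH_RANGE_ENCODING)[0]
    except UnicodeEncodeError:
        raise ValueError("character has no code in this sketch") from None
```

With these assumptions, char_fn(65) gives "A" and code_fn("A") gives 65 on any implementation, while code_fn("é") gives 233 only because Latin-1 was picked for the upper range; another implementation could legitimately return something else.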