office-comment message

Subject: Re: [office-comment] Text in OpenFormula - inadequate for international use

David A. Wheeler wrote:
> There are at least two encodings:
> * Encodings of text at rest (stored in a document), which are handled by XML
> * Encodings of text when stored in memory, inside an evaluator.  This may not be, and is often NOT, the same.  If an implementation uses 32 bits per character, then it can do easy random access.  I don't know of any implementations that do this, but instead do all sorts of weird stuff.
> It shouldn't matter what encoding(s) an evaluator uses as long as it can get the right answer.  But the encodings an evaluator supports determine whether they CAN get a right answer.
First, are we talking repertoires or encodings? The repertoire relates 
to the set (bag) of characters allowed. The encoding relates to the 
mappings to various kinds of numbers (codepoints, bytes). 
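
To make the distinction concrete, here is a minimal Python sketch (an illustration only, not taken from OpenFormula or from any engine). The sample string and the helper name in_repertoire are assumptions made up for the example. The same three characters keep the same code points whichever encoding is used, while the byte-level mapping changes; whether a character is in a given repertoire is a separate question.

```python
# Illustrative sketch: one set of characters, several encodings.
# The sample text and the helper name in_repertoire are assumptions for this example.
text = "A\u00e9\u20ac"  # 'A', e-acute, euro sign

# Code points are a property of the characters, not of any byte encoding.
code_points = [ord(ch) for ch in text]

# Encodings map the same code points to different byte sequences.
utf8_bytes = text.encode("utf-8")       # variable width: 1 + 2 + 3 bytes
utf16_bytes = text.encode("utf-16-le")  # 2 bytes each for these characters
utf32_bytes = text.encode("utf-32-le")  # fixed width: 4 bytes per character


def in_repertoire(s: str, encoding: str) -> bool:
    """Return True if every character of s is representable in the encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

With these definitions, in_repertoire(text, "ascii") is False while in_repertoire(text, "utf-8") is True: the limit comes from the repertoire a legacy encoding covers, not from Unicode itself.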

David is saying, if I get it right, that the repertoire of characters 
allowed at certain points in OpenFormula should be limited to what can 
be represented in the 7-bit codespace that certain engines allegedly 
use for characters.

This is perhaps the first time I have seen a standard that is not even 
finished (and therefore cannot have been implemented) be dragged down by 
legacy implementation issues :-)  I am sure many eyebrows are being raised.

One of the reasons non-Western countries participate in ISO is that 
it gives them a forum where their internationalization concerns are put 
at center stage. This is an area where W3C is also very strong (because 
they have the I18n WG) but I think OASIS is not as strong, which is why 
ISO can have some constructive role to play in review.  It is now 15 
years since the game was over for IT standards that did not use Unicode: 
that anyone could in 2010 seriously consider a new standard which does 
not provide first-class Unicode support (if that is what is going on 
here) would be a travesty and a throwback. If this causes a problem for 
some engines, why isn't that a good thing?  They are substandard or not 
suitable for the global market.

*If* this is a real issue, one approach would be to make a conformance 
class "Not suitable for international sales."  Then national procurement 
people will have a good objective basis for refusing to purchase those 
products for being substandard.  I think it would be fiddly, and it 
would be simpler for vendors to simply say "we don't conform in this 
area yet."

[[Another approach would be to allow a repertoire indicator (for 
example, tied in to ISO CRDL) so that a system can arrange the correct 
mapping tables to convert from the Unicode characters to the limited 
encoding in use. This is assuming that the issue is not that these 
systems support only 7-bit ASCII but that they only support 8-bit 
encodings (in which case, why couldn't they use UTF-8?)]]
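
A rough Python sketch of that idea follows, under loud assumptions: the function name to_legacy is hypothetical, a Python codec name stands in for whatever a repertoire indicator would select, and nothing here reflects ISO CRDL syntax or any real spreadsheet engine. It converts Unicode text to a limited 8-bit encoding and reports the characters that fall outside its repertoire, then shows that UTF-8 itself consists only of 8-bit units.

```python
# Illustrative sketch only. to_legacy is a hypothetical helper; the codec name
# stands in for a mapping table chosen by a repertoire indicator.
def to_legacy(text: str, legacy_encoding: str):
    """Return (encoded bytes, characters outside the legacy repertoire)."""
    out = bytearray()
    missing = []
    for ch in text:
        try:
            out += ch.encode(legacy_encoding)
        except UnicodeEncodeError:
            missing.append(ch)  # not representable in this repertoire
    return bytes(out), missing


sample = "na\u00efve \u65e5\u672c"  # Latin text followed by two CJK characters
encoded, missing = to_legacy(sample, "latin-1")

# UTF-8 uses only 8-bit code units and round-trips the full repertoire.
utf8 = sample.encode("utf-8")
round_tripped = utf8.decode("utf-8")
```

Here the Latin-1 conversion keeps the first six characters and reports the two CJK characters as missing, whereas the UTF-8 bytes round-trip the whole string through an 8-bit channel.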

Another approach would be a non-normative note that says "NOTE: While 
OpenFormula allows any Unicode character at this point, some legacy 
spreadsheet engines developed for specific locales do not accept 
characters beyond the customary characters of that locale: this issue 
has been reported for characters outside the ASCII repertoire."

David: can you name names: which software actually has this problem? 

Rick Jellife
