Subject: Re: [office-formula] Question on character classes
Hi Patrick,

On Monday, 2009-12-21 08:42:19 -0500, Patrick Durusau wrote:

> Eike Rathke wrote:
>> On Saturday, 2009-12-19 11:34:34 -0500, Patrick Durusau wrote:
>>
>>> Whirring away on untangling the portability portions but curious
>>> about the use of XML character classes.
>>>
>>> Makes sense if we are talking about XML text, makes less sense if we
>>> are talking about strings in general. Just use Unicode character
>>> classes with the correct references.
>>>
>>> Was there some overt reason for choosing XML character classes?

The reason was that they were already defined in a standard.

>> What exactly are you talking about? Is this about LetterXML, DigitXML,
>> ... in the syntax definitions, referring XML10
>> http://www.w3.org/TR/REC-xml/ appendix B Character Classes?
>>
> Yes.
>> What would be the equivalent Unicode classes?
>>
> Well, for DigitXML that would be Decimal digits in the Unicode Character
> Database.

Unicode category 'Nd', yes.

> I haven't compared the latest DigitXML to Decimal digits in the most
> recent UCD so there may not be a significant difference.
>
> My main concern was if we are talking about supporting the full range of
> characters in Unicode (or at least providing implementations with that
> option) when rather than citing the XML standard and using Unicode by
> indirection, we could also simply cite the Unicode standard.

Fine with me in general. Thinking about it, that may even be more
accurate. I'm not sure whether, for example, LetterXML encompasses all
letter characters that may be allowed in an identifier.

We'd have to define the classes we use. It should be sufficient to do
this in terms of Unicode categories (Nd, Ll, Lu, ...) rather than listing
the entire sets of ranges, as is done in XML10. If, for example, we could
say

    Identifier         ::= NameStartCharacter NameCharacter*
    NameStartCharacter ::= (Unicode characters of categories Ll, Lu, Lo, Lt, Nl) | '_'
    NameCharacter      ::= NameStartCharacter
                         | (Unicode characters of categories Mc, Me, Mn, Lm, or Nd)
                         | '.'
that would ease things a lot. Is this possible?

Note that the example does include compatibility characters, which are
not part of LetterXML but have to be added to OpenFormula. It seems I
need to create yet another issue.

  Eike

-- 
 Automatic string conversions considered dangerous. They are the GOTO
 statements of spreadsheets.  --Robert Weir on the OpenDocument formula
 subcommittee's list.
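For illustration, the identifier productions proposed in this mail can be checked character by character against Unicode General Category values, without enumerating XML10-style ranges. The following is a minimal Python sketch, assuming the standard `unicodedata` module as the category source; the function names are hypothetical and this is not the OpenFormula specification's definition. Results depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so they may differ from a particular UCD or XML10 release.

```python
import unicodedata

# Hypothetical sketch of the proposed productions, keyed on Unicode
# General Category values as returned by unicodedata.category().
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_EXTRA_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_name_start_char(ch: str) -> bool:
    """NameStartCharacter ::= (categories Ll, Lu, Lo, Lt, Nl) | '_'"""
    return ch == "_" or unicodedata.category(ch) in NAME_START_CATEGORIES


def is_name_char(ch: str) -> bool:
    """NameCharacter ::= NameStartCharacter | (categories Mc, Me, Mn, Lm, Nd) | '.'"""
    return (
        is_name_start_char(ch)
        or ch == "."
        or unicodedata.category(ch) in NAME_EXTRA_CATEGORIES
    )


def is_identifier(s: str) -> bool:
    """Identifier ::= NameStartCharacter NameCharacter*"""
    if not s:
        return False  # the production requires at least one start character
    return is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for candidate in ["Sheet1", "_tmp.value", "1abc", "Gr\u00f6\u00dfe", ".x", "a b"]:
        print(repr(candidate), is_identifier(candidate))
```

Because 'Nd' covers all decimal digits (for example U+0663 ARABIC-INDIC DIGIT THREE), such digits are accepted after the first character but, like ASCII digits, cannot start an identifier; combining marks (Mn) behave the same way.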