Subject: Re: [office-formula] Calculation setting for regular expression language?
Hi David,

Following up on an older topic:

On Saturday, 2006-08-05 23:08:02 -0400, David A. Wheeler wrote:

> I suggest that we create a new calculation setting
> to control which regular expression language to use -
> at least for the database criteria, and probably for SEARCH
> as well. That will allow documents from various locations
> to move into OpenFormula, AND gives more flexibility.
> Anyone care to comment, discuss?

I think we should define one regex language instead, at least POSIX
EREs, maybe PCREs, with the addition of Unicode handling; anything else
doesn't make much sense from an i18n point of view. This effectively
boils down to a language like the one used by ICU, as described in
http://icu.sourceforge.net/userguide/regexp.html

Allowing multiple regex languages may seem to give more flexibility,
but IMHO it just adds to the confusion. Most spreadsheet applications
would only implement one regex language anyway, so exchanging documents
that use a different language between applications would be very
limited.

In addition to the table:use-regular-expressions calculation setting,
ODF should be enhanced to include another setting, table:use-wildcards
or similar, to allow calculations using the MS-Excel wildcards:
asterisk '*', question mark '?', and the tilde '~' escape character.
The two settings table:use-regular-expressions and table:use-wildcards
would be mutually exclusive.

> OOo's is much more capable; below is its language per its help file.
> In fact, OOo's looks a whole lot like the POSIX standard's
> RE language. If it is, we probably ought to call it "POSIX"
> (as claimed above). But I'm not SURE it is; I'd love to
> hear confirm/deny of it. Should we call it POSIX? OOo?

The current implementation of OOo is mostly POSIX, though not strictly,
as you have noted:

> A quick comparison of OOo to the standard suggests that OOo
> _is_ the POSIX Extended RE set, except:
> * "." in POSIX matches any char; in OOo it
>   "Represents any single character except for a
>   line break or paragraph break."
> * "\>", "\<" in OOo Matches end/beginning of word. Not in the spec,
>   this is an extension.
> * "\xXXXX" in OOo "Represents a special character based on its
>   four-digit hexadecimal code (XXXX)." Not in the spec.
> We could just document the extensions.

I would like to call these "temporary flaws" instead. They were
invented without having any standard in mind (well, \< \> probably
being derived from sed's syntax), just to be compatible with some
ancient regex engine used by former legacy versions of StarDivision's
StarOffice. I would refrain from nailing these down in an ODF standard.

> The different meaning
> of "." is more bothersome; if that's really important, maybe
> it shouldn't be called POSIX, but something else.

This is mainly to be seen in the context of the Writer text processor
application, where a paragraph is actually not delimited by a newline
or any other character, so using a '.' will not find it.

> What should we do about this detail? Are there other
> differences I haven't noticed?

Not to my knowledge. However, it is most likely that future versions of
OOo will switch to the ICU regular expressions. ICU regex Unicode
properties follow those defined in the Unicode Regular Expressions, so
if we wanted to include a reference, we should perhaps point to UTS #18,
http://www.unicode.org/unicode/reports/tr18/

Note that the latter does not define a concrete syntax and uses Perl
notation for its examples. The ICU syntax is also based on Perl, as is
the syntax of the Java package java.util.regex; both could be valid
pointers as well.

> I'd love to be able to reference another standard directly.

AFAIK there is no standard that includes Unicode _and_ defines a proper
syntax.

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO
statements of spreadsheets.
  --Robert Weir on the OpenDocument formula subcommittee's list.
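
For illustration, the wildcard convention mentioned in the message
(asterisk '*', question mark '?', tilde '~' as escape character) can be
expressed as a translation into a regular expression. The following is
only a minimal sketch in Python: the function names wildcard_to_regex
and wildcard_match are hypothetical, and whole-string, case-sensitive
matching is a simplifying assumption, not something the message or ODF
specifies.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a '*', '?', '~' wildcard pattern into a Python regex string.

    Hypothetical helper for illustration; not part of any ODF API.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "~" and i + 1 < len(pattern):
            # A tilde escapes the following character, which is then literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")  # any run of characters, including none
        elif ch == "?":
            out.append(".")   # exactly one character
        else:
            # Everything else (including a trailing lone '~') is literal.
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the wildcard pattern.

    Assumption: the match covers the entire string and is case-sensitive.
    """
    regex = wildcard_to_regex(pattern)
    # DOTALL lets '*' and '?' also cover line breaks inside the text.
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None
```

Because the pattern is either read as wildcards or as a regular
expression, never both, a translation like this is consistent with the
two settings being mutually exclusive: a '.' in wildcard mode stays a
literal dot.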