OASIS Mailing List Archives

 



relax-ng message



Subject: Re: [relax-ng] Re: RELAX NG resources update



> I haven't studied about this new library thoroughly, but I believe there
> are differences between the syntax of XML Schema regex and
> java.util.regex.

There are a lot of differences, many of them quite subtle.  The most
fundamental difference is that java.util.regex deals with sequences of
16-bit code units, whereas XML Schema regex deals with characters.  Another
important difference is that JDK 1.4 is based on Unicode 3.0, whereas XML
Schema regexes require support for at least Unicode 3.1, which added a lot
of new characters outside the BMP.
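
To make the code-unit versus character distinction concrete, here is a minimal sketch. It assumes a Java 5 or later runtime, since String.codePointCount and Character.toChars were added after the JDK 1.4 discussed here. The class and method names are made up for illustration.

```java
// Illustrative only: CodeUnitsVsCharacters and its methods are
// hypothetical names, not part of any library mentioned in this thread.
class CodeUnitsVsCharacters {

    // Number of 16-bit code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int characters(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP, so
        // Java stores it as a surrogate pair of two code units.
        String boldA = new String(Character.toChars(0x1D400));

        System.out.println(codeUnits(boldA));   // prints 2
        System.out.println(characters(boldA));  // prints 1

        // The two code units that make up the pair.
        System.out.println(Integer.toHexString(boldA.charAt(0))); // prints d835
        System.out.println(Integer.toHexString(boldA.charAt(1))); // prints dc00
    }
}
```

A regex engine that works on code units sees two items where XML Schema sees one character, which is why a single XML Schema character class can turn into an alternation of surrogate pairs in the translated pattern.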

So, for example, something as simple as \p{L} in XML Schema would be
equivalent to:

([\p{L}\u03F5\u03F4]|
[\uD840-\uD868][\uDC00-\uDFFF]|
\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|
\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|
\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6
\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3
\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39
\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3
\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34
\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2
\uDFC4-\uDFC9]|
\uD869[\uDC00-\uDED6]|
\uD87E[\uDC00-\uDE1D])

in the Java regex language (the line breaks are not part of the pattern).
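
As a rough illustration of where the surrogate-pair alternatives in that pattern come from, the sketch below turns one range of supplementary code points into alternatives over 16-bit code units. It is not the code in Jing. The class and method names are hypothetical, and it only handles a single range that lies entirely outside the BMP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class SurrogateRanges {

    // Format one 16-bit code unit as a backslash-u escape with four hex digits.
    static String esc(int unit) {
        return String.format("\\u%04X", unit);
    }

    // A single escape if the range has one member, else a bracketed class.
    static String range(int from, int to) {
        return from == to ? esc(from) : "[" + esc(from) + "-" + esc(to) + "]";
    }

    // Translate the supplementary code point range lo..hi (inclusive) into
    // alternatives that each match a high surrogate followed by a low one.
    static String translate(int lo, int hi) {
        if (lo < 0x10000 || hi > 0x10FFFF || lo > hi) {
            throw new IllegalArgumentException("need a supplementary range");
        }
        int h1 = 0xD800 + ((lo - 0x10000) >> 10);
        int l1 = 0xDC00 + ((lo - 0x10000) & 0x3FF);
        int h2 = 0xD800 + ((hi - 0x10000) >> 10);
        int l2 = 0xDC00 + ((hi - 0x10000) & 0x3FF);

        List<String> alts = new ArrayList<>();
        if (h1 == h2) {
            // Both ends share a high surrogate: one alternative is enough.
            alts.add(esc(h1) + range(l1, l2));
        } else {
            int fullStart = h1;
            int fullEnd = h2;
            String first = null;
            String last = null;
            if (l1 != 0xDC00) {          // partial block at the start
                first = esc(h1) + range(l1, 0xDFFF);
                fullStart = h1 + 1;
            }
            if (l2 != 0xDFFF) {          // partial block at the end
                last = esc(h2) + range(0xDC00, l2);
                fullEnd = h2 - 1;
            }
            if (first != null) alts.add(first);
            if (fullStart <= fullEnd) {  // high surrogates whose whole block is covered
                alts.add(range(fullStart, fullEnd) + range(0xDC00, 0xDFFF));
            }
            if (last != null) alts.add(last);
        }
        return String.join("|", alts);
    }

    public static void main(String[] args) {
        // U+20000..U+2A6D6 reproduces two alternatives from the pattern above.
        System.out.println(translate(0x20000, 0x2A6D6));
        // U+2F800..U+2FA1D reproduces the final alternative.
        System.out.println(translate(0x2F800, 0x2FA1D));
    }
}
```

A full translator would also have to merge many such ranges per Unicode category and handle negated classes and subtraction, which this sketch leaves out.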

All in all, doing the translation is probably about 50% of the work of a
complete regex implementation.

Anyway, I've written code to do the translation and it will be in the next
version of Jing.

James







