Subject: Re: [relax-ng] Re: RELAX NG resources update
> I haven't studied about this new library thoroughly, but I believe there
> are differences between the syntax of XML Schema regex and
> java.util.regex.

There are a lot of differences, many of them quite subtle. The most fundamental difference is that java.util.regex deals with sequences of 16-bit code units (Java chars), whereas XML Schema regex deals with characters (Unicode code points). Another important difference is that JDK 1.4 is based on Unicode 3.0, whereas XML Schema regexes require support for at least Unicode 3.1, which added a lot of new characters outside the BMP. So, for example, something as simple as \p{L} in XML Schema would be equivalent to:

    ([\p{L}\u03F5\u03F4]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFC9]|\uD869[\uDC00-\uDED6]|\uD87E[\uDC00-\uDE1D])

in the Java regex language.

All in all, doing the translation is probably about 50% of the work of a complete regex implementation. Anyway, I've written code to do the translation and it will be in the next version of Jing.

James
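The surrogate-pair alternatives in the expansion of \p{L} follow a mechanical rule: a range of supplementary code points is split by UTF-16 high surrogate into a partial leading block, a run of fully covered high surrogates, and a partial trailing block. The sketch below illustrates only that one step. It is not Jing's translator; the class name SurrogateRangeTranslator and its translate method are hypothetical, it handles only ranges at or above U+10000, and it assumes Java 8 or later (Character.highSurrogate/lowSurrogate and String.join).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Jing's code): rewrites a range of supplementary
 * code points (U+10000 and above) as a regex over UTF-16 code units, the
 * form a code-unit-based engine such as the JDK 1.4 java.util.regex needs.
 */
public final class SurrogateRangeTranslator {

    // Emit a regex escape: backslash, 'u', four uppercase hex digits.
    private static String esc(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // A single code unit needs no brackets; otherwise use a character class.
    private static String unitRange(char from, char to) {
        return from == to ? esc(from) : "[" + esc(from) + "-" + esc(to) + "]";
    }

    public static String translate(int lo, int hi) {
        if (lo < 0x10000 || hi > Character.MAX_CODE_POINT || lo > hi) {
            throw new IllegalArgumentException("expected a supplementary range");
        }
        char loHigh = Character.highSurrogate(lo), loLow = Character.lowSurrogate(lo);
        char hiHigh = Character.highSurrogate(hi), hiLow = Character.lowSurrogate(hi);

        // Both ends share a high surrogate: one prefix plus one low-surrogate class.
        if (loHigh == hiHigh) {
            return esc(loHigh) + unitRange(loLow, hiLow);
        }

        List<String> parts = new ArrayList<>();
        char firstFull = loHigh; // first high surrogate whose whole low range is included
        char lastFull = hiHigh;  // last high surrogate whose whole low range is included

        // Partial leading block: lo's high surrogate with only the top of the low range.
        if (loLow != Character.MIN_LOW_SURROGATE) {
            parts.add(esc(loHigh) + unitRange(loLow, Character.MAX_LOW_SURROGATE));
            firstFull++;
        }
        // Partial trailing block: hi's high surrogate with only the bottom of the low range.
        String tail = null;
        if (hiLow != Character.MAX_LOW_SURROGATE) {
            tail = esc(hiHigh) + unitRange(Character.MIN_LOW_SURROGATE, hiLow);
            lastFull--;
        }
        // Middle run: any fully covered high surrogate followed by any low surrogate.
        if (firstFull <= lastFull) {
            parts.add(unitRange(firstFull, lastFull)
                    + unitRange(Character.MIN_LOW_SURROGATE, Character.MAX_LOW_SURROGATE));
        }
        if (tail != null) {
            parts.add(tail);
        }
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        // CJK Unified Ideographs Extension B, U+20000..U+2A6D6
        System.out.println(translate(0x20000, 0x2A6D6));
    }
}
```

Running main prints [\uD840-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6], which is the same pair of alternatives that appears in the expansion above; translate(0x10300, 0x1031E) likewise yields \uD800[\uDF00-\uDF1E]. The BMP part of \p{L}, and the newly added BMP letters such as U+03F4 and U+03F5, would still need separate handling.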