Subject: [relax-ng] John Cowan's regex for RNG proposal, version 0.3
This is John Cowan's proposal 0.3 for regular expressions in RELAX NG. It introduces one new pattern type, regex, whose children are drawn from the set of elements listed below. A regex pattern matches whatever its children match, one after another, in order. The possible kinds of child element are:

0. A regex element, appearing either directly or via a ref or externalRef element (which allows sub-regexes to be defined as named patterns): matches whatever its children match. This serves the function of the group element in ordinary pattern matching.
1. A choice element: matches whatever any one of its children matches.
2. A zeroOrMore element: matches zero or more instances of whatever its children match.
3. A oneOrMore element: matches one or more instances of whatever its children match.
4. An optional element: matches zero or one instance of whatever its children match.
5. An element that specifies n-to-m matching. Its syntax depends on what, if anything, is added to RNG pattern matching for this purpose.
6. A choice element: matches anything that any of its children matches.
7. A submatch element (with mandatory id attribute): matches whatever its children match.
8. A submatchRef element (with mandatory id attribute): matches whatever the submatch element with the corresponding id has matched (the submatch must come entirely before any corresponding submatchRef).
9. A caseFolded element: matches whatever its children match, after all Unicode letters in both the children and the corresponding matched content have been case-folded.
10. An anchor element (with mandatory type attribute, one of "bos", "eos", "bol", or "eol"): matches no characters, but succeeds only at the beginning of the string, the end of the string, the beginning of a line, or the end of a line, respectively.
11. A value element: matches the sequence of characters specified by its character content.
12. A collection element: matches a single character belonging to the specified collection.
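To make the matching rules above concrete, here is a minimal Python sketch of a backtracking matcher for items 0-4, 6-8, and 10-12. It is not part of the proposal. The tuple encoding of elements, the function names (char_range, match_node, full_match), greedy-first backtracking order, treating "\n" as the only line terminator for bol/eol, and requiring the pattern to match the whole string (as RELAX NG data patterns do) are all assumptions for illustration. Item 5 (n-to-m, whose syntax is undecided) and item 9 (caseFolded) are left out, and a collection is modelled simply as a frozenset of characters.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical encoding (not part of the proposal): each node is a tuple whose
# first item is the proposed element name, e.g. ("value", "0x") or
# ("oneOrMore", [child, ...]). Captures map submatch ids to matched text.
Caps = Dict[str, str]
Match = Iterator[Tuple[int, Caps]]


def char_range(spec: str) -> FrozenSet[str]:
    """charRange: chars between the 1st and 2nd, 3rd and 4th, ... of spec, inclusive."""
    chars = set()
    for lo, hi in zip(spec[0::2], spec[1::2]):  # an unpaired last char is ignored
        chars.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
    return frozenset(chars)


def match_seq(nodes: List[tuple], s: str, pos: int, caps: Caps) -> Match:
    """Yield (end, captures) for every way `nodes` can match, in order, from pos."""
    if not nodes:
        yield pos, caps
        return
    for mid, caps1 in match_node(nodes[0], s, pos, caps):
        yield from match_seq(nodes[1:], s, mid, caps1)


def match_repeat(body: List[tuple], s: str, pos: int, caps: Caps, minimum: int) -> Match:
    """Greedy repetition of the sequence `body`; minimum is 0 or 1."""
    for mid, caps1 in match_seq(body, s, pos, caps):
        if mid > pos:
            yield from match_repeat(body, s, mid, caps1, 0)
        elif minimum > 0:
            yield mid, caps1  # one empty iteration meets the minimum; more gain nothing
    if minimum == 0:
        yield pos, caps


def match_node(node: tuple, s: str, pos: int, caps: Caps) -> Match:
    kind = node[0]
    if kind == "regex":  # item 0: like group, a sequence of children
        yield from match_seq(node[1], s, pos, caps)
    elif kind == "choice":  # items 1 and 6: any single child
        for alt in node[1]:
            yield from match_node(alt, s, pos, caps)
    elif kind == "zeroOrMore":
        yield from match_repeat(node[1], s, pos, caps, 0)
    elif kind == "oneOrMore":
        yield from match_repeat(node[1], s, pos, caps, 1)
    elif kind == "optional":  # try one instance first, then none
        yield from match_seq(node[1], s, pos, caps)
        yield pos, caps
    elif kind == "submatch":  # ("submatch", id, children): remember the text
        for end, caps1 in match_seq(node[2], s, pos, caps):
            yield end, {**caps1, node[1]: s[pos:end]}
    elif kind == "submatchRef":  # ("submatchRef", id): repeat the remembered text
        text = caps.get(node[1])
        if text is not None and s.startswith(text, pos):
            yield pos + len(text), caps
    elif kind == "anchor":  # zero-width; assumes "\n" is the only line end
        ok = {
            "bos": pos == 0,
            "eos": pos == len(s),
            "bol": pos == 0 or s[pos - 1] == "\n",
            "eol": pos == len(s) or s[pos] == "\n",
        }[node[1]]
        if ok:
            yield pos, caps
    elif kind == "value":  # literal character sequence
        if s.startswith(node[1], pos):
            yield pos + len(node[1]), caps
    elif kind == "collection":  # ("collection", frozenset): one character
        if pos < len(s) and s[pos] in node[1]:
            yield pos + 1, caps
    else:
        raise ValueError(f"unsupported element: {kind}")


def full_match(pattern: tuple, s: str) -> bool:
    """True if the pattern can match the whole string (assumed, as for data patterns)."""
    return any(end == len(s) for end, _ in match_node(pattern, s, 0, {}))


# A doubled lowercase word such as "hello hello", using submatch/submatchRef.
doubled = ("regex", [
    ("submatch", "w", [("oneOrMore", [("collection", char_range("az"))])]),
    ("value", " "),
    ("submatchRef", "w"),
])
print(full_match(doubled, "hello hello"), full_match(doubled, "hello world"))  # prints: True False
```

In this encoding, the set-building children of a collection map onto Python's frozenset operators: choice onto |, concur onto &, and difference onto -, so char_range("az") - frozenset("aeiou") gives the lowercase consonants. Backtracking over generators keeps the sketch short. Without submatchRef, a regex pattern could instead be compiled to a finite automaton; submatchRef is a backreference, which makes the matched language non-regular, so that route is not available in general.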
The children of a collection element specify the members of the collection, and can be any of the following:

A. Another collection element: specifies what its children specify.
B. A charSet element: specifies the characters that make up its character content.
C. A charRange element: specifies the characters that lie, in Unicode code-point order, between the first and second characters of its content (inclusive), between the third and fourth (inclusive), and so on.
D. A charsetName element (with mandatory name attribute): specifies the characters in the named set (names to be thought up later).
E. A choice element: specifies the union of its children.
F. A concur element: specifies the intersection of its children.
G. A difference element: specifies the non-symmetric (set) difference of its children.
H. A kernel element: specifies the union of its children and some unknown characters.
I. A hull element: specifies the difference between its children and some unknown characters.
J. An alt element: specifies the characters known by its children.

Regex patterns may be used wherever data patterns may be used. Kernels, hulls, and alt are explained at http://www.w3.org/TR/charcol/ .

--
John Cowan <jcowan@reutershealth.com>
http://www.reutershealth.com http://www.ccil.org/~cowan
.e'osai ko sarji la lojban. Please support Lojban! http://www.lojban.org