OASIS Mailing List Archives

 



relax-ng message



Subject: [relax-ng] John Cowan's regex for RNG proposal, version 0.3


This is John Cowan's proposal 0.3 for regular expressions in RELAX NG.
It introduces one new pattern type, regex, which contains child elements
from the following set.  A regex pattern matches the sequence of what
its children match.  Here are the possible kinds of child elements:

0.  A regex element, either directly or via a ref element or an
externalRef element (which allows sub-regexes to be defined as named patterns):
matches whatever its children match.  This serves the function of the
group element in ordinary pattern matching.

1.  A choice element: matches whatever any of its children match.

2.  A zeroOrMore element: matches zero or more instances of whatever its
children match.

3.  A oneOrMore element: matches one or more instances of whatever its
children match.

4.  An optional element: matches zero or one instance of whatever its
children match.

5.  An element that specifies n-to-m matching.  The syntax of this
depends on what, if anything, is added to RNG pattern matching.

6.  A choice element: matches anything that any of its children match.

7.  A submatch element (with mandatory id attribute): matches whatever
its children match.

8.  A submatchRef element (with mandatory id attribute): matches whatever
the submatch element with corresponding id has matched (the submatch must
come entirely before any corresponding submatchRef).

9.  A caseFolded element: matches whatever its children match after
all Unicode letters in both the children and the corresponding matched
content have been case-folded.

10.  An anchor element (with mandatory type attribute, one of "bos", "eos",
"bol", "eol"): matches no characters, and succeeds only at the beginning of
the string, the end of the string, the beginning of a line, or the end of a
line, respectively.

11.  A value element: matches the sequence of characters specified by its
character content.

12.  A collection element: matches a single character belonging to the
specified collection.  The children of a collection element specify
the members of the collection, and can be any of the following:

	A.  Another collection element: specifies what its children specify.

	B.  A charSet element: specifies the characters which constitute
	its character content.

	C.  A charRange element: specifies the characters which appear in
	Unicode order between the first and second character (inclusive),
	the third and fourth character (inclusive), and so on.

	D.  A charsetName element (with mandatory name attribute):
	specifies the characters which occur in the named set
	(names to be thought up later).

	E.  A choice element: specifies the union of its children.

	F.  A concur element: specifies the intersection of its children.

	G.  A difference element: specifies the non-symmetric difference
	of its children.

	H.  A kernel element: specifies the union of its children and
	some unknown characters.

	I.  A hull element: specifies the difference between its children
	and some unknown characters.

	J.  An alt element: specifies the characters known by its children.
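Item 12's collection operators behave like set algebra over characters, so they can be prototyped as membership tests. The Python sketch below is only an illustration under stated assumptions: the element names come from the list above, but `member` and the predicate representation are hypothetical; difference is read as "first child minus second child" with exactly two children; and charsetName, kernel, hull and alt are left out because their character sets are not defined here.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical representation: a collection is a membership test on one character.
Member = Callable[[str], bool]


def member(elem: ET.Element) -> Member:
    """Build a membership test for one collection-member element."""
    tag = elem.tag
    kids = [member(child) for child in elem]
    if tag in ("collection", "choice"):
        # Nested collection or choice: union of the children.
        return lambda c: any(k(c) for k in kids)
    if tag == "concur":
        # Intersection of the children.
        if not kids:
            raise ValueError("concur needs at least one child")
        return lambda c: all(k(c) for k in kids)
    if tag == "difference":
        # Assumption: exactly two children, read as "first minus second".
        if len(kids) != 2:
            raise ValueError("this sketch expects exactly two difference children")
        first, second = kids
        return lambda c: first(c) and not second(c)
    text = elem.text or ""
    if tag == "charSet":
        # Every character of the content is a member.
        chars = frozenset(text)
        return lambda c: c in chars
    if tag == "charRange":
        if len(text) % 2:
            raise ValueError("charRange needs an even number of characters")
        # Characters pair up: 1st-2nd, 3rd-4th, ... (inclusive, code-point order).
        pairs = [(text[i], text[i + 1]) for i in range(0, len(text), 2)]
        return lambda c: any(lo <= c <= hi for lo, hi in pairs)
    raise ValueError(f"unsupported collection member: {tag}")


# Usage: lowercase ASCII consonants = the range a-z minus the vowels.
consonant = member(ET.fromstring(
    "<collection><difference>"
    "<charRange>az</charRange><charSet>aeiou</charSet>"
    "</difference></collection>"
))
print([c for c in "Hello, world" if consonant(c)])  # prints ['l', 'l', 'w', 'r', 'l', 'd']
```

Predicates are used rather than explicit character sets so that wide ranges (for example, all of Unicode) cost nothing to represent.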

Regex patterns may be used wherever data patterns may be used.
Kernel, hull, and alt elements are explained at http://www.w3.org/TR/charcol/ .
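Because a regex pattern can stand wherever a data pattern can, a validator needs some way to run it against a value. One plausible route, sketched below in Python purely as an illustration, is to translate the element tree into Python's `re` syntax. Assumptions: the element and attribute names follow the proposal as written; `to_re` and the other helper names are hypothetical; ref/externalRef, n-to-m repetition, charsetName, concur, difference, kernel, hull and alt are not handled; submatch ids must be valid Python identifiers; caseFolded is approximated with `re`'s case-insensitive mode; and whole values are checked with `re.fullmatch`.

```python
import re
import xml.etree.ElementTree as ET

# Anchor types from item 10, mapped to Python re assertions.
_ANCHORS = {"bos": r"\A", "eos": r"\Z", "bol": r"(?m:^)", "eol": r"(?m:$)"}


def _escape_in_class(ch: str) -> str:
    # Backslash-escape the characters that are special inside [...].
    return "\\" + ch if ch in "\\]^-[" else ch


def _class_body(elem: ET.Element) -> str:
    """Body of a [...] class for a collection member (unions only)."""
    if elem.tag in ("collection", "choice"):
        return "".join(_class_body(child) for child in elem)
    text = elem.text or ""
    if elem.tag == "charSet":
        return "".join(_escape_in_class(c) for c in text)
    if elem.tag == "charRange":
        if len(text) % 2:
            raise ValueError("charRange needs an even number of characters")
        # Pairs of characters become a-b ranges.
        return "".join(
            _escape_in_class(text[i]) + "-" + _escape_in_class(text[i + 1])
            for i in range(0, len(text), 2)
        )
    raise ValueError(f"unsupported collection member: {elem.tag}")


def to_re(elem: ET.Element) -> str:
    """Translate one element of the proposed vocabulary to Python re syntax."""
    tag = elem.tag

    def seq() -> str:
        # The sequence of whatever the children match.
        return "".join(to_re(child) for child in elem)

    if tag == "regex":
        return f"(?:{seq()})"
    if tag == "choice":
        return "(?:" + "|".join(to_re(child) for child in elem) + ")"
    if tag == "zeroOrMore":
        return f"(?:{seq()})*"
    if tag == "oneOrMore":
        return f"(?:{seq()})+"
    if tag == "optional":
        return f"(?:{seq()})?"
    if tag == "submatch":
        # Assumes the id is a valid Python identifier.
        return f"(?P<{elem.get('id')}>{seq()})"
    if tag == "submatchRef":
        return f"(?P={elem.get('id')})"
    if tag == "caseFolded":
        # Approximation via a scoped IGNORECASE flag (Python 3.6+),
        # not full Unicode case folding.
        return f"(?i:{seq()})"
    if tag == "anchor":
        return _ANCHORS[elem.get("type")]
    if tag == "value":
        return re.escape(elem.text or "")
    if tag == "collection":
        body = _class_body(elem)
        # An empty collection matches no character at all.
        return f"[{body}]" if body else "(?!)"
    raise ValueError(f"unsupported element: {tag}")


# Usage: a lowercase word, a hyphen, then the same word again ("abc-abc").
pattern = to_re(ET.fromstring(
    '<regex><submatch id="w"><oneOrMore><collection>'
    "<charRange>az</charRange>"
    "</collection></oneOrMore></submatch>"
    '<value>-</value><submatchRef id="w"/></regex>'
))
print(bool(re.fullmatch(pattern, "abc-abc")))  # True
print(bool(re.fullmatch(pattern, "abc-abd")))  # False
```

The caseFolded mapping is the weakest part of this translation: `re`'s case-insensitive mode compares one character at a time, so it will not treat "ß" and "SS" as equal, whereas `"straße".casefold() == "STRASSE".casefold()` holds. A faithful implementation of item 9 would need full case folding of both the pattern and the input.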

-- 
John Cowan                              <jcowan@reutershealth.com>
http://www.reutershealth.com            http://www.ccil.org/~cowan
                .e'osai ko sarji la lojban.
                Please support Lojban!          http://www.lojban.org



