RE: [xri] Patterns

Bill,

This is fascinating thinking. I didn't – and I don't think Dave or Gabe did – consider the potential for this use of the Pattern element.

Could you post (either as a doc or just in an email) the XML schema for XRD that you would propose to incorporate this functionality?

See also my reply about the regex.

=Drummond

From: Barnhill William [mailto:barnhill_william@bah.com]
Sent: Saturday, November 12, 2005 6:43 AM
To: Tan, William; xri@lists.oasis-open.org
Subject: RE: [xri] Patterns

Some other thoughts as well:

In working on db tables to store XRDs I've found that there's a lot of redundancy if we treat two service elements that differ only in patterns as different services. In my db schema I'm using 3 tables:
..XRDs: has all the XRD elements/attrs except for service, plus an ID
..ServiceDefs: has all the Service elements/attrs except for pattern, plus an ID
..Service: has a foreign key XrdID, a foreign key ServiceDefID, and a pattern
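
A minimal sketch of the three-table layout above, using Python's built-in sqlite3 module. The lowercase table names, the single "body" placeholder column (standing in for the remaining XRD or Service elements/attrs), and the helper function names are assumptions made for illustration, not the actual Rails schema.

```python
import sqlite3

# "body" is a hypothetical placeholder for the elements/attrs not listed here.
SCHEMA = """
CREATE TABLE xrds (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL          -- all XRD elements/attrs except service
);
CREATE TABLE service_defs (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL          -- all Service elements/attrs except pattern
);
CREATE TABLE services (
    xrd_id         INTEGER NOT NULL REFERENCES xrds(id),
    service_def_id INTEGER NOT NULL REFERENCES service_defs(id),
    pattern        TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one XRD, one ServiceDef, two patterns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO xrds (id, body) VALUES (1, 'xrd-1')")
    conn.execute("INSERT INTO service_defs (id, body) VALUES (1, 'svc-def-1')")
    conn.executemany(
        "INSERT INTO services (xrd_id, service_def_id, pattern) VALUES (?, ?, ?)",
        [(1, 1, "^/blog"), (1, 1, "/feed$")],
    )
    return conn


def patterns_for_service_def(conn, service_def_id):
    """Return every pattern that points at the given ServiceDef row."""
    rows = conn.execute(
        "SELECT pattern FROM services WHERE service_def_id = ? ORDER BY pattern",
        (service_def_id,),
    )
    return [row[0] for row in rows]
```

The point of the split is that two services differing only in pattern share one service_defs row instead of duplicating it.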

I'd like to get thoughts on refactoring Service from
Service = current els and attrs + pattern
to
Service = ServiceDef + pattern
and ServiceDef being one of
(a) ServiceDef = a src attribute containing a URL that matches an external ServiceDef defined elsewhere within the XRDS or XRD (if XRDS then it would have to be included in the XRD when delivered to the client)
(b) ServiceDef = current els and attributes of Service, minus pattern

I'd also suggest adding a type attribute onto the pattern element. This attribute would have a value from a specified list, one possible list being Simple, DFA, TradNFA, PosixNFA. (See http://www.oreilly.com/catalog/regex/chapter/ch04.html for more info on the regex engine types).

OTOH, those type names have the benefit of being implementation independent, but that also makes them less human readable. It might be better to decide on a set that is more recognizable (PerlCompat might be one such type) and representative of the different regex types in the wild.

The Simple pattern type is a system I am currently using that goes by the following rules:
1. Designate the entire pattern as the literal part.
2. If the literal part starts with ^ that is not followed by another ^, then the rest of the pattern becomes the literal part, and the pattern matches any path that starts with the literal part, subject to the rules for the $ metacharacter.
3. If the literal part ends with $ that is not preceded by another $, then the literal part becomes everything that preceded the $, and the pattern matches any path that ends with the literal part.
4. If the pattern consists of a ^ metacharacter, a literal part, and a $ metacharacter, then the pattern matches only paths that are exactly the literal part.
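
The rules above can be sketched roughly as follows. The function names are hypothetical, and two behaviours are assumptions the rules do not state: a doubled ^^ or $$ is treated as an escaped literal character, and a pattern with no anchor at all matches when the literal part occurs anywhere in the path.

```python
def parse_simple_pattern(pattern):
    """Split a Simple pattern into (literal, anchored_start, anchored_end)."""
    literal = pattern
    anchored_start = False
    anchored_end = False

    if literal.startswith("^"):
        # A single leading ^ is an anchor; ^^ is assumed to escape a literal ^.
        anchored_start = not literal.startswith("^^")
        literal = literal[1:]

    if literal.endswith("$"):
        # A single trailing $ is an anchor; $$ is assumed to escape a literal $.
        anchored_end = not literal.endswith("$$")
        literal = literal[:-1]

    return literal, anchored_start, anchored_end


def simple_match(pattern, path):
    """Return True if path matches the Simple pattern."""
    literal, anchored_start, anchored_end = parse_simple_pattern(pattern)
    if anchored_start and anchored_end:
        return path == literal
    if anchored_start:
        return path.startswith(literal)
    if anchored_end:
        return path.endswith(literal)
    # Assumption: an unanchored pattern matches anywhere in the path.
    return literal in path
```

For example, simple_match("^/blog", "/blog/2005") is a prefix match, while simple_match("^/blog$", "/blog/") fails because both anchors require an exact match.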

This meets all of my current needs (grouping would be nice, but I think it should be handled by the service, not in the resolver), and avoids the ugly escaping needed for PerlCompat expressions that match against a path that contains one or more of {$ versions, () xrefs, * subsegments}.

Sorry I didn't post this sooner, but I just noticed it while working on the rails resolver db.

Thanks,

Bill Barnhill
i-names: =Bill.Barnhill, @communitivity*Bill.Barnhill
Don't know what an i-name is? Find out here: http://2idi.com/grs/index.php?referral_code=communitivity
Don't have an i-name? Get one here: http://2idi.com/grs/index.php?referral_code=communitivity

-----Original Message-----
From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Sat 11/12/2005 5:51 AM
To: xri@lists.oasis-open.org
Subject: [xri] Patterns

One more thing regarding patterns, we should probably state that the
local part should first be canonicalized according to the rules in the
syntax specs, i.e. by removing extra /./ and /../, and removing
unnecessary percent encoding.
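
As a rough illustration of that canonicalization step, here is a sketch modelled on the generic URI normalization in RFC 3986 (dot-segment removal and decoding of percent-encoded unreserved characters). It assumes the XRI syntax rules are similar and that the local part is an absolute path starting with "/"; the function names are hypothetical.

```python
import re

# Characters RFC 3986 calls "unreserved"; encoding them is never necessary.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PERCENT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(path):
    """Decode %XX only where it encodes an unreserved character."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Keep necessary encodings, but normalize the hex digits to uppercase.
        return "%" + match.group(1).upper()
    return _PERCENT_TRIPLET.sub(replace, path)


def remove_dot_segments(path):
    """Drop "." segments and resolve ".." segments in an absolute path."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        if segment in (".", ".."):
            # Never pop the leading empty segment that represents the root.
            if segment == ".." and len(output) > 1:
                output.pop()
            if index == len(segments) - 1:
                output.append("")  # a trailing dot segment leaves a trailing slash
            continue
        output.append(segment)
    return "/".join(output)


def canonicalize_local_part(path):
    """Canonicalize before any Pattern comparison (assumed ordering)."""
    return remove_dot_segments(decode_unreserved(path))
```

Decoding runs first so that an encoded dot segment such as %2E%2E is resolved in the same way as a literal one.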

wil.

________________________________

From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Saturday, November 12, 2005 9:00 PM
To: Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] XRI Resolution 2.0 Draft 09 comments

1. xrd:XRD/xrd:Service/xrd:Pattern - what flavor of regular expression
should the value be (perl-compatible, posix, etc.)? Is the full power of
regexp really required, why not just simple string comparison or prefix
matching?

### Great question. Dave was the original proposer of this feature -
I'll let him answer (others, please chime in on this.) I know you're
working on very high-volume HTTP proxy resolvers - what's your view of
the best tradeoff between comparison functionality and performance? ###

[Wil] I have 3 points of concern:

a) Performance-wise, because the number of regular expressions to
compile equals the number of Pattern tags (as opposed to a single
pattern matched against multiple candidates), it may be expensive for
proxy resolvers. However, I don't have concrete statistics to prove my
point.

b) If a regexp is valid but contains a logic error, there is no way for
the registry to verify.

c) Standard - which flavor of regular expression to use? Various regular
expression libraries support different options. If we support regular
expressions, users might ask: how to specify case insensitive match? How
to do negation? Are Unicode character properties (\p & \P) supported?
