Bill,
This is fascinating thinking. I didn't (and I don't think Dave or Gabe did)
consider the potential for this use of the Pattern element.
Could you post (either as a doc or just in an email) the XML schema for XRD
that you would propose to incorporate this functionality?
See also my reply about the regex.
=Drummond
From: Barnhill William [mailto:barnhill_william@bah.com]
Sent: Saturday, November 12, 2005 6:43 AM
To: Tan, William; xri@lists.oasis-open.org
Subject: RE: [xri] Patterns
Some other thoughts as well:
In working on db tables to store XRDs I've found that there's a lot of
redundancy if we treat two service elements that differ only in patterns as
different services. In my db schema I'm using 3 tables:
..XRDs: has all the XRD elements/attrs except for service, plus an ID
..ServicesDefs: has all the Service elements/attrs except for pattern, plus an
ID
..Service: Has a foreign key XrdID, a foreign key ServiceDefID, and a pattern
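The three-table layout above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table and column names (xrds, service_defs, services, body) are hypothetical stand-ins, and each "body" column simply stands for whatever elements and attributes the real schema stores.

```python
import sqlite3

# Hypothetical names; "body" stands in for the real element/attribute columns.
SCHEMA = """
CREATE TABLE xrds (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL          -- XRD elements/attrs except Service
);
CREATE TABLE service_defs (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL          -- Service elements/attrs except Pattern
);
CREATE TABLE services (
    xrd_id         INTEGER NOT NULL REFERENCES xrds(id),
    service_def_id INTEGER NOT NULL REFERENCES service_defs(id),
    pattern        TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database with one XRD, one ServiceDef, two patterns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO xrds (id, body) VALUES (1, 'example XRD')")
    conn.execute(
        "INSERT INTO service_defs (id, body) VALUES (1, 'example service definition')"
    )
    # Two services that differ only in pattern share a single service_defs row.
    conn.executemany(
        "INSERT INTO services (xrd_id, service_def_id, pattern) VALUES (?, ?, ?)",
        [(1, 1, "^/blog"), (1, 1, "/feed$")],
    )
    return conn


def patterns_for_xrd(conn, xrd_id):
    """Return (service definition body, pattern) pairs for one XRD."""
    rows = conn.execute(
        "SELECT sd.body, s.pattern FROM services s "
        "JOIN service_defs sd ON sd.id = s.service_def_id "
        "WHERE s.xrd_id = ? ORDER BY s.pattern",
        (xrd_id,),
    )
    return rows.fetchall()
```

The point of the sketch is only that the shared definition is stored once, while each pattern gets its own row.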
I'd like to get thoughts on refactoring Service from
Service = current els and attrs + pattern
to
Service = ServiceDef + pattern
and ServiceDef being one of
(a) ServiceDef = src attribute containing a URL that matches an external
ServiceDef defined elsewhere within the XRDS or XRD (if in the XRDS, then it
would have to be included in the XRD when delivered to the client)
(b) ServiceDef = current els and attributes of Service, minus pattern
I'd also suggest adding a type attribute onto the pattern element. This
attribute would have a value from a specified list, one possible list being
Simple, DFA, TradNFA, PosixNFA. (See http://www.oreilly.com/catalog/regex/chapter/ch04.html
for more info on the regex engine types).
OTOH, those type names have the benefit of being implementation independent, but
that also makes them less human readable. It might be better to decide on a set
of names that are more recognizable (PerlCompat might be one such type) and
representative of the different regex types in the wild.
The Simple pattern type is a system I am currently using that goes by the
following rules:
1. Designate the entire pattern as the literal part.
2. If the literal part starts with ^ and that ^ is not followed by another ^,
then the part of the pattern that follows the ^ is considered the literal part,
and this pattern will match any path that starts with the literal part, subject
to the rules regarding the $ metacharacter.
3. If the literal part ends with $ and that $ is not preceded by another $, then
the literal part becomes everything that preceded the $, and this pattern will
match any path that ends with the literal part.
4. If the pattern consists of a ^ metacharacter, a literal part, and a $
metacharacter, then the pattern will match only paths that are exactly the
literal part of the pattern.
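The Simple rules above could be sketched in Python roughly as follows. This is not the implementation referred to in this message. The function names are hypothetical, and two behaviours the rules leave open are filled in as labeled assumptions: a doubled ^^ or $$ is treated as an escaped literal character, and a pattern with no anchor at all is treated as a substring match.

```python
def parse_simple_pattern(pattern):
    """Split a Simple pattern into (anchored_start, literal, anchored_end)."""
    literal = pattern  # start with the whole pattern as the literal part
    anchored_start = False
    anchored_end = False

    # Leading ^ not followed by another ^ anchors the match at the start.
    if literal.startswith("^"):
        if not literal.startswith("^^"):
            anchored_start = True
        # Assumption: "^^" is an escaped literal "^", so one "^" is dropped
        # in both cases.
        literal = literal[1:]

    # Trailing $ not preceded by another $ anchors the match at the end.
    if literal.endswith("$"):
        if not literal.endswith("$$"):
            anchored_end = True
        # Assumption: "$$" is an escaped literal "$".
        literal = literal[:-1]

    return anchored_start, literal, anchored_end


def simple_match(pattern, path):
    """Return True if path matches the Simple pattern."""
    anchored_start, literal, anchored_end = parse_simple_pattern(pattern)
    if anchored_start and anchored_end:
        return path == literal
    if anchored_start:
        return path.startswith(literal)
    if anchored_end:
        return path.endswith(literal)
    # Assumption: an unanchored pattern matches anywhere in the path.
    return literal in path
```

For example, simple_match("^(+contact)*home", "(+contact)*home/x") is a prefix match in which the parentheses and the * need no escaping, because everything after the ^ is compared literally.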
This meets all of my current needs (grouping would be nice, but I think it
should be handled by the service, not in the resolver), and avoids the ugly
escaping needed for PerlCompat expressions that match against a path that contains
one or more of {$ versions, () xrefs, * subsegments}.
Sorry I didn't post this sooner, but I just noticed it while working on the rails
resolver db.
Thanks,
Bill Barnhill
i-names: =Bill.Barnhill, @communitivity*Bill.Barnhill
Don't know what an i-name is? Find out here: http://2idi.com/grs/index.php?referral_code=communitivity
Don't have an i-name? Get one here: http://2idi.com/grs/index.php?referral_code=communitivity
-----Original Message-----
From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Sat 11/12/2005 5:51 AM
To: xri@lists.oasis-open.org
Subject: [xri] Patterns
One more thing regarding patterns, we should probably state that the
local part should first be canonicalized according to the rules in the
syntax specs, i.e. by removing extra /./ and /../, and removing
unnecessary percent encoding.
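As a rough sketch of the canonicalization step described above: the code below assumes RFC 3986-style rules (the remove_dot_segments algorithm of section 5.2.4, and decoding of percent-encoded unreserved characters), which may differ in detail from the XRI syntax spec. The function names are hypothetical, and decoding before dot-segment removal is also an assumption.

```python
import re

# RFC 3986 "unreserved" characters; percent-encoding these is unnecessary.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def decode_unreserved(path):
    """Decode %XX escapes of unreserved characters; uppercase the rest."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def remove_dot_segments(path):
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def canonicalize(path):
    """Assumed order: normalize percent-encoding, then drop dot segments."""
    return remove_dot_segments(decode_unreserved(path))
```

With these assumptions, canonicalize("/a/%7Euser/./b/../c%2fd") yields "/a/~user/c%2Fd": the unnecessary %7E is decoded, the dot segments are removed, and the reserved %2f stays encoded.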
wil.
________________________________
From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Saturday, November 12, 2005 9:00 PM
To: Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] XRI Resolution 2.0 Draft 09 comments
1. xrd:XRD/xrd:Service/xrd:Pattern - what flavor of regular expression
should the value be (perl-compatible, posix, etc.)? Is the full power of
regexp really required? Why not just simple string comparison or prefix
matching?
### Great question. Dave was the original proposer of this feature -
I'll let him answer (others, please chime in on this.) I know you're
working on very high-volume HTTP proxy resolvers - what's your view of
the best tradeoff between comparison functionality and performance? ###
[Wil] I have 3 points of concern:
a) Performance-wise, because the number of regular expressions to
compile equals the number of Pattern tags (as opposed to a single
pattern matched against multiple candidates), it may be expensive for
proxy resolvers. However, I don't have concrete statistics to prove my
point.
b) If a regexp is valid but contains a logic error, there is no way for
the registry to verify.
c) Standard - which flavor of regular expression to use? Various regular
expression libraries support different options. If we support regular
expressions, users might ask: how to specify case insensitive match? How
to do negation? Are Unicode character properties (\p & \P) supported?
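To illustrate concern (a) above: one common way to limit compilation cost is to cache compiled patterns, so that a resolver compiles each distinct Pattern value once rather than once per lookup. The sketch below uses Python's re module and functools.lru_cache; the function names and the (name, pattern) data shape are hypothetical, and the thread itself reports no measurements.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def compiled(pattern):
    """Compile a pattern once and reuse it for later lookups."""
    return re.compile(pattern)


def first_matching_service(services, path):
    """Return the name of the first (name, pattern) pair that matches path."""
    for name, pattern in services:
        if compiled(pattern).search(path):
            return name
    return None
```

This helps only when the same patterns recur across requests. A proxy resolver that sees mostly distinct XRDs would still pay one compilation per Pattern element.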