RE: [xri] Patterns

Gabe,

Thanks, and good point on favoring simplicity. I think the biggest benefit of using doc references is in maintenance: changing the details of a service that exists in each of several thousand XRD instances, as opposed to changing the details in one doc instance referenced by several thousand XRDs. I definitely think it can be done either way, and I am not wedded to the doc ref idea, though I like the concept of one thing (the service) having one doc instance representation that is referenced in many places, rather than one thing having many representations in many places. The latter feels too much like "many things" rather than "one thing". Sorry, I know I'm not making that very clear.

POSIX regex will work and fill all our needs, though from an implementation standpoint it would be nice if the overhead of a full-blown regex engine were optional rather than required, while still allowing pattern use. The schema states that patterns are optional elements, but I believe the way it's stated means they are optional for the XRD implementer to include, not optional for an XRD consumer to understand if they are present. My idea was basically to have the default patterns carry much less overhead while still covering 60-80% of users, and to have an optional (on both sides of the software fence) capability for something like a POSIX regex engine as a (possibly negotiated) extension feature.

It's not that the implementation is hard; we'd all most likely use regex libs for that. It's that the complexity of POSIX regular expressions, and the escaping required for non-trivial XRIs, is such that it'd be nice to have something much simpler available for when we just want one of the 'starts with', 'ends with', or 'is exactly' pattern behaviors, which I think would handle 60%-80% of the cases. OTOH, if we did POSIX as the default then we could leverage grouping within the resolution in some way, such as saying the groups would be passed as parameters to the service. However, whether the benefit of this outweighs the implementation complexity of such a grouping use is TBD.
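
To make the grouping idea concrete, here is a minimal sketch of how captured groups might be handed to a service as parameters. It is only an illustration: the function name `service_parameters` and the example path are hypothetical, and Python's `re` module is Perl-style rather than POSIX, so the pattern sticks to syntax that both flavors accept.

```python
import re


def service_parameters(pattern, path):
    """Return the captured groups for `path`, or None if the pattern fails.

    Hypothetical helper: a resolver could pass the returned list to the
    selected service as positional parameters.
    """
    match = re.search(pattern, path)
    if match is None:
        return None
    return list(match.groups())


if __name__ == "__main__":
    # Bracket expressions such as [0-9]+ are valid in POSIX ERE as well.
    print(service_parameters(r"^/photos/([0-9]+)/([a-z]+)$", "/photos/2005/paris"))
```

Even in this small form, the resolver has to know how many groups a pattern defines and how the service expects to receive them, which is the extra complexity being weighed here.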

Bill-

Great feedback.

I'd prefer *not* to do what you are suggesting and accept the extra cost of some verbosity for the benefit of simplicity.

What you are suggesting is a lot like what WSDL does: using same-document (or other-document) references to "objects" in the database so they can be reused. A "wsdl:binding" element is an example. While I understand the motivation for doing this, I think it makes the implementation on the parsing side slightly more complicated (only slightly though, I'll grant you).

Personally, as a matter of style, I like having the description of a thing encapsulated by a single XML thing (i.e. an element), at least where reuse is not as common. This change of descriptive style is rather significant, so I'd be a little leery of doing this at this point unless there is significant interest from other parties here.

As for regex syntax, I mentioned in an earlier message that the NAPTR/DDDS guys settled on POSIX regex (see RFC 3402/3403 for more details). I think we should do what they did and stick to one regex syntax (if that!).

-Gabe

From: Barnhill William [mailto:barnhill_william@bah.com]
Sent: Saturday, November 12, 2005 6:43 AM
To: Tan, William; xri@lists.oasis-open.org
Subject: RE: [xri] Patterns

Some other thoughts as well:

In working on db tables to store XRDs I've found that there's a lot of redundancy if we treat two service elements that differ only in patterns as different services. In my db schema I'm using 3 tables:
..XRDs: has all the XRD elements/attrs except for service, plus an ID
..ServicesDefs: has all the Service elements/attrs except for pattern, plus an ID
..Service: Has a foreign key XrdID, a foreign key ServiceDefID, and a pattern
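
A minimal sketch of that three-table layout, using Python's built-in sqlite3 module. The table names follow the list above; the column names (`canonical_id`, `type`, `uri`) and the sample data are assumptions made only so the example runs, not the actual schema.

```python
import sqlite3

# Hypothetical columns; only the table split and the two foreign keys
# plus the pattern column come from the description above.
SCHEMA = """
CREATE TABLE xrds (id INTEGER PRIMARY KEY, canonical_id TEXT);
CREATE TABLE service_defs (id INTEGER PRIMARY KEY, type TEXT, uri TEXT);
CREATE TABLE services (
    id INTEGER PRIMARY KEY,
    xrd_id INTEGER NOT NULL REFERENCES xrds(id),
    service_def_id INTEGER NOT NULL REFERENCES service_defs(id),
    pattern TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one ServiceDef shared by three XRDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO service_defs (id, type, uri) VALUES (1, ?, ?)",
        ("example-type", "http://example.com/svc"),
    )
    for xrd_id in (1, 2, 3):
        conn.execute(
            "INSERT INTO xrds (id, canonical_id) VALUES (?, ?)",
            (xrd_id, f"=example{xrd_id}"),
        )
        conn.execute(
            "INSERT INTO services (xrd_id, service_def_id, pattern) VALUES (?, 1, ?)",
            (xrd_id, "^/blog"),
        )
    return conn


def services_for(conn, xrd_id):
    """Return (pattern, type, uri) rows for every service of one XRD."""
    return conn.execute(
        "SELECT s.pattern, d.type, d.uri FROM services s "
        "JOIN service_defs d ON d.id = s.service_def_id WHERE s.xrd_id = ?",
        (xrd_id,),
    ).fetchall()
```

With this split, updating the single `service_defs` row changes what every referencing XRD reports, while each XRD keeps its own pattern.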

I'd like to get thoughts on refactoring Service from
Service = current els and attrs + pattern
to
Service = ServiceDef + pattern
and ServiceDef being one of
(a) ServiceDef = src attribute containing a URL that matches an external ServiceDef defined elsewhere within the XRDS or XRD (if in the XRDS, then it would have to be included in the XRD when delivered to the client)
(b) ServiceDef = current els and attributes of Service, minus pattern

I'd also suggest adding a type attribute onto the pattern element. This attribute would have a value from a specified list, one possible list being Simple, DFA, TradNFA, PosixNFA. (See http://www.oreilly.com/catalog/regex/chapter/ch04.html for more info on the regex engine types).

OTOH, those type names have the benefit of being implementation independent, but that also makes them less human readable. It might be better to decide on a set that is more recognizable (PerlCompat might be one such type) and representative of the different regex types in the wild.

The Simple pattern type is a system I am currently using that goes by the following rules:
1. Designate the entire pattern as the literal part.
2. If the literal part starts with a ^ that is not followed by another ^, then the part of the pattern that follows the ^ is considered the literal part, and this pattern will match any path that starts with the literal part, subject to the rules regarding the $ metacharacter.
3. If the literal part ends with a $ that is not preceded by another $, then the literal part becomes everything that preceded the $, and this pattern will match any path that ends with the literal part.
4. If the pattern consists of a ^ metacharacter part, a literal part, and a $ metacharacter part, then the pattern will match only paths that are exactly the literal part of the pattern.
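
The ^ and $ rules above can be sketched roughly as follows. Two behaviors are not stated in the rules and are assumptions here, flagged in the comments: a doubled ^^ or $$ is collapsed to a single literal character, and a pattern with no anchor at all is treated as a "contains" match. The function names and example paths are hypothetical.

```python
def parse_simple(pattern):
    """Split a Simple pattern into (anchored_start, literal, anchored_end)."""
    literal = pattern
    anchored_start = anchored_end = False

    if literal.startswith("^^"):
        # Assumption: "^^" is an escaped, literal "^".
        literal = literal[1:]
    elif literal.startswith("^"):
        anchored_start = True
        literal = literal[1:]

    if literal.endswith("$$"):
        # Assumption: "$$" is an escaped, literal "$".
        literal = literal[:-1]
    elif literal.endswith("$"):
        anchored_end = True
        literal = literal[:-1]

    return anchored_start, literal, anchored_end


def simple_match(pattern, path):
    """Return True if `path` matches the Simple pattern."""
    anchored_start, literal, anchored_end = parse_simple(pattern)
    if anchored_start and anchored_end:
        return path == literal
    if anchored_start:
        return path.startswith(literal)
    if anchored_end:
        return path.endswith(literal)
    # Assumption: the rules do not cover the unanchored case.
    return literal in path


if __name__ == "__main__":
    # Parentheses, "*" and an inner "$" need no escaping.
    print(simple_match("^($v*1.0)", "($v*1.0)/photos"))
```

Everything between the anchors is compared as plain text, so no regex engine is needed at all.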

This meets all of my current needs (grouping would be nice, but I think it should be handled by the service, not in the resolver), and avoids the ugly escaping needed for PerlCompat expressions that match against a path that contains one or more of {$ versions, () xrefs, * subsegments}.

Sorry I didn't post this sooner, but I just noticed it while working on the Rails resolver db.

Thanks,

Bill Barnhill
i-names: =Bill.Barnhill, @communitivity*Bill.Barnhill
Don't know what an i-name is? Find out here: http://2idi.com/grs/index.php?referral_code=communitivity
Don't have an i-name? Get one here: http://2idi.com/grs/index.php?referral_code=communitivity

-----Original Message-----
From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Sat 11/12/2005 5:51 AM
To: xri@lists.oasis-open.org
Subject: [xri] Patterns

One more thing regarding patterns, we should probably state that the
local part should first be canonicalized according to the rules in the
syntax specs, i.e. by removing extra /./ and /../, and removing
unnecessary percent encoding.
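
A rough sketch of such a canonicalization step is shown here. The exact rules belong to the XRI syntax specs and are not quoted in this thread, so this stands in RFC 3986-style normalization as an assumption: percent-escapes of unreserved characters are decoded, remaining escapes get uppercase hex digits, and /./ and /../ segments are collapsed. The function names are hypothetical, and the path is assumed to start with "/".

```python
import re

# Unreserved characters per RFC 3986; escapes of these are unnecessary.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def decode_unreserved(path):
    """Decode %XX only where the character is unreserved; uppercase the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def remove_dot_segments(path):
    """Collapse "." and ".." segments of an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty (root) segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash, as RFC 3986 does
    return "/".join(out)


def canonicalize(path):
    """Canonicalize a local part before it is compared against patterns."""
    return remove_dot_segments(decode_unreserved(path))
```

Running patterns against the canonical form means that, for example, /a/./b/../c and /a/c are treated as the same path.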

wil.

________________________________

From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Saturday, November 12, 2005 9:00 PM
To: Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] XRI Resolution 2.0 Draft 09 comments

1. xrd:XRD/xrd:Service/xrd:Pattern - what flavor of regular expression
should the value be (perl-compatible, posix, etc.)? Is the full power of
regexp really required, why not just simple string comparison or prefix
matching?

### Great question. Dave was the original proposer of this feature -
I'll let him answer (others, please chime in on this.) I know you're
working on very high-volume HTTP proxy resolvers - what's your view of
the best tradeoff between comparison functionality and performance? ###

[Wil] I have 3 points of concern:

a) Performance-wise, because the number of regular expressions to
compile equals the number of Pattern tags (as opposed to a single
pattern matched against multiple candidates), it may be expensive for
proxy resolvers. However, I don't have concrete statistics to prove my
point.

b) If a regexp is valid but contains a logic error, there is no way for
the registry to verify.

c) Standard - which flavor of regular expression to use? Various regular
expression libraries support different options. If we support regular
expressions, users might ask: how to specify case insensitive match? How
to do negation? Are Unicode character properties (\p & \P) supported?
