RE: [xri] Proposed autocorrection rule for HXRIs

1) That’s fine. The caveat is that this autocorrect routine is not, and should not be, smart about cross-references. So if you had “=drummond/+rss*(http://del.icio.us/drummond/)”, it would do the wrong thing by turning it into “=drummond/(+rss*(http:)//del.icio.us/drummond/)”.

2) Also, I feel that we should not have to take care of second-level paths and below. Again, doing that properly would require duplicating a portion of the parser’s functionality here.

3) I also don’t see the need to care about single or double bangs. If a single bang was specified, the autocorrection routine will just surround it with parentheses and throw it at the parser. If the parser chokes, it gets thrown out. I know it is a simple test but this algorithm should be as barebones as possible.
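
A minimal Python sketch of the barebones routine described in points 1) to 3), assuming the QXRI has already been separated from any proxy prefix. The names GCS_CHARS and autocorrect_first_segment are made up for illustration and do not come from any XRI library; the result would still have to be handed to a real parser, which may reject it.

```python
# Hypothetical sketch; names are illustrative, not from any XRI library.
GCS_CHARS = "=@+$!"  # the GCS characters as listed elsewhere in this thread


def autocorrect_first_segment(qxri: str) -> str:
    """Wrap the first path segment in parentheses if it starts with a GCS char.

    Deliberately naive: no parenthesis matching, no awareness of
    cross-references, and nothing below the first path segment is touched.
    """
    authority, slash, path = qxri.partition("/")
    if not slash or not path or path[0] not in GCS_CHARS:
        return qxri  # no path, or the path does not start with a GCS char
    # Close the parenthesis at the earliest '/', or at the end if there is none.
    first, slash2, rest = path.partition("/")
    return f"{authority}/({first}){slash2}{rest}"


print(autocorrect_first_segment("=person/+contact"))
print(autocorrect_first_segment("=drummond/+rss*(http://del.icio.us/drummond/)"))
```

The second call reproduces the wrong result from point 1), because the routine closes the parenthesis at the first '/' it sees, even though that '/' sits inside a cross-reference. A single bang is wrapped like any other GCS character, and it is left to the parser to accept or reject the outcome.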

=wil (http://xri.net/=wil)

From: Drummond Reed [mailto:drummond.reed@cordance.net]
Sent: Thursday, June 01, 2006 4:24 PM
To: Tan, William; 'Wachob, Gabe'; xri@lists.oasis-open.org
Subject: RE: [xri] Proposed autocorrection rule for HXRIs

Sorry to be late on this thread; been offline all day.

I don’t think it has to be complicated; it’s meant only to address the very simple case of a human not typing parens around a segment that needs to be an xref.

Wil, to answer your questions:

1) To keep it simple, the rule is *per segment* in the QXRI path that begins with a GCS character. So:

http://xri.net/=drummond.reed/+contact/home would become http://xri.net/=drummond.reed/(+contact)/home, and

http://xri.net/=drummond.reed/+contact/+home would become http://xri.net/=drummond.reed/(+contact)/(+home).

2) The reason for double ! is that single ! is a legal XRI char. So:

http://xri.net/=!1000.2000.3000.4000/!1234/5678 must not change, but

http://xri.net/=!1000.2000.3000.4000/!!1234/5678 would become http://xri.net/=!1000.2000.3000.4000/(!!1234)/5678

I think that’s all the rule needs to be, and again due to human usability issues, I feel we should specify this in the spec and not leave it to implementations.
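
For illustration, the per-segment rule in 1) and 2) could be sketched in Python roughly as follows. This assumes the HXRI is simply the proxy prefix "http://xri.net/" followed by the QXRI, that there is no query string, and that splitting on '/' is good enough (it is not when a segment already contains a cross-reference). The names PROXY_PREFIX, needs_xref and autocorrect_hxri are hypothetical.

```python
# Hypothetical sketch; the prefix handling and all names are assumptions.
PROXY_PREFIX = "http://xri.net/"  # proxy resolver used in the examples above
REASSIGNABLE_GCS = "=@+$"


def needs_xref(segment: str) -> bool:
    """True if a path segment starts with a reassignable GCS char or with '!!'."""
    if segment.startswith("!!"):
        return True
    return segment != "" and segment[0] in REASSIGNABLE_GCS


def autocorrect_hxri(hxri: str) -> str:
    """Apply the rule to every path segment after the authority segment."""
    if not hxri.startswith(PROXY_PREFIX):
        return hxri
    qxri = hxri[len(PROXY_PREFIX):]
    # The first piece is the authority segment and is never wrapped.
    authority, *segments = qxri.split("/")
    fixed = [f"({s})" if needs_xref(s) else s for s in segments]
    return PROXY_PREFIX + "/".join([authority, *fixed])


for example in (
    "http://xri.net/=drummond.reed/+contact/home",
    "http://xri.net/=drummond.reed/+contact/+home",
    "http://xri.net/=!1000.2000.3000.4000/!1234/5678",
    "http://xri.net/=!1000.2000.3000.4000/!!1234/5678",
):
    print(autocorrect_hxri(example))
```

The single-bang segment is left alone because a lone ! is not in the trigger set, which is the distinction point 2) is making.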

=Drummond

From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Wednesday, May 31, 2006 10:09 PM
To: Wachob, Gabe; Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] Proposed autocorrection rule for HXRIs

My concern is not so much the complication as that it seems hack-ish. I feel that anything hack-ish belongs in the implementation, which is why I haven’t put it in the parser module. On the other hand, if the hack becomes de facto, we want to make sure that other implementations can interoperate, hence the proposal to put it in the spec.

=wil (http://xri.net/=wil)

From: Wachob, Gabe [mailto:gwachob@visa.com]
Sent: Thursday, June 01, 2006 6:11 AM
To: Tan, William; Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] Proposed autocorrection rule for HXRIs

This is getting complicated.

Are there *any* use cases for this besides =Name/(+contact)?

-Gabe

From: Tan, William [mailto:William.Tan@neustar.biz]
Sent: Wednesday, May 31, 2006 12:14 PM
To: Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] Proposed autocorrection rule for HXRIs

The proposed rules seem a little awkward to me.

Firstly, it shouldn’t have to worry about whether the PGCS is “double”. It just tests to see if the character is a GCS character (=@+$!).

Secondly, it shouldn’t have to match parentheses. So, if only one of the parentheses is missing, tough luck.

It also doesn’t explain around which part of the path the parentheses should be applied. If we have “+contact/email”, should we make it “(+contact)/email” or “(+contact/email)”?

I would prefer the former, i.e. close the parenthesis at the earliest ‘/’ if found. Otherwise, place it at the end. The latter case means that we can no longer break it down to path segments and sub-segments since a cross reference is opaque.
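
To make the difference between the two placements concrete, here is a small Python sketch of a splitter that treats anything inside parentheses as opaque. It is only an illustration of the opacity argument, not the actual XRI grammar, and split_segments is a made-up name.

```python
# Hypothetical sketch; not the real XRI grammar, just the opacity argument.
def split_segments(path: str) -> list:
    """Split a path on '/', but never inside a parenthesized cross-reference."""
    segments, current, depth = [], [], 0
    for ch in path:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "/" and depth == 0:
            # Top-level separator: finish the current segment.
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


print(split_segments("(+contact)/email"))
print(split_segments("(+contact/email)"))
```

With the former placement the path still breaks into two segments; with the latter the whole path collapses into one opaque cross-reference.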

What do you think?

=wil (http://xri.net/=wil)

From: Drummond Reed [mailto:drummond.reed@cordance.net]
Sent: Saturday, May 27, 2006 5:11 PM
To: xri@lists.oasis-open.org
Subject: [xri] Proposed autocorrection rule for HXRIs

Wil has raised a key question regarding the path matching rule stated on line 1381 of XRI Resolution 2.0 Working Draft 10 (http://www.oasis-open.org/committees/download.php/17293/xri-resolution-v2.0-wd-10.pdf). This rule currently reads:

IMPORTANT: If there is no match, this comparison MUST be repeated after enclosing the value of the Path String parameter in parentheses (“(“ and “)”). This eliminates the need for XRD authors to specify multiple xrd:Path elements in order to match an XRI path that may or may not be expressed as a cross-reference.

The purpose of this rule was to enable humans to type a simple HXRI such as…

            xri.net/=person/+contact

…into a browser address bar and have it resolved via proxy resolution to a Web page for contacting the identified person. Technically this is not a legal HXRI, i.e., to be syntactically correct, it needs to be…

            xri.net/=person/(+contact)

The parentheses are required around “+contact” because it is an absolute XRI by itself, and therefore if used anywhere except at the start of the authority segment of the XRI, it must be expressed as a cross-reference, i.e., enclosed in parentheses. The same is true for any XRI or URI embedded in an XRI, e.g.:

            xri.net/=person/(mailto:john.doe@example.com)



However, because it is unrealistic to expect human users to understand/remember/type XRI cross-reference syntax (even many developers don’t like typing it), the proposed path matching rule above would instruct the resolver to match either the path “+contact” or “(+contact)” to the following Path element value:

            <Path>(+contact)</Path>

In other words, if a human typed “xri.net/=person/+contact”, the resolver would first try to match the path “+contact” and find no match, but then it would try to match the path “(+contact)” and get a match.
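
As a rough Python illustration of that two-step comparison, assuming plain string equality stands in for whatever comparison the resolution spec actually defines (path_matches is a hypothetical name):

```python
# Hypothetical sketch; plain string equality is an assumption, not the spec's rule.
def path_matches(path: str, xrd_path: str) -> bool:
    """Compare as typed first, then retry with the path enclosed in parentheses."""
    if path == xrd_path:
        return True
    return f"({path})" == xrd_path


print(path_matches("+contact", "(+contact)"))
print(path_matches("(+contact)", "(+contact)"))
```

Both calls match the Path element value “(+contact)”: the first only on the retry, the second on the initial comparison.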

What Wil pointed out is that an XRI parser would actually reject “xri.net/=person/+contact” as being syntactically invalid, so a strict implementation that parsed the XRI to determine the path component would error out BEFORE it ever got to path comparison. So Wil suggested that if we are going to compensate for human lack of understanding and do “autocorrection” of “xri.net/=person/+contact” to “xri.net/=person/(+contact)”, we should not do it by modifying the path comparison rules, but instead publish a set of rules for such autocorrection that: a) apply to all QXRIs, b) are not ambiguous, c) do not introduce potential security flaws, and d) are applied prior to formal XRI parsing (for obvious reasons).

I agree with Wil’s assessment, so I propose the following autocorrection rule for HXRIs:

AUTO-CORRECTION RULE: If any segment of the path portion of the QXRI embedded in an HXRI begins with either: a) a reassignable GCS character (=, @, +, $), or b) a double persistent GCS character (!!), then to be a syntactically compliant XRI that segment MUST be enclosed in matching parentheses. A compliant proxy resolver MUST automatically turn such a QXRI into a valid XRI by adding these parentheses if either or both are missing. A compliant local resolver SHOULD also perform this same autocorrection of XRIs.
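
One possible reading of the “either or both are missing” clause, sketched in Python for a single path segment. This is an interpretation, not part of the proposal: it assumes that a segment with only the opening parenthesis present counts as a candidate, and it uses a crude count of parentheses in place of real matching. The names begins_with_trigger and complete_parens are hypothetical.

```python
# Hypothetical sketch of one reading of the rule; counting is not real matching.
REASSIGNABLE_GCS = "=@+$"


def begins_with_trigger(text: str) -> bool:
    """True for a leading reassignable GCS char or a leading '!!'."""
    return text.startswith("!!") or (text != "" and text[0] in REASSIGNABLE_GCS)


def complete_parens(segment: str) -> str:
    """Add whichever enclosing parenthesis (or both) a segment is missing."""
    balance = segment.count("(") - segment.count(")")
    if begins_with_trigger(segment):
        if balance == 0:
            return f"({segment})"  # both parentheses missing
        if balance == -1 and segment.endswith(")"):
            return f"({segment}"   # only the opening parenthesis missing
    elif segment.startswith("(") and begins_with_trigger(segment[1:]):
        if balance == 1:
            return f"{segment})"   # only the closing parenthesis missing
    return segment                 # nothing to fix, or not a candidate


for seg in ("+contact", "(+contact", "+contact)", "(+contact)", "!!1234", "!1234"):
    print(seg, "->", complete_parens(seg))
```

The first five inputs come out fully parenthesized, while the single-bang segment is left unchanged.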

Note that if we add this rule in Working Draft 11, we can delete the path comparison rule currently defined starting on line 1381, because autocorrection will have already been applied.

This is an important change for Working Draft 11, so please post any feedback if you disagree.

=Drummond
