OASIS Mailing List Archives

 



xdi message



Subject: Re: [xdi] Re: Further ABNF adjustment


Drummond,

You are correct, excluding a specific trivial case can actually force more complexity in rules. The old ABNF had some examples of this. This is one reason why allowing a bare literal as a segment seems more natural to me.

The xref rule with added initial colon might need more grouping brackets:  "(" [ [ ":" IRI ] / address ] ")"

I actually think that simply allowing (http:// … ), with its own non-initial colon, as an IRI xref would add only a small (finite) amount of parsing complexity, unlike the exponentially growing parse trees we may have been hitting in the past. It would also look good for XDI's first-class support of IRIs. I posted a comment on this to https://wiki.oasis-open.org/xdi/XdiAbnf/Discussion earlier today.

I also noted that the tel: and sms: schemes can have matched parentheses in their bodies. If we allow these, we may have to allow matched parentheses in IRIs and count parenthesis depth as we parse IRIs, unless we require clients to escape and unescape all the internal parentheses. If we're already scanning for parentheses, checking for the internal colon after the scheme is not much additional work.
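
To make the depth-counting idea concrete, here is a minimal Python sketch. It is only an illustration, not anything from XDI2 or the proposed ABNF: the function names find_xref_end and looks_like_iri are hypothetical, the scheme check follows the RFC 3986 scheme syntax, and it assumes that literals never contain a colon, so a colon after a leading scheme name signals an IRI.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def find_xref_end(text: str, start: int) -> int:
    """Return the index of the ')' matching the '(' at text[start], or -1.

    Hypothetical helper: it only counts parenthesis depth and does not
    validate what is inside the cross-reference.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("start must point at '('")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1  # parentheses are unbalanced


def looks_like_iri(body: str) -> bool:
    """Rough check for an internal colon after a scheme name.

    Assumes literals cannot contain ':', so this is enough to tell an
    IRI body apart from an XDI address body.
    """
    return SCHEME.match(body) is not None


xref = "(tel:+1(555)1234)"
end = find_xref_end(xref, 0)
print(end, looks_like_iri(xref[1:end]))
```

Both checks are a single left-to-right pass over the xref, which is why the colon test adds so little once we are scanning for parentheses anyway.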

Joseph

On Feb 3, 2013, at 2:33 PM, Drummond Reed <drummond@connect.me> wrote:

Joseph,

First, thanks very much for this analysis of the ABNF. I hadn't appreciated it in detail until I studied it after Friday's telecon. Condensing it down to four lines is a FANTASTIC way of seeing the essence of the ABNF.

Based on our discussion on Friday's call, and if you follow the recommendations I posted to https://wiki.oasis-open.org/xdi/XdiAbnf/Discussion (namely, not allowing colons in literals, and using colons to prefix IRIs within cross-references), here's a revised version of your four-line ABNF if bare literals are allowed to begin segments:

OPTION #1: IF BARE LITERALS ARE ALLOWED

address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ] ;
subseg  = [ "=" / "@" / "+" / "$" ] [ "*" / "!" ] [ xref / literal ] ;
xref    = "(" [ ":" IRI / address ] ")";
literal = 1*[ iunreserved / pct-encoded ] ;

If bare literals are NOT allowed, as in the proposal we discussed on Friday, then I could only condense the ABNF into six rules:

OPTION #2: IF BARE LITERALS ARE NOT ALLOWED

address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ]
subseg  = global / local / xref
global  = ( "=" / "@" / "+" / "$" ) [ "*" / "!" ] [ xref / literal ]
local   = ( "*" / "!" ) [ xref / literal ]
xref    = "(" [ ":" IRI / address / literal ] ")"
literal = 1*[ iunreserved / pct-encoded ]
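
For anyone who wants to compare the two options by hand, here is a rough Python sketch of a recognizer that covers both, switched by a hypothetical bare_ok flag (True for Option #1, False for Option #2). It is a sketch under several assumptions, not a reference implementation:

- "1*[ ... ]" in the literal rule is read as "one or more".
- iunreserved is approximated by the ASCII unreserved characters.
- The IRI after ":" is skipped to its balancing ")" rather than validated.
- Every subsegment must consume at least one character.

All function names are made up for illustration.

```python
import re

GLOBAL = "=@+$"   # global context symbols
LOCAL = "*!"      # local context symbols
# Assumption: ASCII approximation of 1*( iunreserved / pct-encoded )
LITERAL = re.compile(r"(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})+")


def parse_xref(s, i, bare_ok):
    """s[i] is '('. Return the index just past the matching ')', or -1."""
    i += 1
    if i < len(s) and s[i] == ":":
        depth = 1  # ":" IRI: skip to the balancing ')' without validating
        while i < len(s):
            if s[i] == "(":
                depth += 1
            elif s[i] == ")":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
        return -1
    if i < len(s) and s[i] == ")":
        return i + 1  # empty cross-reference "()"
    j = parse_address(s, i, bare_ok)
    if j == -1 and not bare_ok:
        m = LITERAL.match(s, i)  # Option #2 allows a literal inside an xref
        j = m.end() if m else -1
    if j != -1 and j < len(s) and s[j] == ")":
        return j + 1
    return -1


def parse_subseg(s, i, bare_ok):
    """Return the index just past one subsegment starting at i, or -1."""
    start = i
    if i < len(s) and s[i] in GLOBAL:
        i += 1
    if i < len(s) and s[i] in LOCAL:
        i += 1
    has_symbol = i > start
    if i < len(s) and s[i] == "(":
        return parse_xref(s, i, bare_ok)
    m = LITERAL.match(s, i)
    if m:
        # A literal with no context symbol is a "bare literal".
        return m.end() if (has_symbol or bare_ok) else -1
    return i if has_symbol else -1


def parse_address(s, i, bare_ok):
    """Return the index just past the longest address starting at i, or -1."""
    segments = 0
    while True:
        j = parse_subseg(s, i, bare_ok)
        if j == -1:
            return -1  # each segment needs at least one subsegment
        while j != -1:
            i = j
            j = parse_subseg(s, i, bare_ok)
        segments += 1
        if segments < 3 and i < len(s) and s[i] == "/":
            i += 1
            continue
        return i


def is_valid(s, bare_ok):
    return parse_address(s, 0, bare_ok) == len(s)


for addr in ["=a*b/+c", "abc", "=a/abc", "(abc)", "=a/(:http://example.com/)"]:
    print(addr, is_valid(addr, True), is_valid(addr, False))
```

In this sketch the two options differ by a single test in parse_subseg and one fallback in parse_xref. That says nothing about how a generated ABNF parser such as APG or aParse would perform on either grammar.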

Two questions:
  1. Am I missing something - do you see a way to compact it further?
  2. Will there be any real difference in efficiency of parsing between these two (given that Option #2 is actually narrower than Option #1 because it excludes bare literals)?
Thanks,

=Drummond  



On Fri, Feb 1, 2013 at 9:34 AM, Joseph Boyle <planetwork@josephboyle.net> wrote:
Markus, thanks for the recognition, glad to be able to help out.

Drummond, do we need to exclude bare literals as segments at the syntax level? It seems to me they may be semantically trivial, but are syntactically consistent.

Just experimenting with finding a minimal set of verification rules (for clarity, omitting naming all the productions we want as parsing results) if bare literals are allowed, the grammar can be as short as:

address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ] ;
subseg  = [ "=" / "@" / "+" / "$" ] [ "*" / "!" ] [ xref / literal ] ;
xref    = "(" [ IRI / address ] ")";
literal = 1*[ iunreserved / pct-encoded / "&" / ";" / "," / "'" / ":" ] ;


On Jan 31, 2013, at 11:30 PM, Drummond Reed <drummond@connect.me> wrote:

Markus, thanks, this is great work. I have reviewed this and am in agreement with the changes. 

The support for a literal as a standalone value at the start of an XDI segment has always been somewhat theoretical, i.e., we originally did it that way to not rule it out (because the preceding slash could be a delimiter). But that does not work for the first segment of an XDI address.

So I agree that it's cleaner to just require all XDI segments to start with delimited subsegments. 

I'll add this to the agenda for tomorrow's telecon.

=Drummond 


On Thu, Jan 31, 2013 at 5:51 PM, Markus Sabadello <markus.sabadello@xdi.org> wrote:
Hello XDI TC,

Based on implementation experience and some discussions, I added another slightly changed version of the XDI ABNF to the discussion page of the relevant proposal:

Here's the summary from the page:
1. Some of the changes here are motivated by the insight that the purpose of an ABNF is not only to validate a string against a set of rules, but also to semantically understand the various components of that string.
2. The "xdi-inner-graph" rule is introduced, in order to have an explicit rule for this fundamental XDI construct. This change doesn't affect what is valid XDI and what is not.
3. The "xdi-context" rule is introduced, for the same reason.
4. The "xdi-segment" rule is changed to no longer permit a literal at the beginning. A segment that does not start with a context symbol, and is not a cross-reference, does not appear to be useful, and it might be ambiguous with regard to other rules.
5. The "xref-literal" rule is introduced, in order to still allow literals in cross-references.

I tested this ABNF in the XDI2 library, and it appears to work fine.

In fact, I have recently added to XDI2 support for a new parser library (APG), in addition to the one I had been using before (aParse).
After evaluating them both, my conclusion is that they are both able to handle the XDI ABNF, that they produce the same results, and that APG is about twice as fast as aParse.
So APG will now be standard in XDI2, but aParse is optionally also still supported.

I have spent quite some time thinking about Joseph Boyle's ideas about optimizing the parsing process in smart ways, for example by simply "skipping" from an opening "(" to a closing ")" in order to avoid having to descend deep into the IRI rules. This sounds quite good to me; I just haven't yet found a way to implement it that still ensures robustness and correctness of the parsing process. I think it was also Joseph who early on suggested that XRI parsing might be one of the most resource-intensive tasks of an XDI server, and I think that is exactly right. So while switching to a faster parsing library is a great step, we'll keep looking for further optimizations.

You can use the following tool to experiment with the most recent ABNF proposal I mentioned above:

Markus


 
 





