The EBNF input was:
address ::= subseg+ ('/' subseg+ ('/' subseg+)?)?
subseg ::= [=@+$] [*!] (xref | literal)
xref ::= '(' (IRI | address) ')'
literal ::= (iunreserved | pct-encoded | [&;,':])+
Drummond,
You are correct: excluding a specific trivial case can actually force more complexity into the rules. The old ABNF had some examples of this. This is one reason why allowing a bare literal as a segment seems more natural to me.
The xref rule with added initial colon might need more grouping brackets: "(" [ [ ":" IRI ] / address ] ")"
I actually think that simply allowing (http:// … ), with its own non-initial colon, as an IRI xref would add only a small (finite) amount of complexity to parsing, unlike some of the exponentially growing parse trees we may have been hitting in the past. It would also look good for XDI's first-class support of IRIs. I posted a comment on this to https://wiki.oasis-open.org/xdi/XdiAbnf/Discussion earlier today.
I also noted that the tel: and sms: schemes can have matched parentheses in their bodies. If we allow these, we may have to allow matched parentheses in IRIs and count parenthesis depth as we parse IRIs, unless we require clients to escape and unescape all the internal parens. If we are already scanning for parens, checking for the internal colon after the scheme is not much additional work.
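To make the depth-counting idea concrete, here is a minimal Python sketch. It is only an illustration, not part of any XDI specification: the function names `find_xref_end` and `looks_like_iri` are hypothetical, and the scheme check is a heuristic based on the generic URI scheme syntax (a letter followed by letters, digits, "+", "-" or ".", then a colon).

```python
import re

# Generic URI scheme followed by a colon (heuristic, not a full IRI parser).
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def find_xref_end(text: str, start: int) -> int:
    """Return the index of the ')' matching the '(' at text[start].

    Returns -1 if the parentheses are unbalanced.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("expected '(' at start")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1          # entering a nested pair, e.g. inside a tel: body
        elif ch == ")":
            depth -= 1
            if depth == 0:      # back at the level of the opening paren
                return i
    return -1


def looks_like_iri(body: str) -> bool:
    """Heuristic: the xref body starts with a scheme and its internal colon."""
    return SCHEME.match(body) is not None


# Hypothetical example: a tel: IRI with matched parentheses inside an xref.
s = "=a(tel:+1(555)1234)*b"
end = find_xref_end(s, 2)
body = s[3:end]
```

Here `end` is the index of the outer closing paren, so `body` is `tel:+1(555)1234`, and the scheme check is a single anchored match on text the scan has already isolated. Note that the heuristic cannot tell an IRI from a bare literal that contains a colon, which is one argument for keeping colons out of literals.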
Joseph

Joseph,
First, thanks very much for this analysis of the ABNF. I hadn't appreciated it in detail until I studied it after Friday's telecon. Condensing it down to four lines is a FANTASTIC way of seeing the essence of the ABNF.
Based on our discussion on Friday's call, and if you follow the recommendations I posted to https://wiki.oasis-open.org/xdi/XdiAbnf/Discussion (namely, not allowing colons in literals, and using colons to prefix IRIs within cross-references), here's a revised version of your four-line ABNF if bare literals are allowed to begin segments:
OPTION #1: IF BARE LITERALS ARE ALLOWED
address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ] ;
subseg = [ "=" / "@" / "+" / "$" ] [ "*" / "!" ] [ xref / literal ] ;
xref = "(" [ ":" IRI / address ] ")" ;
literal = 1*[ iunreserved / pct-encoded ] ;
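As a rough check of how Option #1 parses, here is a minimal recursive-descent recognizer in Python. It is a sketch under several assumptions, not a reference implementation: the brackets around `xref / literal` and inside `xref` and `literal` are read as grouping (as in the original EBNF) rather than as optional, `iunreserved` is simplified to the ASCII unreserved characters, an IRI is taken to be any non-empty text up to the matching close paren, and all function names are hypothetical.

```python
import re

# literal = 1*( iunreserved / pct-encoded ), ASCII-only simplification.
LITERAL = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")


class ParseError(ValueError):
    pass


def parse_address(s: str, i: int = 0) -> int:
    """address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ]; returns end index."""
    i = parse_segment(s, i)
    for _ in range(2):                      # at most two further segments
        if i < len(s) and s[i] == "/":
            i = parse_segment(s, i + 1)
        else:
            break
    return i


def parse_segment(s: str, i: int) -> int:
    """1*subseg: one subsegment is required, more are consumed greedily."""
    j = parse_subseg(s, i)
    while True:
        try:
            j = parse_subseg(s, j)
        except ParseError:
            return j


def parse_subseg(s: str, i: int) -> int:
    if i < len(s) and s[i] in "=@+$":       # optional global context symbol
        i += 1
    if i < len(s) and s[i] in "*!":         # optional local context symbol
        i += 1
    if i < len(s) and s[i] == "(":
        return parse_xref(s, i)
    m = LITERAL.match(s, i)
    if not m:
        raise ParseError(f"expected xref or literal at offset {i}")
    return m.end()


def parse_xref(s: str, i: int) -> int:
    i += 1                                  # skip "("
    if i < len(s) and s[i] == ":":          # ":" marks an IRI cross-reference
        depth, j = 1, i + 1
        while j < len(s):
            if s[j] == "(":
                depth += 1
            elif s[j] == ")":
                depth -= 1
                if depth == 0:
                    if j == i + 1:
                        raise ParseError("empty IRI")
                    return j + 1
            j += 1
        raise ParseError("unterminated IRI cross-reference")
    i = parse_address(s, i)                 # otherwise a nested address
    if i >= len(s) or s[i] != ")":
        raise ParseError(f"expected ')' at offset {i}")
    return i + 1


def is_valid(s: str) -> bool:
    try:
        return parse_address(s) == len(s)
    except ParseError:
        return False
```

With the colon prefix, the choice between an IRI and a nested address is made on a single character of lookahead, so no backtracking is needed inside a cross-reference. Under these assumptions `is_valid("=a(:http://example.com/)")` accepts, while the same string without the leading colon is rejected because `http` parses as a bare literal and the following `:` is not allowed there.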
If bare literals are NOT allowed, as in the proposal we discussed on Friday, then I could only condense the ABNF into six rules:
OPTION #2: IF BARE LITERALS ARE NOT ALLOWED
address = 1*subseg [ "/" 1*subseg [ "/" 1*subseg ] ]
subseg = global / local / xref
global = ( "=" / "@" / "+" / "$" ) [ "*" / "!" ] [ xref / literal ]
local = ( "*" / "!" ) [ xref / literal ]
xref = "(" [ ":" IRI / address / literal ] ")"
literal = 1*[ iunreserved / pct-encoded ]