OASIS Mailing List Archives

relax-ng message



Subject: Re: [relax-ng] Re: RFC2518 (WebDAV) / RFC2396 (URI) inconsistency



> That means the number of people who are actually using broken URIs can
> be less than what XMLSpy claims. So this was an argument for not
> removing the check for namespace URIs.  Right?

Right.  I believe the things you can check about URIs are:

1. It uses % properly, i.e. every % character is followed by two hex digits

2. Either
   (a) It's a relative URI (i.e. if there's a colon, then there must be a /
       or ? or # somewhere before the first colon), or
   (b) (i) It starts with a legal URI scheme name (i.e.
       [A-Za-z][A-Za-z0-9+\.\-]*:) and (ii) the scheme-specific part is
       non-empty

3. It contains at most one # character
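
For comparison, the three checks can also be written out procedurally. The
following Python is only a sketch of the list above; the function names
(percent_ok, shape_ok, fragment_ok, looks_like_uri) are made up for
illustration and do not come from any spec or tool.

```python
import re

HEX_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def percent_ok(uri: str) -> bool:
    """Check 1: every % is followed by two hex digits."""
    return all(
        HEX_ESCAPE.match(uri, i) is not None
        for i, ch in enumerate(uri)
        if ch == "%"
    )


def shape_ok(uri: str) -> bool:
    """Check 2: relative reference, or scheme plus non-empty remainder."""
    colon = uri.find(":")
    if colon == -1:
        return True  # no colon at all: treated as relative
    prefix = uri[:colon]
    if any(c in prefix for c in "/?#"):
        return True  # 2(a): a /, ? or # occurs before the first colon
    # 2(b): legal scheme name, then at least one more character
    return SCHEME.fullmatch(prefix) is not None and len(uri) > colon + 1


def fragment_ok(uri: str) -> bool:
    """Check 3: at most one # character."""
    return uri.count("#") <= 1


def looks_like_uri(uri: str) -> bool:
    return percent_ok(uri) and shape_ok(uri) and fragment_ok(uri)
```

With these definitions, looks_like_uri("DAV:") is false because the
scheme-specific part is empty, while looks_like_uri("../a:b") is true
because a / appears before the first colon.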

We can express each of these in terms of regexps:

1. ([^%]|%[a-fA-F0-9][a-fA-F0-9])*

2. (a) [^:]*([#/?].*)?
   (b) [a-zA-Z][\-+\.a-zA-Z0-9]*:.+

3. [^#]*(#[^#]*)?
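
Before combining them, the separate regexps can be tried one at a time. A
minimal Python sketch, assuming Python's re module reads these patterns the
same way as the regexp dialect intended here; the constant names and the
passes_all helper are illustrative only.

```python
import re

PERCENT = re.compile(r"([^%]|%[a-fA-F0-9][a-fA-F0-9])*")   # regexp 1
RELATIVE = re.compile(r"[^:]*([#/?].*)?")                  # regexp 2(a)
ABSOLUTE = re.compile(r"[a-zA-Z][\-+\.a-zA-Z0-9]*:.+")     # regexp 2(b)
ONE_HASH = re.compile(r"[^#]*(#[^#]*)?")                   # regexp 3


def passes_all(uri: str) -> bool:
    """True if uri matches 1, and 2(a) or 2(b), and 3, each as a whole string."""
    ok_percent = PERCENT.fullmatch(uri) is not None
    ok_shape = (
        RELATIVE.fullmatch(uri) is not None
        or ABSOLUTE.fullmatch(uri) is not None
    )
    ok_hash = ONE_HASH.fullmatch(uri) is not None
    return ok_percent and ok_shape and ok_hash
```

Applying the three tests separately like this sidesteps the combination
problem, at the cost of needing more than one pattern.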

Combining these is a little tricky.  We can combine 2(a) and 3 into:

[^:]*([/?][^#]*)?(#[^#]*)?

we can then combine that with 1 to get:

([^%:]|%[a-fA-F0-9][a-fA-F0-9])*([/?]([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?(#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?

Similarly, we can combine 2(b) and 3 to get

[a-zA-Z][\-+\.a-zA-Z0-9]*:((#[^#]*)|[^#]+(#[^#]*)?)

We can combine that with 1 to get

[a-zA-Z][\-+\.a-zA-Z0-9]*:((#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)|([^%#]|%[a-fA-F0-9][a-fA-F0-9])+(#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?)

So putting the two alternatives together we get:

(([^%:]|%[a-fA-F0-9][a-fA-F0-9])*([/?]([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?(#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?)|([a-zA-Z][\-+\.a-zA-Z0-9]*:((#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)|([^%#]|%[a-fA-F0-9][a-fA-F0-9])+(#([^%#]|%[a-fA-F0-9][a-fA-F0-9])*)?))

I haven't tested this, so it may well be buggy, but it has very little in 
common with the XMLSpy regexp.  The only thing they seem to agree on is 
that you can have at most one # character.
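
One way to try the combined regexp is to assemble it from its repeated
pieces and run it over a few strings. This is a sketch in Python, assuming
the re module interprets the pattern as intended; the names HEX, RELATIVE,
ABSOLUTE and matches are illustrative, and RELATIVE_TIGHT is a hypothetical
adjustment that is not part of the regexp as posted.

```python
import re

HEX = r"%[a-fA-F0-9][a-fA-F0-9]"

# Relative branch: checks 1, 2(a) and 3 combined, as posted.
RELATIVE = rf"([^%:]|{HEX})*([/?]([^%#]|{HEX})*)?(#([^%#]|{HEX})*)?"

# Absolute branch: checks 1, 2(b) and 3 combined, as posted.
ABSOLUTE = (
    rf"[a-zA-Z][\-+\.a-zA-Z0-9]*:"
    rf"((#([^%#]|{HEX})*)|([^%#]|{HEX})+(#([^%#]|{HEX})*)?)"
)

URI_RE = re.compile(rf"({RELATIVE})|({ABSOLUTE})")

# Hypothetical adjustment (an assumption, not in the posted regexp):
# keep # out of the leading character class of the relative branch.
RELATIVE_TIGHT = rf"([^%:#]|{HEX})*([/?]([^%#]|{HEX})*)?(#([^%#]|{HEX})*)?"
URI_RE_TIGHT = re.compile(rf"({RELATIVE_TIGHT})|({ABSOLUTE})")


def matches(pattern: re.Pattern, uri: str) -> bool:
    """True if the whole string matches the pattern."""
    return pattern.fullmatch(uri) is not None


if __name__ == "__main__":
    samples = ["http://www.w3.org/1999/xhtml", "DAV:", "../a:b",
               "a%zz", "urn:a#b#c", "a#b#c"]
    for s in samples:
        print(s, matches(URI_RE, s), matches(URI_RE_TIGHT, s))
```

In this sketch the posted regexp rejects DAV: and urn:a#b#c as expected,
but the relative branch accepts a#b#c, because its leading [^%:] class can
itself match #. Changing that class to [^%:#] appears to restore check 3
for relative references.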

One thing this does illustrate is that it is hard for an implementor to 
figure out from the specs what you are supposed to check.  It has taken 
several years for people to realize that DAV is using a syntactically 
incorrect URI.

James









 



