OASIS Mailing List Archives

office message



Subject: [OASIS Issue Tracker] Issue Comment Edited: (OFFICE-3440) ODF 1.2CD05 Part 1 Needs anyIRI datatype



    [ http://tools.oasis-open.org/issues/browse/OFFICE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21687#action_21687 ] 

Dennis Hamilton edited comment on OFFICE-3440 at 9/29/10 1:48 PM:
------------------------------------------------------------------

Michael, I disagree with your reading.

There is a strong normative statement for anyURI concerning what constitutes a lexically valid URI.

That is different from the statement that they do not expect schema validation for that.  They explain their motivation for not requiring validation of the URI; they do not back off their definition of what it is:

WHAT AN ANYURI MUST BE

In [xmlschema-2] (the normative reference in ODF 1.2 CD05 Part 1), section 3.2.17.1:

"The lexical space of anyURI is finite-length character sequences which, when the algorithm defined in Section 5.4 of [XML Linking Language] is applied to them, result in strings which are legal URIs according to [RFC 2396], as amended by [RFC 2732]."

[RFC2732] adds the use of "[" and "]" for IPv6 addressing to [RFC2396].  These together are the predecessors of [RFC3986].
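
To make the quoted definition concrete, here is a minimal Python sketch of the Section 5.4 escaping as I read it: non-ASCII characters and the "excluded" ASCII characters of [RFC2396] section 2.4.3 are UTF-8 encoded and written as %HH, while "#", "%", "[" and "]" are left alone.  The name xlink_escape and the exact escaped set are my assumptions for illustration, not text from any of the specifications.

```python
# Sketch only: approximates the XLink 1.0 section 5.4 escaping under the
# assumption that the escaped ASCII set is: C0 controls, space, DEL, and
# the RFC 2396 delims/unwise characters other than # % [ ].
_ESCAPED_ASCII = set(map(chr, range(0x00, 0x21))) | {"\x7f"} | set('<>"{}|\\^`')


def xlink_escape(value: str) -> str:
    """Percent-escape the characters that are disallowed in a URI (hypothetical helper)."""
    out = []
    for ch in value:
        if ord(ch) > 0x7F or ch in _ESCAPED_ASCII:
            # Convert to UTF-8 and escape every resulting byte as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, xlink_escape("my file{1}.odt") yields "my%20file%7B1%7D.odt".  The unescaped input is therefore in the anyURI lexical space even though the space and braces are not permitted literally in an IRI reference.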

WHAT THE RESULTING DIFFERENCE IS:

[XML Linking Language] Section 5.4 states a rule for mapping to URIs that explicitly maps more characters than the mapping in [RFC3987].  [RFC3987] does not map characters below U+0080 which are excluded in the definition of IRI reference and of URI reference too.   [RFC3987] and [RFC3986] explain why those characters are excluded.  Furthermore, IRI also excludes from its mapping all code points from U+0080 to U+009F, and many other ones that you can see in the rule for ucschar in [RFC3987] section 2.2.  Note that private-use Unicode code points are excluded from the mapping *except* in the query string rule, iquery.
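
The non-ASCII part of that difference can be sketched as a range check.  The ranges below are transcribed from the ucschar and iprivate rules in [RFC3987] section 2.2; the function name and its in_query flag are hypothetical and only illustrate the point about private-use code points.

```python
# ucschar: planes 0 through 14, minus C1 controls, surrogates, private use,
# noncharacters, and U+E0000..U+E0FFF (per the ABNF in RFC 3987 section 2.2).
UCSCHAR = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p * 0x10000, p * 0x10000 + 0xFFFD) for p in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)
# iprivate: private-use code points, allowed only inside iquery.
IPRIVATE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def non_ascii_allowed(ch: str, in_query: bool = False) -> bool:
    """True if a non-ASCII character may appear literally in an IRI reference."""
    cp = ord(ch)
    if cp < 0x80:
        raise ValueError("this sketch only classifies non-ASCII code points")
    return _in_ranges(cp, UCSCHAR) or (in_query and _in_ranges(cp, IPRIVATE))
```

For example, U+0085 is rejected everywhere, U+00E9 is accepted everywhere, and U+E000 is accepted only when in_query is True.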

Since IRI resolvers are going to work according to [RFC3987] and any *additional* constraints of specific IRI/URI schemes, it seems foolish to not be definitive that the anyURI values SHALL be lexically-well-formed IRIs, since such IRIs do satisfy the lexical requirements for anyURI and they are fully defined and usable in accordance with [RFC3987].

ALTERNATIVELY:

If we want to allow the full 3.2.17.1 definition of lexically-acceptable anyURI values, then we should call them anyURIs and not confuse people into thinking that these satisfy the requirements for IRI, for which the sole authority is [RFC3987] at this time.  In that case, the term "IRI" should not pass our lips lest we mislead folks who rely on the ODF 1.2 specification, although we might have a note or a SHOULD that would make it wise to honor [RFC3987] anyhow.  (We've already gone round and round on that, assuming there is something in the PKWare APPNOTE that is not there.  I don't think we should knowingly leave that quicksand here.)

WHAT [xmlschema-2] SAYS ABOUT VALIDATION IS NOT ABOUT VERIFYING THE IRI REFERENCE SYNTAX

In NOTES just before 3.2.17.1, [xmlschema-2] provides that

NOTE: ... absolutization must not be performed by schema processors as part of schema validation. 
Note:  Each URI scheme imposes specialized syntax rules for URIs in that scheme, including restrictions on the syntax of allowed fragment identifiers. Because it is impractical for processors to check that a value is a context-appropriate URI reference, this specification follows the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such rules and restrictions are not part of type validity and are not checked by minimally conforming processors. Thus in practice the above definition imposes only very modest obligations on minimally conforming processors.

What that means is that the lexical conditions in 3.2.17.1 can be checked (and, with a little experimentation, we can easily verify whether RNG schema checkers do check the datatype).  But it is unreasonable to expect that scheme-specific and context-appropriate constraints beyond the IRI general syntax can be checked as part of anyURI syntax verification.  (For example, we should not expect schema validation to determine whether an IRI reference with no scheme is in valid form for access to a package file or for reference external to the package file.  Likewise, one would not expect to be able to validate file: scheme references against the additional constraints that such a scheme generally involves, and we would not be validating whether an ifragment is meant to be limited to an IDREF value or is allowed to be an XPointer form including elaborate XPath conditions.)

FURTHERMORE

anyURI is subject to pattern constraints, and we can easily define anyIRI to simply exclude those characters that are excluded by [RFC3987] from the ASCII code set.  We can also exclude those other Unicode code points that IRI does not allow but that are allowed by [XML 1.0] in attribute values.
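
As a rough illustration of the ASCII half of such a constraint, here is a Python sketch.  It uses Python re syntax rather than schema pattern-facet syntax, and the function name is hypothetical.  The excluded set is my reading of which printable ASCII characters have no place in the [RFC3986]/[RFC3987] grammar: space, ", <, >, \, ^, `, {, | and }, plus the control characters.

```python
import re

# ASCII characters assumed to be excluded from IRI references:
# C0 controls and space (U+0000..U+0020), DEL, and " < > \ ^ ` { | }
_EXCLUDED_ASCII = re.compile(r'[\x00-\x20\x7f"<>\\^`{|}]')


def passes_ascii_exclusion(value: str) -> bool:
    """True if value contains none of the excluded ASCII characters."""
    return _EXCLUDED_ASCII.search(value) is None
```

With this check, "Pictures/image 1.png" fails while "Pictures/image%201.png" passes.  The sketch does not constrain where "[" and "]" may occur, and it does not look at non-ASCII code points at all; both would need further rules.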

Since we are not using XSD 1.1 (indeed, we are not permitted to use it as a normative reference while it is only a Working Draft), I think the above explanations are sufficient.  

Finally, the observation in [RFC3987] that its definition for IRI References *fits* into any anyURI is not the same as saying the full anyURI (even in [xmlschema-2]) is *necessary* for IRI.











  
> ODF 1.2 CD05 Part 1 Needs anyIRI datatype
> -----------------------------------------
>
>                 Key: OFFICE-3440
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-3440
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Sub-task
>          Components: Needs Discussion, Part 1 (Schema), Schema and Datatypes
>    Affects Versions: ODF 1.2 CD 05
>            Reporter: Dennis Hamilton
>             Fix For: ODF 1.2 CD 06
>
>
> The rules for IRI references are slightly different from the rules for anyURI.  In particular, anyURI accepts ASCII characters that are excluded from IRI references by [RFC3987].
> Rather than qualify the use of anyURI to be specific to IRIs every place that anyURI is used in the current schema, it is recommended that this be handled in one place by introducing an anyIRI datatype that is a derivative of anyURI with an additional pattern constraint that eliminates the ASCII-corresponding characters that are excluded from IRI references in [RFC3987].


        

