OASIS Mailing List Archives

uddi-spec message


Subject: FW: FW: UDDI: Interop issues relating to the XML Schema datatype anyURI


Here is the answer from W3C on our anyURI issue.
In short, while URIs disallow non-ASCII characters, the XML Schema anyURI datatype allows them and references the character escaping approach as described in XLink.

This is actually independent of the W3C IRI activities.

Sorry for not sending this to the list earlier.

Claus

-----Original Message-----
From: Martin Duerst [mailto:duerst@w3.org] 
Sent: Wednesday, April 21, 2004 3:19 AM
To: Von Riegen, Claus
Subject: Re: FW: UDDI: Interop issues relating to the XML Schema datatype anyURI 


Hello Claus,

I think the answer to your question is quite clear:
XML Schema allows a very wide variety of characters as
lexical values in attributes/elements of type anyURI.

XML Schema Part 2: Datatypes, 3.2.17, anyURI
(http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI),
is quite clear about this. If you see anything in this section
that would indicate something different, I'd be interested to know.
It would be rather useless to specify the transformation to escaped
characters if anyURI was restricted in such a way that no such
escaping would actually be needed.

That non-ASCII (Unicode) characters are allowed is also clear from
3.2.17.1, Lexical representation, which says:
"The ·lexical space· of anyURI is finite-length character sequences which,
when the algorithm defined in Section 5.4 of [XML Linking Language] is
applied to them, result in strings which are legal URIs according to
[RFC 2396], as amended by [RFC 2732]."
Obviously, in XML Schema, "character" means any Unicode character.
Also, for example in the path component of a http: scheme anyURI, you
can start with any Unicode character and apply the conversion
procedure and get a legal URI. [please note that the current URI
spec wouldn't allow this for the host part, but this is being
fixed in an update to the URI spec, but minimally conformant
XML Schema processors are not required to check this]

So with respect to the tools mentioned below by Luc, .NET is
correct, and Xerces is wrong.
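
The difference between the two behaviours can be illustrated with a minimal Python sketch. The function names (looks_like_uri, strict_accepts, lenient_accepts) and the example.org URL are hypothetical and are not taken from either tool. The check is character-level only and does not parse the full RFC 2396 grammar.

```python
import re
from urllib.parse import quote

# ASCII characters usable in an RFC 2396 URI reference (as amended by
# RFC 2732), plus "%" and "#", which XLink 5.4 does not escape.
URI_ASCII = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-_.!~*'()"   # unreserved marks
    ";/?:@&=+$,"  # reserved characters
    "[]"          # re-allowed by RFC 2732
    "%#"          # escape introducer and fragment delimiter
)
# A "%" that is not followed by two hex digits is not a valid escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def looks_like_uri(text: str) -> bool:
    """Character-level check only; this is not a full RFC 2396 parser."""
    return all(ch in URI_ASCII for ch in text) and not _BAD_PERCENT.search(text)


def strict_accepts(value: str) -> bool:
    """Hypothetical strict reading: the value must already be escaped."""
    return looks_like_uri(value)


def lenient_accepts(value: str) -> bool:
    """Hypothetical lenient reading: escape per XLink 5.4, then check."""
    escaped = quote(value, safe=URI_ASCII, encoding="utf-8")
    return looks_like_uri(escaped)


if __name__ == "__main__":
    sample = "http://example.org/caf\u00e9"  # path ends in U+00E9 (é)
    print(strict_accepts(sample))   # False
    print(lenient_accepts(sample))  # True
```

Under these assumptions, the strict reading rejects the unescaped é, while the lenient reading accepts it because escaping yields caf%C3%A9.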

Regards,    Martin.



At 16:35 04/04/20 +0200, Von Riegen, Claus wrote:
>Martin,
>
>I am trying to understand your feedback on the allowed characters in XML
>elements of type xsd:anyURI. Are non-ASCII Unicode characters, such as
>é, allowed or not?
>To me, the transformation to escaped characters, when used as actual URIs, 
>is quite clear, but the range of allowed characters is not yet.
>
>Does the section on "Character Encoding in URI References" in "Character
>Model for the World Wide Web 1.0"
>(http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs) answer this by
>the following paragraph?
>
>"W3C specifications that define protocol or format elements (e.g. HTTP 
>headers, XML attributes,...) whose role is that they be interpreted as URI 
>references (or specific subsets of URI references, such as absolute URI 
>references, URIs,...) MUST allow these protocol or format elements to 
>contain characters disallowed by the URI syntax. The disallowed characters 
>include all non-ASCII characters, plus the excluded characters listed in 
>Section 2.4 of [RFC 2396], except
>for the number sign (#) and percent sign (%) characters and the square
>bracket characters re-allowed in [RFC 2732]."
>
>Thanks,
>  Claus von Riegen, SAP AG and OASIS UDDI Spec TC member
>
>-----Original Message-----
>From: Luc Clément [mailto:luc@iclement.net]
>Sent: Tuesday, 2 March 2004 00:05
>To: 'Martin Duerst'; cmsmcq@w3.org; connolly@w3.org
>Cc: Von Riegen, Claus; 'Rogers, Tony'; 'Tom Bellwood'
>Subject: RE: UDDI: Interop issues relating to the XML Schema datatype anyURI
>
>Martin,
>Re examples of tools: Xerces, for example, only accepts encoded chars; i.e.
>it rejects é. .NET, on the other hand, accepts é, allowing
>one to persist it in an element of type anyURI. Thus the problem: a
>Microsoft implementation (for example) might accept URIs that are rejected
>by another implementation (one based on Xerces, for example); a number of
>interop problems ensue as a result.
>
>Thanks for the info on the draft and its submission. We'll review it and 
>consider it as part of our deliberation of this topic.
>
>Thanks again,
>Luc
>
>-----Original Message-----
>From: Martin Duerst [mailto:duerst@w3.org]
>Sent: Monday, March 01, 2004 09:15
>To: Luc Clement; cmsmcq@w3.org; connolly@w3.org
>Cc: 'Von Riegen, Claus'; 'Rogers, Tony'; Tom Bellwood
>Subject: RE: UDDI: Interop issues relating to the XML Schema datatype anyURI
>
>Hello Luc,
>
>Many thanks for your inquiries about IRI and anyURI. I was not aware of 
>the as-of-yet somewhat patchy support. Can you give examples of tools that 
>support and don't support anyURIs? This should definitely be fixed, and 
>maybe we should create some tests.
>
>Regarding the schedule of what's currently draft-duerst-iri-06.txt, we are 
>thinking that we are very close to submitting it to the IETF.
>Please watch http://www.w3.org/International/iri-edit/.
>
>Regards,    Martin.
>
>At 14:48 04/02/28 -0800, Luc Clement wrote:
> >Correction of email address for Dan Connolly to connolly@w3.org
> >
> >
> >----------
> >From: Luc Clément [mailto:luc@iclement.net]
> >Sent: Saturday, February 28, 2004 14:45
> >To: 'cmsmcq@w3.org'; 'duerst@w3.org'; 'dan@w3.org'
> >Cc: 'Von Riegen, Claus'; 'Rogers, Tony'; Tom Bellwood
> >(bellwood@us.ibm.com)
> >Subject: UDDI: Interop issues relating to the XML Schema datatype
> >anyURI
> >
> >To:     M. Sperberg-McQueen, Chair XML Schema Working Group,
> >        cmsmcq@w3.org
> >        Martin Dürst, Internationalization Activity Lead,
> >        duerst@w3.org
> >        Dan Connolly, URI Activity Lead, [lc]
> >        connolly@w3.org
> >
> >Gentlemen,
> >
> >I'm writing to you as the Secretary of the OASIS UDDI Spec Technical
> >Committee. UDDI makes use of anyURI in elements that are intended to be
> >directly invokable (e.g. discoveryURL and overviewURL).  We have
> >encountered interoperability issues with respect to internationalized
> >URIs. Without going into specifics, some schema processor
> >implementations only accept characters defined in RFC2732, whereas
> >others accept Unicode characters that would result in valid RFC2396
> >URIs if processed by the algorithm defined by XLink. This affects how
> >such elements need to be handled by a client, persisted by a UDDI node,
> >and replicated.
> >
> >The TC is considering a number of alternatives (see the extract of the
> >minutes [1] below), ranging from requiring the use (by a client) of the more
> >restrictive case - RFC2732 - and encoding of Unicode characters that
> >fall out of its range, to allowing support (by a server) of the full
> >breadth of Unicode characters in elements of type anyURI. There are of
> >course other options and different means of going about addressing
> >this as described in the minutes.
> >
> >We are trying to assess how best to tackle this issue by understanding
> >the current state of work on internationalized URIs in XML Schema. What
> >is the current state of affairs with anyURI in XML Schema as it relates
> >to internationalized URIs? What ongoing activities should we know of
> >that may help us determine what our best immediate and longer term
> >courses of action are? Any guidance would be most appreciated.
> >
> >Thanks in advance for your time and consideration.
> >
> >Luc Clément
> >Secretary, OASIS UDDI Spec TC
> >
> >[1] Minutes of the UDDI Spec TC FTF Meeting 20040210-12,
> >http://www.oasis-open.org/committees/download.php/5649/TC_FTF_Minutes-V1.7-20040210-12.htm#_Toc65400705
> >
> >Excerpt from [1]:
> >
> >
> >
> >5.5 V3 Issue: discoveryURL & overviewURL
> >
> >Andrew submitted the following item for discussion:
> >
> >During the course of implementing a V3 registry, we have found that the
> >discoveryURL and overviewURL could cause interoperability issues due to
> >the current state of schema assessment performed by various
> >implementations on the XML Schema datatype anyURI.
> >
> >
> >
> >The XML Schema specification says that a value is lexically valid if
> >applying the algorithm from XLink yields a legal URI, which could mean
> >that certain characters outside of US-ASCII are valid as long as they
> >can be escaped later per XLink section 5.4.
> >
> >
> >
> >Lexical representation
> >
> >The ·lexical space· of anyURI is finite-length character sequences
> >which, when the algorithm defined in Section 5.4 of [XML Linking
> >Language] is applied to them, result in strings which are legal URIs
> >according to [RFC 2396], as amended by [RFC 2732].
> >
> >
> >
> >from XLink:
> >
> >Some characters are disallowed in URI references, even if they are
> >allowed in XML; the disallowed characters include all non-ASCII
> >characters, plus the excluded characters listed in Section 2.4 of [IETF
> >RFC 2396], except for the number sign (#) and percent sign (%) and the
> >square bracket characters re-allowed in [IETF RFC 2732]. Disallowed
> >characters must be escaped as follows:
> >
> >1. Each disallowed character is converted to UTF-8 [IETF RFC 2279]
> >as one or more bytes.
> >
> >2. Any bytes corresponding to a disallowed character are escaped
> >with the URI escaping mechanism (that is, converted to %HH, where HH is
> >the hexadecimal notation of the byte value).
> >
> >3. The original character is replaced by the resulting character
> >sequence.
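
The three steps above can be sketched in Python as follows. This is a minimal illustration and not part of the original minutes: the function name xlink_escape, the ALLOWED set, and the example.org URL are assumptions made for the example, and the set of untouched characters is taken from the RFC 2396 and RFC 2732 description quoted above.

```python
# Characters the escaping leaves alone: the ASCII characters that RFC 2396
# (as amended by RFC 2732) allows, plus "#" and "%".
ALLOWED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-_.!~*'();/?:@&=+$,[]#%"
)


def xlink_escape(value: str) -> str:
    """Escape disallowed characters following the three quoted steps."""
    out = []
    for ch in value:
        if ch in ALLOWED:
            out.append(ch)
            continue
        # Step 1: convert the disallowed character to UTF-8 bytes.
        utf8_bytes = ch.encode("utf-8")
        # Step 2: escape each byte as %HH.
        escaped = "".join(f"%{b:02X}" for b in utf8_bytes)
        # Step 3: replace the original character with the escaped sequence.
        out.append(escaped)
    return "".join(out)


if __name__ == "__main__":
    print(xlink_escape("http://example.org/caf\u00e9 menu"))
    # http://example.org/caf%C3%A9%20menu
```

Already-escaped input such as a%20b passes through unchanged, because "%" is among the characters left alone.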
> >
> >
> >
> >The result of the indirection in this definition is that some
> >implementations of anyURI accept only characters defined in RFC2732,
> >whereas others accept the Unicode characters that would result in valid
> >RFC2396 URIs if processed by the algorithm in XLink.
> >
> >
> >
> >The XML Schema group has acknowledged that for I18N reasons, the schema
> >allows Unicode characters in anyURI and that for now clients should
> >transform them to access/invoke the resource.  I would like to know if
> >the UDDI TC desires this flexibility as well.  If the UDDI TC desires
> >that the client be able to specify Unicode without escaping the
> >non-ASCII characters, it may be beneficial for short term
> >interoperability of UDDI implementations to change to the string
> >datatype as is already the case with access points.  Another option
> >would be to place a post schema assessment restriction requiring that
> >the publisher escape the URIs per RFC2396 prior to publication.
> >
> >
> >
> >Minutes:
> >
> >About half of the clients have implemented this differently - we should
> >expect to see interop problems as a result.  It generally remains the
> >client's responsibility to convert URIs into a URL that can support
> >dereferencing via an internet call.  Also, do we want UDDI to have
> >directly invokable URIs, or should we expect clients to transform
> >these URIs before using them?  Clients such as Microsoft IE typically
> >do such transformations automatically, etc.  If we change the type to a
> >string we get over the interop issue.  Another option is that we mandate
> >they be pre-transformed.  Yet another option is that we can leave this
> >matter alone and expect that this issue will have been resolved in the
> >next 2 yrs.
> >
> >
> >
> >Treating all of these URLs as strings would make everything consistent,
> >though consideration should be given to converting the accessPoint into
> >an anyURI instead of a string.
> >
> >
> >
> >Daniel suggested creating a new UDDI specific URI type which is
> >actually invokable, inheriting from anyURI.  We'd describe the type in
> >the spec indicating it is invokable.  This would of course require us
> >to version the spec when W3C & IETF finish their work on IRIs.
> >
> >
> >
> >At issue is whether we make UDDI independent of other specs, or whether
> >we accept the impact of changes in other specifications on UDDI
> >implementations at any given time.

