OASIS Mailing List Archives

entity-resolution message



Subject: Re: System URIs


Paul Grosso wrote:


>>I think the confusion comes from the fact that XML 1.0, section 4.2.2
>>describes a process of escaping certain characters within the SystemLiteral.
>>But XML 1.0 is a bit vague, it doesn't say when this escaping should be
>>performed or by whom.


Actually this was fixed by 2nd ed. erratum 4
(http://www.w3.org/XML/xml-V10-2e-errata#E4), which makes the
relevant paragraph read:

# The XML processor must escape disallowed characters as follows:
# The disallowed characters include all non-ASCII characters, plus
# the excluded characters listed in Section 2.4 of [IETF RFC 2396],
# except for the number sign (#) and percent sign (%) characters
# and the square bracket characters re-allowed in [IETF RFC 2732].

[%-escaping algorithm snipped]
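
For readers without the erratum to hand, the snipped algorithm can be sketched roughly as follows: each disallowed character is converted to its UTF-8 bytes, and each byte is written as %HH. This is only an illustration. The function name escape_system_id is hypothetical, and the ASCII set below is one reading of the RFC 2396 section 2.4 excluded characters (controls, space, delims, unwise) with "#", "%", "[" and "]" left alone as the erratum says.

```python
# Hypothetical helper, not taken from any parser or resolver implementation.
# Assumed ASCII exclusions: space, < > " { } | \ ^ ` (controls handled below).
_EXCLUDED_ASCII = set('<>"{}|\\^` ')


def escape_system_id(system_id: str) -> str:
    """%-escape the disallowed characters of a system identifier."""
    out = []
    for ch in system_id:
        cp = ord(ch)
        # Disallowed: control characters, DEL, all non-ASCII characters,
        # and the excluded ASCII punctuation listed above.
        if cp < 0x20 or cp > 0x7E or ch in _EXCLUDED_ASCII:
            # Convert to UTF-8, then escape each byte as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            # Everything else, including # % [ ], passes through unchanged.
            out.append(ch)
    return "".join(out)
```

Because "%" passes through untouched, applying the function to an already escaped identifier leaves it unchanged.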

>>So it is unclear if the SystemLiteral will have been
>>escaped before it is supplied to the EntityResolver.


See below.


> Production [11] makes it clear that system
> literals can contain any characters, and that should stand.


Undoubtedly.

> When (and
> only when) the string matching the SystemLiteral terminal in the language
> is interpreted as a URI reference, it may need to be escaped before passing
> it around the web.  But this part of XML 1.0 cannot be saying that the
> SystemLiteral in the XML file cannot contain certain characters, and there
> is no reason to be doing URI ref escaping before passing the string to the
> catalog resolver.


I suppose it depends on whether you interpret the catalog resolver as
part of the XML processor, or part of the XML application.  If it is
part of the application, then the processor must already have
%-escaped the disallowed (including non-ASCII) characters by the time
the resolver sees the system id.

If that is so, then URIs appearing in catalogs must of course be
%-escaped too, so that byte-for-byte equal URIs can match.  It
would be bogus to allow them in system ids and disallow them in
catalogs.

For example, the system ids "http://example.org/étude" (unnormalized
URI) and "http://example.org/%C3%A9tude" (normalized URI) mean exactly
the same thing, and should match either form as a catalog entry.
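
One way a resolver could get this behaviour is to %-escape both the incoming system id and the catalog keys before comparing them. The sketch below assumes that approach; normalize, resolve_system, and the file: URI in the usage lines are hypothetical names, and the catalog is reduced to a plain dict for illustration. It relies on urllib.parse.quote, with a safe set chosen so that only non-ASCII characters and the RFC 2396 excluded characters (other than "#", "%", "[" and "]") are escaped.

```python
from urllib.parse import quote

# ASCII punctuation left unescaped (an assumption based on the erratum text).
# Letters, digits and "_.-~" are never escaped by quote().
_SAFE = "!#$%&'()*+,/:;=?@[]"


def normalize(uri: str) -> str:
    """Return the %-escaped form of a system id or catalog URI."""
    return quote(uri, safe=_SAFE)


def resolve_system(system_id: str, catalog: dict) -> "str | None":
    """Look up a system id, treating escaped and unescaped forms as equal."""
    table = {normalize(key): target for key, target in catalog.items()}
    return table.get(normalize(system_id))


# Hypothetical catalog entry written in the unnormalized form.
catalog = {"http://example.org/étude": "file:///local/etude.dtd"}
match_a = resolve_system("http://example.org/étude", catalog)
match_b = resolve_system("http://example.org/%C3%A9tude", catalog)
```

Both lookups find the same entry. The comparison is still byte-for-byte after escaping, so "%c3%a9" with lowercase hex digits would not match; whether it should is a separate question.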

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein


