entity-resolution message

Subject: Re: SAX 2.0 enhancement proposal


/ David Brownell <david-b@pacbell.net> was heard to say:
[...]
| The XML spec is quite explicit on this topic: "relative URIs are
| relative to the location of the resource within which the entity
| declaration occurs" (4.2.2).

I don't believe that assertion is in question. However, Section
4.2.2 says:

  [Definition: The SystemLiteral is called the entity's system
  identifier. It is a URI reference (as defined in [IETF RFC 2396],
  updated by [IETF RFC 2732]), meant to be dereferenced to obtain
  input for the XML processor to construct the entity's replacement
  text.]

Note that it says "meant to be dereferenced". It does not say "must"
be dereferenced.

  It is an error for a fragment identifier (beginning with a #
  character) to be part of a system identifier. Unless otherwise
  provided by information outside the scope of this specification

I believe it is entirely consistent for the entity resolution
mechanism to be considered information outside the scope of this
specification.

  (e.g. a special XML element type defined by a particular DTD, or a
  processing instruction defined by a particular application
  specification), relative URIs are relative to the location of the
  resource within which the entity declaration occurs. A URI might
  thus be relative to the document entity, to the entity containing
  the external DTD subset, or to some other external parameter entity.

In addition, it says:

  [Definition: In addition to a system identifier, an external
  identifier may include a public identifier.] An XML processor
  attempting to retrieve the entity's content may use the public
  identifier to try to generate an alternative URI reference.

Clearly then, the system identifier does not have to be dereferenced
since an alternate URI may be constructed by the parser if it chooses.

Note also that the XML2e Rec does not say that an entity resolver
is forbidden from considering the existing system identifier in
addition to the public identifier in attempting to generate an
alternate URI reference.

  If the processor is unable to do so, it must use the URI reference
  specified in the system literal. Before a match is attempted, all

Clearly the fallback is that the system literal must be used if no
other alternate URI can be found.
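
To make that reading concrete, here is a minimal sketch of a SAX
EntityResolver that tries the public identifier first and otherwise
leaves the system literal to the parser. The class name, the map, and
the example.org URI are hypothetical, invented for illustration; only
org.xml.sax.EntityResolver and org.xml.sax.InputSource are real API.

```java
import java.util.Collections;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: public identifier first, system literal as fallback.
public class PublicIdFallbackResolver implements EntityResolver {
    // Assumed lookup table from public identifiers to alternative URIs.
    private final Map<String, String> publicIdMap;

    public PublicIdFallbackResolver(Map<String, String> publicIdMap) {
        this.publicIdMap = publicIdMap;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String alternative = (publicId == null) ? null : publicIdMap.get(publicId);
        if (alternative != null) {
            // An alternative URI was generated from the public identifier.
            return new InputSource(alternative);
        }
        // Returning null asks the parser to use the system identifier itself.
        return null;
    }

    public static void main(String[] args) {
        PublicIdFallbackResolver resolver = new PublicIdFallbackResolver(
            Collections.singletonMap("-//Example//DTD foo",
                                     "http://example.org/cache/foo.dtd"));
        InputSource hit = resolver.resolveEntity("-//Example//DTD foo", "../foo.dtd");
        System.out.println(hit.getSystemId());
        System.out.println(resolver.resolveEntity("-//Other//DTD bar", "bar.dtd"));
    }
}
```

Returning null in the no-match case is what keeps the system literal
as the required fallback.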

| Those are the only contexts in which an XML parser needs to resolve
| URIs, and there's no weasel-wording that would allow what that
| catalog spec is intending to do.  So I don't see why SAX should
| permit anything else, unless the XML spec gets a substantive
| functional change there ...

I don't understand your conclusion at all. Why is it the case that
this doctype declaration:

  <!DOCTYPE foo PUBLIC "-//Example//DTD foo" "../foo.dtd">

must be presented to the resolver as

  public="-//Example//DTD foo"
  system="file:///path/to/absolute/foo.dtd"

Why is it not equally reasonable for the resolver to be presented with

  public="-//Example//DTD foo"
  system="../foo.dtd"

since that's actually what the document *says*?

If the system identifier that the parser finally winds up using is a
relative URI reference, it's clear that it's relative to the base URI
of the containing entity. As I said before, I don't think that's in
dispute.

What I have never understood is why the SAX API feels that "early
absolutization" is preferable to "late absolutization".
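
The difference can be sketched in a few lines of Java. This is only an
illustration under stated assumptions: the class, the one-entry
catalog, and the example.org URIs are hypothetical, and the lookup is a
plain string match rather than anything a real catalog processor does.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

public class AbsolutizationSketch {
    // Hypothetical catalog keyed on the system identifier as written in the document.
    static final Map<String, String> CATALOG =
        Collections.singletonMap("../foo.dtd", "http://example.org/cache/foo.dtd");

    // Resolve a (possibly relative) system identifier against the base URI
    // of the entity that contains the declaration.
    static String absolutize(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    // Early absolutization: the catalog only ever sees the absolute form,
    // so an entry written as "../foo.dtd" can never match.
    static String resolveEarly(String baseUri, String systemId) {
        String absolute = absolutize(baseUri, systemId);
        return CATALOG.getOrDefault(absolute, absolute);
    }

    // Late absolutization: the catalog sees the identifier as written;
    // whatever URI is finally chosen is made absolute afterwards.
    static String resolveLate(String baseUri, String systemId) {
        String mapped = CATALOG.get(systemId);
        return absolutize(baseUri, mapped != null ? mapped : systemId);
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/sub/doc.xml";
        System.out.println(resolveEarly(base, "../foo.dtd")); // http://example.org/docs/foo.dtd
        System.out.println(resolveLate(base, "../foo.dtd"));  // http://example.org/cache/foo.dtd
    }
}
```

In both cases a relative reference that survives is still resolved
against the base URI of the containing entity; the only thing that
changes is what the catalog gets to compare against.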

| >     Unfortunately, SAX 2.0 requires that system identifiers
| > that are URIs are made absolute before calling the EntityResolver, thereby
| > robbing the catalog processor of the opportunity to compare the system
| > identifier with the catalog entries.
| 
| Relative URIs are, classically, trouble.  They're very easily
| mis-understood, since implicit context is easy to get wrong.  Why is
| this catalog draft trying to encourage/facilitate error-prone and
| complex idioms?  Which, moreover, are intended to violate the XML
| specification?

I assert that they do not violate the XML spec. As to why: it
is both more flexible and more consistent with TR9401.

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM   | 'I have done that,' says my memory. 'I cannot
XML Standards Engineer | have done that'--says my pride, and remains
Technology Dev. Group  | adamant. At last--memory yields.--Nietzsche
Sun Microsystems, Inc. | 

