Subject: Re: [entity-resolution] Re: uri vs. system confusion


On Saturday, February 8, Paul Grosso wrote:
> At 12:37 2003 02 08 -0700, Mark Johnson wrote:
> >John (and others),
> >
> >(BTW, this discussion started with my post [1] to the
> >entity-resolution-comment list.)
> >
> >After reading the relevant section of the XML spec (4.2.2), I still
> >find redundancy in using both system-based and uri-based elements. The
> >problem lies in the fact that the XML spec defines 'SystemLiteral' as
> >a URI, thereby _not_ limiting its form to that of a URL.
> >
> >Furthermore, (external) ENTITY declarations and NOTATION declarations
> >(XML 4.7) both make use of the form of an ExternalID, i.e. a
> >SystemLiteral.
> >
> >And, as mentioned above, the SystemLiteral is defined as a URI (not a
> >URL). From section 4.2.2 of the XML spec:
> >
> >   [Definition: The SystemLiteral is called the entity's system
> >   identifier. It is a URI reference (as defined in [IETF RFC 2396],
> >   updated by [IETF RFC 2732]), meant to be dereferenced to obtain
> >   input for the XML processor to construct the entity's replacement
> >   text.]
> >
> >Since: 
> >
> >a. all relevant non-public references are required to be
> >   SystemLiterals, and,
> >
> >b. all SystemLiterals are URIs,
> >
> >Then:
> >
> >1.  all SYSTEM ids should be treated as URIs, and nothing less.
> 
> I don't follow your reasoning.

Hopefully I can clarify my position...

> Yes, XML says that system ids are URIs. That doesn't say that
> all URIs are system ids. 

Yes, I understand this point. And I didn't imply that all URIs are
system ids - such a statement is obviously false.

> In XML, a system id is one of a pair of identifiers--the system id
> and the (optional) public id--that make up an external identifier of
> an external entity reference.

Yes, I know this.

> It's these system ids that are matched against the catalog entries that
> refer to system ids.

Sure, makes sense. But my point here is that these system ids are all URIs,
so why not use the URI elements instead of the system elements?

John Cowan provided an explanation that added some clarity:

| XML Catalogs started as an XML rewrite of Socats, the SGML
| equivalent, which was limited to only ENTITY and NOTATION and some
| SGML-specific uses of external IDs.  It was important to keep
| semantic compatibility.
|
| Generalized URI rewriting was something we added that had no
| counterpart in the SGML world.  We thought in general that
| application-level remapping in the instance might not want the same
| set of mappings as parser-level remapping in the DTD, so we kept the
| facilities separate.
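
A minimal Python sketch of what keeping the facilities separate allows. It assumes a hypothetical catalog in which one URL has different parser-level and application-level mappings. The example.org URL and the local paths are made up; only the element and attribute names (`system`/`systemId`, `uri`/`name`) and the catalog namespace come from the XML Catalogs spec, and `load_tables` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML Catalogs documents.
NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog: one URL, two different local copies.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://example.org/common.xml"
          uri="file:///usr/share/xml/dtd-level/common.xml"/>
  <uri name="http://example.org/common.xml"
       uri="file:///usr/share/xml/app-level/common.xml"/>
</catalog>"""


def load_tables(text):
    """Build independent lookup tables from <system> and <uri> entries."""
    root = ET.fromstring(text)
    system_map = {e.get("systemId"): e.get("uri") for e in root.iter(NS + "system")}
    uri_map = {e.get("name"): e.get("uri") for e in root.iter(NS + "uri")}
    return system_map, uri_map


system_map, uri_map = load_tables(CATALOG)
url = "http://example.org/common.xml"

# Parser-level lookup (external ENTITY/NOTATION declarations) uses system_map.
print(system_map.get(url))  # file:///usr/share/xml/dtd-level/common.xml
# Application-level lookup (e.g. a stylesheet reference) uses uri_map.
print(uri_map.get(url))     # file:///usr/share/xml/app-level/common.xml
```

If both kinds of lookup shared one table, the two local copies could not coexist for the same URL.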

They would be separate, except that the SYSTEM id in the DOCTYPE
declaration of the instance and the system ids in the ENTITY/NOTATION
declarations of the DTD are all (supposed to be) resolved via
system-based elements. In this sense the facilities have not been kept
separate.*

  *[Unless you're advocating using a uri-based element to resolve the
   SYSTEM id in the prolog? That id surely points to an external
   entity, the DTD, so if that is what you mean, I'm really confused.]

> Anything else in an XML document that the application may have
> reason to believe is a URI reference is not a system id. 

I agree here as well. It may not be a system id, but it is a URI
reference :)

> These are the things that are matched against the catalog entries
> that refer to URIs.

It sounds to me like the general rule of thumb is:

  - use system-based elements for resolving the SYSTEM id in the
    DOCTYPE declaration of the document instance, and for resolving
    ENTITY and NOTATION declarations in the DTD

  - use uri-based elements for all other URLs that may appear in the
    document instance.
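
The rule of thumb above can be sketched as a toy resolver in Python. This is a simplification, not the spec's full algorithm: the catalog is hypothetical (the DocBook DTD and stylesheet URLs are the usual published ones, the local paths are invented), and `rewrite`, `resolve_system` and `resolve_uri` are hypothetical helper names. When several rewrite entries match, it picks the longest start string, as the XML Catalogs spec describes for rewriteSystem and rewriteURI:

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog; the local file:/// prefixes are illustrative only.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/xml/"
                 rewritePrefix="file:///usr/share/xml/docbook/"/>
  <rewriteURI uriStartString="http://docbook.sourceforge.net/release/xsl/"
              rewritePrefix="file:///usr/share/xml/docbook-xsl/"/>
</catalog>"""

root = ET.fromstring(CATALOG)
system_rules = [(e.get("systemIdStartString"), e.get("rewritePrefix"))
                for e in root.iter(NS + "rewriteSystem")]
uri_rules = [(e.get("uriStartString"), e.get("rewritePrefix"))
             for e in root.iter(NS + "rewriteURI")]


def rewrite(identifier, rules):
    """Apply the rule whose start string is the longest match, or return None."""
    matches = [(start, prefix) for start, prefix in rules
               if identifier.startswith(start)]
    if not matches:
        return None
    start, prefix = max(matches, key=lambda m: len(m[0]))
    return prefix + identifier[len(start):]


def resolve_system(system_id):
    """Parser-level: DOCTYPE system id, ENTITY and NOTATION declarations."""
    return rewrite(system_id, system_rules)


def resolve_uri(uri):
    """Application-level: any other URI reference in the instance."""
    return rewrite(uri, uri_rules)


dtd = "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
xsl = "http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
print(resolve_system(dtd))  # file:///usr/share/xml/docbook/4.2/docbookx.dtd
print(resolve_uri(xsl))     # file:///usr/share/xml/docbook-xsl/current/html/docbook.xsl
print(resolve_uri(dtd))     # None: no rewriteURI entry covers the DTD
```

The last line illustrates the point under discussion: under this split, a rewriteURI entry never affects how the parser resolves the DTD. The sketch omits exact system/uri entries, delegation, nextCatalog, and the identifier normalization the spec requires.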

FWIW, I'm much more concerned with _understanding_ the need for both
system-based and uri-based elements than I am with changing the spec.

My perspective comes from a purely practical point of view: 

 I am editor and de facto policy lead for XML Catalog implementation
 on Debian GNU/Linux.

Hence I need to understand the intended usage as well as the
motivation for various aspects of the XML Catalogs spec. (E.g., I'll
get bombarded with messages demanding explanations for certain
recommendations I'll make, especially regarding uri vs. system usage.)

Maybe the solution is as simple as adding some clarifying text to the
spec that explains the motivation for employing both sets of elements,
as well as some usage recommendations that clearly distinguish when
to use system elements and when to use uri elements. Accompanying
examples would help a great deal, too.
 
> >Therefore all SYSTEM references should make use of the uri-based
> >elements - and not the system-based elements.
> 
> I don't see the "therefore" in your argument.

It's as simple as "all system ids are URIs, therefore use uri-based
elements for all SYSTEM ids and for all URIs (URLs being a subset)
whether they appear in the document instance or in the DTD".

Do you see my point?

> >Furthermore, use of uri, rewriteURI, and delegateURI in lieu of the
> >system, rewriteSystem, and delegateSystem elements seems to be
> >consistent with Production 75 of the XML spec.
> >
> >So I don't see why the system-based elements are needed at all.
> >
> >   (I can't help but wonder if I'm missing something here, as I now
> >    feel like I understand the issue...)
> 
> You appear to have company in that there are others that disagree
> with the distinction we have decided to make in the spec. 

Perhaps some clarifying statements in the spec (providing rationale,
motivation, and examples) would reduce the number of objections. My gut
feeling is that the spec readers would be much more comfortable with the
spec's contents if they understood _why_ the spec contains what it does.

> As far as I can tell, this disagreement has gone on for a while and
> neither side seems to come closer to changing its mind.

As I imply above, these disagreements might be in part (largely??) due
to a communication problem. Improved clarity, rationale, and examples
would IMO help a great deal.

> (I am still firmly on the side that cannot see why folks cannot see
> the difference between system ids and non-system id URIs,

I believe they _do_ see the difference between the two, but simply
don't see the rationale for a distinct set of elements for each.

> but smart people are on the other side too.)

Geez, Paul, I hope you count me in the latter group:)

Thanks for taking the time to address my concerns.

Cheers,
Mark

> paul
> 
> >Clarification, anyone?
> >
> >Thanks,
> >Mark
> >
> >[1] http://lists.oasis-open.org/archives/entity-resolution-comment/200302/msg00000.html
> >
> >On Friday, February 7, John Cowan wrote:
> >> Mark Johnson scripsit:
> >>
> >> > [...] I'm still a bit confused about the distinction between the 
> >> > uri-based elements and the system-based elements.
> >> 
> >> In a word, the system-based elements are used only when an XML parser is
> >> processing ENTITY and NOTATION declarations in the DTD.  The URI-based
> >> elements are used for all other lookups.
> >>
> >>
> >> > B. [...] another implied usage is IMO that SYSTEM Ids get remapped
> >> >     _only_ via a <rewriteSystem> element, and never via the 
> >> >     <rewriteURI> element. (This despite the fact that all system 
> >> >     identifiers are URIs.)
> >> > 
> >> >    Is this the intended usage? 
> >> 
> >> Yes.
> >>
> >> >     Does the reason for using <rewriteURI.../> have something to do
> >> >     with the fact that the stylesheet URI does NOT appear in the XML 
> >> >     document, and is therefore not restricted by the ExternalID as
> >> >     SystemLiteral constraint as per the XML spec? Again, any 
> >> >     clarification here would be greatly appreciated.
> >> 
> >> Just so.
> >
> >-- 
> >_____________________________________
> >Mark Johnson        <mark@dulug.duke.edu>
> >Debian XML/SGML     <mrj@debian.org>
> >Home Page:          <http://dulug.duke.edu/~mark/>
> >GPG fp: 50DF A22D 5119 3485 E9E4  89B2 BCBC B2C8 2BE2 FE81
> >
