
entity-resolution message



Subject: Re: First editing pass online


At 19:24 2001 01 31 -0500, Norman Walsh wrote:
>I've made a quick editing pass over the spec and posted a new update[1].

>[1] http://www.oasis-open.org/committees/entity/spec.html

I've gone through and made comments by marking up the 
html (red&struck-through -> delete, yellow -> add, 
lime&italic -> commentary) which I attach to this message.

paul



Title: XML Catalogs

XML Catalogs

OASIS Entity Resolution Technical Committee

Revision date: 31 Jan 2001

Permission to reproduce parts or all of this information in any form is granted to OASIS members provided that this information by itself is not sold for profit and that OASIS is credited as the author of this information.

The requirement that all external identifiers in XML documents must provide a system identifier has unquestionably been of tremendous short-term benefit to the XML community. It has allowed a whole generation of tools to be developed without the added complexity of explicity entity management. [Comment: Note the preceding typo (that you might have otherwise missed, hence this comment).]

However, the interoperability of XML documents has been impeded in several ways by the lack of entity management facilities:

  1. External identifiers may require resources that are not always available. For example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available.

  2. External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system. An external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol.

  3. It is often convenient to access resources using system identifiers that point to local resources. Exchanging these documents with other systems is problematic at best and impossible at worst.

While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this specification defines an entity catalog that maps an entity's external identifier to a URI.

OASIS Entity Resolution Technical Committee
Working Draft: 31 Jan 2001
Working Draft: 08 Jan 2001
Working Draft: 15 Dec 2000


Introduction

In order to use a variety of XML tools in a variety of computer environments, the problem of addressing resources that are inaccessible using their explicit system identifiers must be solved. [Comment: Is this really what we want to say? Isn't the problem bigger than just "inaccessible via sysids"? It's also the case that, while the sysid might be accessible, that's not the resource you really want at this time (whether for reasons of caching, testing a new version, etc.). Maybe something more like the following:] In order to make optimal use of the information about an XML external resource, there must be some interoperable way to map the information in an XML external identifier into a URI for the desired resource.

[delete: The short term solution to this problem is to define] [add: This specification defines] an entity catalog that handles the simple cases of mapping an external entity's public identifier and/or system identifier to an alternate URI. This solution allows for a probably system-dependent but application-independent catalog. Though it does not handle all issues that a combination of a complete entity manager and storage manager addresses, it simplifies both the use of multiple products in a great majority of cases and the task of processing documents on different systems. [Comment: Saying "short term" implies we've got a longer term strategy and/or that a longer term strategy is needed. I'm not sure that catalogs are so necessarily system-dependent anymore now that their right hand sides can be absolute URIs (of course, one of their advantages is that they *can* be system-dependent, but it no longer seems necessary to point that out as if it's a drawback).]

An XML Entity Catalog Format

To address the issue of multiple vendors' applications on a given system, this specification defines a format for an application-independent entity catalog that maps external identifiers to (other) URIs. This catalog is used by an application's entity manager. This specification does not dictate when an entity manager should access this catalog; for example, an application may attempt other mapping algorithms before or (if the catalog fails to produce a successful mapping) after accessing this catalog. The catalog has a standard format. Each application that uses it must provide the user with a mechanism for specifying how and when the catalog is to be accessed. [Comment: I realize these words came straight from TR9401, but they need fixing. One cannot specify how or when the catalog is accessed--this spec determines that--one only specifies the catalog to access.]

For the purposes of this specification, the term catalog refers to the logical "mapping" information that may be physically contained in one or more catalog entry files. The catalog, therefore, is effectively an ordered list of (one or more) catalog entry files. It is up to the application to determine the ordered list of catalog entry files to be used as the logical catalog. (This specification uses the term "catalog entry file" to refer to one component of a logical catalog even though a catalog entry file can be any kind of storage object or entity including--but not limited to--a table in a database, some object referenced by a URL, or some dynamically generated set of catalog entries.)

Each entry in the catalog associates a URI with information about the external entity that appears in the XML document. For example, the following are possible catalog entries that associate a public identifier with a URI:

<public publicId="ISO 8879:1986//ENTITIES Added Latin 1//EN"
        uri="iso-lat1.gml"/>
<public publicId="-//USA/AAP//DTD BK-1//EN"
        uri="aapbook.dtd"/>
<public publicId="-//Example, Inc.//DTD Report//EN"
        uri="http://example.com/dtds/report.dtd"/>

The complete set of catalog entry types defined by this Specification is: public, system, delegate, and nextCatalog. Two grouping elements, catalog (the top-level element) and group, are also defined.

Furthermore, to provide for possible future extensions or other uses of this catalog, its format allows for "other information"--indicated by an element from a namespace other than the one defined by this Specification--that is irrelevant to and ignored by this specification.

An entry in the catalog is interpreted as follows:

  1. A public entry indicates that an entity manager should use the associated URI to locate the replacement text for an entity with the specified public identifier.

  2. The system entry indicates that an entity manager should use the associated URI to locate the replacement text for an entity whose external identifier's system identifier matches the specified system identifier.

  3. The delegate entry indicates that external identifiers with a public identifier that starts with the specified string should be resolved using a catalog specified by the associated URI.

  4. The nextCatalog entry indicates that an entity manager should use the associated URI to locate an additional catalog entry file to be processed after the current catalog entry file.

  5. The override [Comment: the previous word should be tagged] attribute specifies whether to use the "prefer system id" mode or not for the search strategy for the entries contained within the element which specifies the override (see below for more discussion).
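As a rough illustration of the entry types listed above, the following Python sketch reads one catalog entry file with the standard library's xml.etree.ElementTree and lists its top-level entries, skipping elements from other namespaces. The sample catalog, its URIs, and the helper name `list_entries` are assumptions made for this example; the namespace is the draft one used in Appendix A. For brevity the sketch does not descend into group elements.

```python
import xml.etree.ElementTree as ET

# Draft namespace taken from the schema in Appendix A.
NS = "http://www.oasis-open.org/committees/entity/draft"

# Hypothetical catalog entry file used only for this illustration.
SAMPLE = """\
<catalog xmlns="http://www.oasis-open.org/committees/entity/draft">
  <public publicId="-//USA/AAP//DTD BK-1//EN" uri="aapbook.dtd"/>
  <system systemId="http://example.org/dtds/report.dtd" uri="report.dtd"/>
  <delegate publicIdStartString="-//Example//" catalog="example-catalog.xml"/>
  <nextCatalog catalog="more-entries.xml"/>
  <other:note xmlns:other="http://example.org/other">ignored</other:note>
</catalog>
"""

def list_entries(xml_text):
    """Return (entry type, attributes) for each top-level catalog entry."""
    root = ET.fromstring(xml_text)
    prefix = "{" + NS + "}"
    entries = []
    for child in root:
        # Elements from any other namespace are "other information": skip them.
        if not child.tag.startswith(prefix):
            continue
        entries.append((child.tag[len(prefix):], dict(child.attrib)))
    return entries

for kind, attrs in list_entries(SAMPLE):
    print(kind, attrs)
```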

When doing a catalog lookup, an entity manager generally uses whatever is available from among the entity declaration's system identifier and public identifier to find catalog entries that match the given information. A match in one catalog entry file will take precedence over any match in a later catalog entry file (and, in fact, the entity manager need not process subsequent catalog entry files once a match has occurred). A more specific matching entry in one catalog entry file will take priority over a less specific matching entry in the same catalog entry file. For this purpose, the order of specificity of match (most specific first) is:

  1. system type entries;

  2. public type entries;

  3. delegate entries ordered by the length of the start string, longest first;

Within any given category of equal specificity, matches maintain the order of their entries in the catalog entry file so that the first such match will take priority.
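The specificity ordering above can be sketched for a single catalog entry file as follows. This is a simplified reading, not a definitive implementation: entries are modelled as hypothetical `(kind, key, target)` tuples in file order, the function name `find_match` is invented for the example, and the preference modes and override attribute are deliberately left out.

```python
def find_match(entries, public_id=None, system_id=None):
    """Pick the most specific match from one catalog entry file.

    entries is a list of (kind, key, target) tuples in file order
    (a hypothetical in-memory form of the file, assumed for this sketch).
    """
    # 1. system entries are the most specific; the first one in file order wins.
    if system_id is not None:
        for kind, key, target in entries:
            if kind == "system" and key == system_id:
                return ("system", target)
    if public_id is not None:
        # 2. public entries come next.
        for kind, key, target in entries:
            if kind == "public" and key == public_id:
                return ("public", target)
        # 3. delegate entries, longest start string first. Python's sort is
        #    stable, so equally long start strings keep their file order.
        delegates = [(key, target) for kind, key, target in entries
                     if kind == "delegate" and public_id.startswith(key)]
        if delegates:
            delegates.sort(key=lambda d: len(d[0]), reverse=True)
            return ("delegate", [target for _, target in delegates])
    return None

entries = [
    ("delegate", "-//Example//", "example.cat"),
    ("public", "-//Example//DTD Report//EN", "report.dtd"),
    ("delegate", "-//Example//DTD", "example-dtd.cat"),
    ("system", "http://example.org/report.dtd", "local-report.dtd"),
]
print(find_match(entries, "-//Example//DTD Report//EN", "http://example.org/report.dtd"))
print(find_match(entries, "-//Example//DTD Other//EN"))
```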

In XML, all external identifiers must include a system identifier and may include a public identifier ([Production 75] of XML 1.0 Second Edition). Although the system identifier is assumed to be "a URI reference…meant to be dereferenced to obtain input for the XML processor to construct the entity's replacement text", in some circumstances (such as when the document was generated on another system, when the document was generated in another location on the same system, or when some files referenced by system identifiers have moved since the document was generated), the specified system identifiers may not be the best identifiers for the replacement text. For this or other reasons, it may be desirable to prefer the public identifier over the system identifier in determining the entity's replacement text. [Comment: You go directly into describing the two modes without any lead-in about modes. Something needs to be here such as the following:] Therefore, this resolution defines two modes for using the above search strategy. The two search modes are:

  1. If system identifiers are preferred and there is no matching system type entry, then the system identifier is used as the URI regardless of any public identifier. This specification does not specify what happens if a preferred system identifier does not identify an accessible storage object; an application may look up the public identifier and/or entity name to find another URI, or it may simply report an error. An application should at least have the option of issuing a warning if the system identifier fails in this mode.

  2. If public identifiers are preferred and there is no matching system type entry, the system identifier is used as the URI only if no mapping can be found in the catalog entry file for the public identifier (if a public identifier was specified).
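One way to read the two modes is sketched below. It is a minimal illustration under stated assumptions: the catalog is reduced to two plain dictionaries (`system_map` and `public_map`), the function name `resolve` and the `prefer` parameter are invented for the example, and delegate entries and inaccessible system identifiers are not handled.

```python
def resolve(system_map, public_map, system_id, public_id=None, prefer="public"):
    """Return the URI to use for an external identifier.

    system_map and public_map stand in for the system and public entries
    of a catalog (an assumption made to keep the sketch short).
    """
    # A matching system entry is used in both modes.
    if system_id in system_map:
        return system_map[system_id]
    # Only in "prefer public" mode is the public identifier looked up
    # before falling back to the system identifier.
    if prefer == "public" and public_id in public_map:
        return public_map[public_id]
    # Otherwise the system identifier itself is used as the URI.
    return system_id

system_map = {"http://example.org/a.dtd": "local/a.dtd"}
public_map = {"-//Example//DTD B//EN": "local/b.dtd"}

print(resolve(system_map, public_map, "http://example.org/b.dtd",
              "-//Example//DTD B//EN", prefer="public"))
print(resolve(system_map, public_map, "http://example.org/b.dtd",
              "-//Example//DTD B//EN", prefer="system"))
```

With the same catalog and the same external identifier, only the mode changes which URI is chosen: the mapped local file in the first call, the original system identifier in the second.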

An application must provide some way (e.g., a runtime argument, environment variable, preference switch) that allows the user to specify which of these modes to use in the absence of any occurrence of the override [Comment: here and below, "override" does not have any special formatting to indicate that it is a keyword--shouldn't it?] attribute on the catalog entry.

The override attribute can be used on catalog and group entry types to indicate for any set of catalog entries whether they should be able to be used in matches that may override an explicit system identifier. Each occurrence of an override attribute specifies the search strategy mode for entries contained within the catalog or group element on which it occurs. A public or delegate entry encountered when override is "yes" (corresponding to the mode where public identifiers are preferred) will be considered for possible matching whether or not the external identifier has an explicit system identifier. A public or delegate entry encountered when override is "no" (corresponding to the mode where system identifiers are preferred) will be ignored during lookups for which the external identifier has an explicit system identifier. No other entry types are affected by the override attribute. The initial search strategy in force at the beginning of each catalog entry file depends on the preference as determined by the application (possibly under user control). [Comment: I note in this para we have language that suggests that an explicit system identifier may not be part of the external identifier. Elsewhere, we removed such language. However, since XML 1.0 notation declarations can have public ids but not system ids, I'm not sure which is correct. I guess it's best to leave the language as it is in the para, but then we should probably go back and revert it elsewhere where we removed such references.]
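The scoping of the override attribute described above might be sketched like this. The sample catalog, the function name `candidate_entries`, and the `default_override` parameter (standing in for the application's preference) are assumptions for illustration; the namespace is the draft one from Appendix A.

```python
import xml.etree.ElementTree as ET

# Draft namespace from Appendix A, in ElementTree's "{uri}" tag form.
NS = "{http://www.oasis-open.org/committees/entity/draft}"

# Hypothetical catalog entry file for this illustration.
SAMPLE = """\
<catalog xmlns="http://www.oasis-open.org/committees/entity/draft" override="no">
  <public publicId="-//A//DTD One//EN" uri="one.dtd"/>
  <group override="yes">
    <public publicId="-//A//DTD Two//EN" uri="two.dtd"/>
  </group>
  <system systemId="http://example.org/three.dtd" uri="three.dtd"/>
</catalog>
"""

def candidate_entries(xml_text, has_system_id, default_override="no"):
    """List the entries that take part in a lookup, honouring override."""
    result = []

    def walk(element, override):
        # An override attribute on catalog or group applies to what it contains.
        override = element.get("override", override)
        for child in element:
            if child.tag == NS + "group":
                walk(child, override)
            elif child.tag in (NS + "public", NS + "delegate"):
                # Ignored when override is "no" and a system identifier exists.
                if override == "yes" or not has_system_id:
                    result.append(child)
            elif child.tag in (NS + "system", NS + "nextCatalog"):
                # Other entry types are not affected by override.
                result.append(child)

    walk(ET.fromstring(xml_text), default_override)
    return [(e.tag[len(NS):], dict(e.attrib)) for e in result]

for kind, attrs in candidate_entries(SAMPLE, has_system_id=True):
    print(kind, attrs)
```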

When attempting matches for delegate type catalog entries, the entity's public identifier is compared to the public id start string of each delegate catalog entry, looking for start strings that are initial substrings of the entity's public identifier. If this catalog entry file produces any such matches, the values of the catalog attributes of all such matching entries are used, in order from longest partial public identifier match to shortest, to generate a new complete logical catalog (i.e., a newly specified list of catalog entry files) that replaces the current catalog.

The catalog lookup process for this entity continues with this new (replacement) catalog, ignoring for the purposes of this entity any other entries in the current catalog entry file as well as any subsequent catalog entry files that may have been part of the previous list of catalog entry files. This newly defined catalog is then processed in much the same manner as if it had been the originally specified catalog; however, only the entity's public identifier is considered as the information available for lookup--its [delete: entity name and] system identifier (if any) [delete: are] [add: is] not available during lookup in any "delegated to" catalog. Lookup for subsequent public identifiers is unaffected by this process; that is, the effect of this replacement catalog holds only for the lookup of the current entity's public identifier.
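The replacement-catalog behaviour of delegate entries can be sketched as below. This is an illustrative reading only: catalog entry files are modelled as a hypothetical dictionary from file name to a list of `(kind, key, target)` tuples, the function name `lookup_public` is invented, and the sketch does not guard against delegation cycles.

```python
def lookup_public(catalog_files, file_names, public_id):
    """Look up a public identifier through an ordered list of entry files.

    catalog_files maps a file name to its (kind, key, target) entries;
    this in-memory form is an assumption made for the sketch.
    """
    for name in file_names:
        entries = catalog_files[name]
        # A public entry is more specific than any delegate entry.
        for kind, key, target in entries:
            if kind == "public" and key == public_id:
                return target
        matches = [(key, target) for kind, key, target in entries
                   if kind == "delegate" and public_id.startswith(key)]
        if matches:
            # Longest start string first; the resulting list REPLACES the
            # current catalog, so the remaining file_names are never consulted.
            matches.sort(key=lambda m: len(m[0]), reverse=True)
            return lookup_public(catalog_files,
                                 [target for _, target in matches],
                                 public_id)
    return None

catalog_files = {
    "main": [("delegate", "-//Example//", "ex"),
             ("delegate", "-//Example//DTD", "exdtd")],
    "later": [("public", "-//Example//DTD Other//EN", "never-reached.dtd")],
    "ex": [("public", "-//Example//DTD Report//EN", "from-ex.dtd")],
    "exdtd": [],
}
print(lookup_public(catalog_files, ["main", "later"], "-//Example//DTD Report//EN"))
print(lookup_public(catalog_files, ["main", "later"], "-//Example//DTD Other//EN"))
```

The second call finds nothing even though the file named "later" holds a matching public entry, because the delegated list has already replaced the original catalog for that lookup.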

The nextCatalog entry can be used to insert new catalog entry files into the current list of catalog entry files. The catalog [Comment: the "catalog" attribute name doesn't appear to be marked up] attribute on a nextCatalog entry is used to locate another catalog entry file that is processed after the current catalog entry file if the current catalog entry file does not provide a match. Multiple nextCatalog entries are allowed, and the referenced catalog entry files will be inserted into the current catalog list in order. Note that the effect of any nextCatalog entry would occur only after all other entries in this catalog entry file have been considered.
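The effect of nextCatalog entries on the list of catalog entry files might be sketched as follows. The text above does not say exactly where the referenced files are inserted, so placing them immediately after the current file is an assumption of this sketch, as are the function name `processing_order`, the `next_catalogs` dictionary, and the guard against visiting a file twice.

```python
def processing_order(next_catalogs, initial):
    """Return the order in which catalog entry files would be consulted
    when no file produces a match.

    next_catalogs maps a file name to the files named by its nextCatalog
    entries, in document order (a hypothetical in-memory form).
    """
    pending = list(initial)
    order = []
    while pending:
        current = pending.pop(0)
        if current in order:
            # Assumption: skip files already seen so that cycles terminate.
            continue
        order.append(current)
        # Insert the referenced files right after the current one, in order,
        # ahead of the files that were already waiting.
        pending[0:0] = next_catalogs.get(current, [])
    return order

next_catalogs = {"a": ["a1", "a2"], "a1": ["a1x"]}
print(processing_order(next_catalogs, ["a", "b"]))
```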

Appendix A. An XML Schema for the XML Catalog

This appendix is not normative. [Comment: This raises the interesting question, then, of just what is the normative definition of an XML catalog, since we don't appear to have any ebnf in the spec, and I don't think the verbiage we do have suffices to be a normative definition. This needs to be discussed by the group.]

The syntax for a catalog entry file is defined by this XML Schema:


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsd:schema SYSTEM "http://www.w3.org/2000/10/XMLSchema.dtd" [
<!ENTITY % schemaAttrs "
    xmlns:xsd   CDATA   #IMPLIED
    xmlns:xml   CDATA   #IMPLIED
    xmlns:er    CDATA   #IMPLIED
">
]>
<xsd:schema xmlns:xsd='http://www.w3.org/2000/10/XMLSchema'
            xmlns:er='http://www.oasis-open.org/committees/entity/draft'
            targetNamespace='http://www.oasis-open.org/committees/entity/draft'
            elementFormDefault='qualified'>

  <!-- $Id: spec.html,v 1.2 2001/02/01 00:01:28 ndw Exp $ -->

  <xsd:simpleType name='pubIdChars'>
    <!-- A string of the characters defined as pubIdChar in production 13
         of the Second Edition of the XML 1.0 Recommendation -->
    <xsd:restriction base='xsd:string'/>
  </xsd:simpleType>

  <xsd:simpleType name='publicIdentifier'>
    <xsd:restriction base='xsd:string'/>
  </xsd:simpleType>

  <xsd:simpleType name='partialPublicIdentifier'>
    <xsd:restriction base='er:pubIdChars'/>
  </xsd:simpleType>

  <xsd:simpleType name='systemIdentifier'>
    <xsd:restriction base='xsd:uriReference'/>
  </xsd:simpleType>

  <xsd:simpleType name='yesOrNo'>
    <xsd:restriction base='xsd:string'>
      <xsd:enumeration value='yes'/>
      <xsd:enumeration value='no'/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name='catalog'>
    <xsd:choice minOccurs='1' maxOccurs='unbounded'>
      <xsd:element ref='er:public'/>
      <xsd:element ref='er:system'/>
      <xsd:element ref='er:delegate'/>
      <xsd:element ref='er:nextCatalog'/>
      <xsd:element ref='er:group'/>
      <xsd:any namespace='##other' processContents='skip'/>
    </xsd:choice>
    <xsd:attribute name='override' type='er:yesOrNo'/>
    <xsd:anyAttribute namespace='##other'/>
  </xsd:complexType>

  <xsd:complexType name='public'>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="publicId" type="er:publicIdentifier"
                       use="required"/>
        <xsd:attribute name="uri" type="xsd:uriReference" use="required"/>
        <xsd:anyAttribute namespace='##other'/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name='system'>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="systemId" type="er:systemIdentifier"
                       use="required"/>
        <xsd:attribute name="uri" type="xsd:uriReference" use="required"/>
        <xsd:anyAttribute namespace='##other'/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name='delegate'>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="publicIdStartString"
                       type="er:partialPublicIdentifier"
                       use="required"/>
        <xsd:attribute name="catalog" type="xsd:uriReference" use="required"/>
        <xsd:anyAttribute namespace='##other'/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name='nextCatalog'>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="catalog" type="xsd:uriReference" use="required"/>
        <xsd:anyAttribute namespace='##other'/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name='group'>
    <xsd:choice minOccurs='1' maxOccurs='unbounded'>
      <xsd:element ref='er:public'/>
      <xsd:element ref='er:system'/>
      <xsd:element ref='er:delegate'/>
      <xsd:element ref='er:nextCatalog'/>
      <xsd:any namespace='##other' processContents='skip'/>
    </xsd:choice>
    <xsd:attribute name='override' type='er:yesOrNo'/>
    <xsd:anyAttribute namespace='##other'/>
  </xsd:complexType>

  <xsd:element name="catalog" type="er:catalog"/>
  <xsd:element name="public" type="er:public"/>
  <xsd:element name="system" type="er:system"/>
  <xsd:element name="delegate" type="er:delegate"/>
  <xsd:element name="nextCatalog" type="er:nextCatalog"/>
  <xsd:element name="group" type="er:group"/>

</xsd:schema>

Appendix B. A TREX Grammar for the XML Catalog

This appendix is not normative.

T.B.D.

Appendix C. A RELAX Grammar for the XML Catalog

This appendix is not normative.

T.B.D.

Appendix D. A DTD for the XML Catalog

This appendix is not normative.

The syntax for a catalog entry file is partially[1] defined by this Document Type Definition:

<!-- $Id: spec.html,v 1.2 2001/02/01 00:01:28 ndw Exp $ -->

<!ENTITY % pubIdChars "CDATA">
<!ENTITY % publicIdentifier "CDATA">
<!ENTITY % partialPublicIdentifier "%pubIdChars;">
<!ENTITY % uriReference "CDATA">
<!ENTITY % systemIdentifier "%uriReference;">
<!ENTITY % yesOrNo "(yes|no)">

<!ENTITY % p "">
<!ENTITY % s "">

<!ENTITY % catalog "%p;catalog">
<!ENTITY % public "%p;public">
<!ENTITY % system "%p;system">
<!ENTITY % delegate "%p;delegate">
<!ENTITY % nextCatalog "%p;nextCatalog">
<!ENTITY % group "%p;group">

<!ELEMENT %catalog; (%public;|%system;|%delegate;|%nextCatalog;|%group;)+>
<!ATTLIST %catalog;
    xmlns%s;    %uriReference;      #FIXED
        'http://www.oasis-open.org/committees/entity/draft'
    override    %yesOrNo;       #IMPLIED
    xml:base    %uriReference;      #IMPLIED
>

<!ELEMENT %public; EMPTY>
<!ATTLIST %public;
    publicId    %publicIdentifier;  #REQUIRED
    uri     %uriReference;      #REQUIRED
    xml:base    %uriReference;      #IMPLIED
>

<!ELEMENT %system; EMPTY>
<!ATTLIST %system;
    systemId    %systemIdentifier;  #REQUIRED
    uri     %uriReference;      #REQUIRED
    xml:base    %uriReference;      #IMPLIED
>

<!ELEMENT %delegate; EMPTY>
<!ATTLIST %delegate;
    publicIdStartString %partialPublicIdentifier;   #REQUIRED
    catalog     %uriReference;      #REQUIRED
    xml:base    %uriReference;      #IMPLIED
>

<!ELEMENT %nextCatalog; EMPTY>
<!ATTLIST %nextCatalog;
    catalog     %uriReference;      #REQUIRED
    xml:base    %uriReference;      #IMPLIED
>

<!ELEMENT %group; (%public;|%system;|%delegate;|%nextCatalog;)+>
<!ATTLIST %group;
    override    %yesOrNo;       #IMPLIED
    xml:base    %uriReference;      #IMPLIED
>

References

Normative

Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, editors. [Comment: I believe the 2nd Ed includes Eve--see the entry at http://www.w3.org/TR/#Recommendations] Extensible Markup Language (XML) 1.0 (Second Edition). World Wide Web Consortium, [delete: 1998] [add: 2000].

Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999.

Jonathan Marsh, editor. XML Base. World Wide Web Consortium, 2000.

Non-Normative

[Comment: I think there should be a non-normative reference to TR9401. Also maybe one to our own requirements doc.]

IETF (Internet Engineering Task Force). RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, L. Masinter. 1998.

Henry S. Thompson, David Beech, Murray Maloney, et al., editors. XML Schema Part 1: Structures. World Wide Web Consortium, 2000.

Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2000.



[1] Any catalog file which is valid according to this DTD is valid according to this Specification. However, catalog files which make use of extension elements or attributes may be valid according to this Specification but invalid according to this DTD, due to the limits of DTD validation with respect to namespaces.


