Subject: 8 Jan 2001 Entity Resolution Draft
Here is a new draft of the spec. I believe that this draft incorporates all of the decisions made so far.
Title: XML Catalogs
OASIS Entity Resolution Technical Committee
Revision date: 08 Jan 2001
Copyright © 2000, 2001 by OASIS
Permission to reproduce parts or all of this information in any form is granted to OASIS members provided that this information by itself is not sold for profit and that OASIS is credited as the author of this information.
Two different but related issues pertaining to entity management impede interoperability of XML documents:
While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this resolution defines an entity catalog that maps an entity's external identifier and/or name to a URI.
In order to use a variety of XML tools in a variety of computer environments, there are two different but related problems to solve:
There are many important issues involved, and a complete solution (possibly including work within the standards community) is beyond the current scope. However, the OASIS membership agrees at this time upon a set of conventions that addresses a useful subset of the complete problem.
The short term solution for issue A defines an entity catalog that handles the simple cases of mapping an external entity's public identifier and/or entity name to a file name, URL, or other storage object identifier. This solution allows for a probably system-dependent (at least in the case of file names) but application-independent catalog. Though it does not handle all issues that a combination of a complete entity manager and storage manager addresses, it simplifies use of multiple products in a great majority of cases and can in some cases (e.g., with URLs) provide internet-wide, system-independent resolution of public identifiers.
To address the issue of multiple vendors' applications on a given system, this resolution defines a format for an application-independent entity catalog that maps external identifiers to (other) URIs. This catalog is used by an application's entity manager. This resolution does not dictate when an entity manager should access this catalog; for example, an application may attempt other mapping algorithms before or (if the catalog fails to produce a successful mapping) after accessing this catalog. The catalog has a standard format. Each application that uses it must provide the user with a mechanism for specifying how and when the catalog is to be accessed.
For the purposes of this resolution, the term catalog refers to the logical “mapping” information that may be physically contained in one or more catalog entry files. The catalog, therefore, is effectively an ordered list of (one or more) catalog entry files. It is up to the application to determine the ordered list of catalog entry files to be used as the logical catalog. (This resolution uses the term “catalog entry file” to refer to one component of a logical catalog even though a catalog entry file can be any kind of storage object or entity including, but not limited to, a table in a database, some object referenced by a URL, or some dynamically generated set of catalog entries.)
Each entry in the catalog associates a URI with information about the external entity that appears in the XML document. For example, the following are possible catalog entries that associate a public identifier with a URI:
<er:public publicId="ISO 8879-1986//ENTITIES Added Latin 1//EN" uri="iso-lat1.gml"/>
<er:public publicId="-//USA/AAP//DTD BK-1//EN" uri="aapbook.dtd"/>
<er:public publicId="-//ACME//DTD Report//EN" uri="http://acme.com/dtds/report.dtd"/>
The complete set of catalog entry types defined by this Specification is: public, system, delegate, and nextCatalog. Two grouping elements are also defined: catalog, the top-level element, and group. Furthermore, to provide for possible future extensions or other uses of this catalog, its format allows for “other information” (indicated by an element from a namespace other than the one defined by this Specification) that is irrelevant to and ignored by this resolution. The formal syntax for a catalog entry file is defined by this XML Schema:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsd:schema SYSTEM "/share/doctypes/xmlschema/XMLSchema.dtd" [
<!ENTITY % schemaAttrs "
xmlns:xsd CDATA #IMPLIED
xmlns:xml CDATA #IMPLIED
xmlns:er CDATA #IMPLIED
">
]>
<xsd:schema xmlns:xsd='http://www.w3.org/2000/10/XMLSchema'
xmlns:er='http://www.oasis-open.org/committees/entity/draft'
targetNamespace='http://www.oasis-open.org/committees/entity/draft'
elementFormDefault='qualified'>
<!-- $Id: spec.xsd,v 1.1 2001/01/09 18:16:37 ndw Exp $ -->
<xsd:simpleType name='minimumLiteral'>
<xsd:restriction base='xsd:string'/>
</xsd:simpleType>
<xsd:simpleType name='publicIdentifier'>
<xsd:restriction base='xsd:string'/>
</xsd:simpleType>
<xsd:simpleType name='partialPublicIdentifier'>
<xsd:restriction base='er:minimumLiteral'/>
</xsd:simpleType>
<xsd:simpleType name='systemIdentifier'>
<xsd:restriction base='xsd:uriReference'/>
</xsd:simpleType>
<xsd:simpleType name='yesOrNo'>
<xsd:restriction base='xsd:string'>
<xsd:enumeration value='yes'/>
<xsd:enumeration value='no'/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name='catalog'>
<xsd:choice minOccurs='1' maxOccurs='unbounded'>
<xsd:element ref='er:public'/>
<xsd:element ref='er:system'/>
<xsd:element ref='er:delegate'/>
<xsd:element ref='er:nextCatalog'/>
<xsd:element ref='er:group'/>
<xsd:any namespace='##other' processContents='skip'/>
</xsd:choice>
<xsd:attribute name='override' type='er:yesOrNo'/>
<xsd:anyAttribute namespace='##other'/>
</xsd:complexType>
<xsd:complexType name='public'>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:attribute name="publicId" type="er:publicIdentifier"
use="required"/>
<xsd:attribute name="uri" type="xsd:uriReference" use="required"/>
<xsd:anyAttribute namespace='##other'/>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name='system'>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:attribute name="systemId" type="er:systemIdentifier"
use="required"/>
<xsd:attribute name="uri" type="xsd:uriReference" use="required"/>
<xsd:anyAttribute namespace='##other'/>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name='delegate'>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:attribute name="publicIdStartString"
type="er:partialPublicIdentifier"
use="required"/>
<xsd:attribute name="catalog" type="xsd:uriReference" use="required"/>
<xsd:anyAttribute namespace='##other'/>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name='nextCatalog'>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:attribute name="catalog" type="xsd:uriReference" use="required"/>
<xsd:anyAttribute namespace='##other'/>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name='group'>
<xsd:choice minOccurs='1' maxOccurs='unbounded'>
<xsd:element ref='er:public'/>
<xsd:element ref='er:system'/>
<xsd:element ref='er:delegate'/>
<xsd:element ref='er:nextCatalog'/>
<xsd:any namespace='##other' processContents='skip'/>
</xsd:choice>
<xsd:attribute name='override' type='er:yesOrNo'/>
<xsd:anyAttribute namespace='##other'/>
</xsd:complexType>
<xsd:element name="catalog" type="er:catalog"/>
<xsd:element name="public" type="er:public"/>
<xsd:element name="system" type="er:system"/>
<xsd:element name="delegate" type="er:delegate"/>
<xsd:element name="nextCatalog" type="er:nextCatalog"/>
<xsd:element name="group" type="er:group"/>
</xsd:schema>
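As an illustration of how a catalog entry file shaped by the schema above might be read, here is a minimal Python sketch. It is not part of the draft: the sample catalog text, the `read_entries` function, the file names `acme-catalog.xml` and `fallback-catalog.xml`, and the `default_override` parameter are all hypothetical. Only the namespace URI, the element names, and the attribute names come from the schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the schema's targetNamespace.
ER_NS = "http://www.oasis-open.org/committees/entity/draft"

# Hypothetical catalog entry file assembled for illustration only.
SAMPLE_CATALOG = """\
<er:catalog xmlns:er="http://www.oasis-open.org/committees/entity/draft" override="yes">
  <er:public publicId="-//USA/AAP//DTD BK-1//EN" uri="aapbook.dtd"/>
  <er:group override="no">
    <er:public publicId="-//ACME//DTD Report//EN" uri="http://acme.com/dtds/report.dtd"/>
    <er:system systemId="http://acme.com/dtds/report.dtd" uri="report.dtd"/>
  </er:group>
  <er:delegate publicIdStartString="-//ACME//" catalog="acme-catalog.xml"/>
  <er:nextCatalog catalog="fallback-catalog.xml"/>
</er:catalog>
"""

ENTRY_TYPES = ("public", "system", "delegate", "nextCatalog")


def read_entries(xml_text, default_override="yes"):
    """Return (entry_type, attributes, override) tuples in document order.

    default_override stands in for the application-level preference that
    applies when no override attribute is present (an assumption here).
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{ER_NS}}}catalog":
        raise ValueError("top-level element must be catalog in the draft namespace")
    entries = []

    def walk(parent, inherited):
        # An override attribute sets the mode for the entries it contains.
        override = parent.get("override", inherited)
        for child in parent:
            if not child.tag.startswith(f"{{{ER_NS}}}"):
                continue  # "other information" from another namespace is ignored
            local = child.tag.split("}", 1)[1]
            if local == "group":
                walk(child, override)
            elif local in ENTRY_TYPES:
                entries.append((local, dict(child.attrib), override))

    walk(root, default_override)
    return entries


for entry_type, attrs, override in read_entries(SAMPLE_CATALOG):
    print(entry_type, override, attrs)
```

The sketch flattens group elements into a single ordered list and records the override value in force for each entry, which is one convenient shape for the lookup rules described later in the draft. It does not validate against the schema.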
Alternatively, the syntax of a catalog entry file is partially[1] defined by this Document Type Definition:
<!-- $Id: spec.dtd,v 1.1 2001/01/09 18:16:37 ndw Exp $ -->
<!ENTITY % minimumLiteral "CDATA">
<!ENTITY % publicIdentifier "CDATA">
<!ENTITY % partialPublicIdentifier "CDATA">
<!ENTITY % uriReference "CDATA">
<!ENTITY % systemIdentifier "%uriReference;">
<!ENTITY % yesOrNo "(yes|no)">
<!ENTITY % p "">
<!ENTITY % s "">
<!ENTITY % catalog "%p;catalog">
<!ENTITY % public "%p;public">
<!ENTITY % system "%p;system">
<!ENTITY % delegate "%p;delegate">
<!ENTITY % nextCatalog "%p;nextCatalog">
<!ENTITY % group "%p;group">
<!ELEMENT %catalog; (%public;|%system;|%delegate;|%nextCatalog;|%group;)+>
<!ATTLIST %catalog;
xmlns%s; %uriReference; #FIXED
'http://www.oasis-open.org/committees/entity/draft'
override %yesOrNo; #IMPLIED
xml:base %uriReference; #IMPLIED
>
<!ELEMENT %public; EMPTY>
<!ATTLIST %public;
publicId %publicIdentifier; #REQUIRED
uri %uriReference; #REQUIRED
xml:base %uriReference; #IMPLIED
>
<!ELEMENT %system; EMPTY>
<!ATTLIST %system;
systemId %systemIdentifier; #REQUIRED
uri %uriReference; #REQUIRED
xml:base %uriReference; #IMPLIED
>
<!ELEMENT %delegate; EMPTY>
<!ATTLIST %delegate;
publicIdStartString %partialPublicIdentifier; #REQUIRED
catalog %uriReference; #REQUIRED
xml:base %uriReference; #IMPLIED
>
<!ELEMENT %nextCatalog; EMPTY>
<!ATTLIST %nextCatalog;
catalog %uriReference; #REQUIRED
xml:base %uriReference; #IMPLIED
>
<!ELEMENT %group; (%public;|%system;|%delegate;|%nextCatalog;)+>
<!ATTLIST %group;
override %yesOrNo; #IMPLIED
xml:base %uriReference; #IMPLIED
>
where public identifier, system identifier, and minimum literal are as defined in XML 1.0 Second Edition.
Additional requirements:
An entry in the catalog is interpreted as follows:
When doing a catalog lookup, an entity manager generally uses whatever is available from among the entity declaration's system identifier and public identifier to find catalog entries that match the given information. A match in one catalog entry file will take precedence over any match in a later catalog entry file (and, in fact, the entity manager need not process subsequent catalog entry files once a match has occurred). A more specific matching entry in one catalog entry file will take priority over a less specific matching entry in the same catalog entry file. For this purpose, the order of specificity of match (most specific first) is:
Within any given category of equal specificity, matches maintain the order of their entries in the catalog entry file so that the first such match will take priority.
Generally, when a system identifier is specified in an external entity declaration, it can be trusted to be a valid URI. However, in some circumstances (such as when the document was generated on another system, when the document was generated in another location on the same system, or when some files referenced by system identifiers have had their locations changed since the document was generated), the specified system identifiers may not be valid. For this or other reasons, using the public identifier in preference to the system identifier may be the better way of accessing the entity. Therefore, this resolution defines two modes for using the above search strategy when an external identifier has an explicit system identifier. (Furthermore, a system catalog entry can be used to map an explicit system identifier given in an external entity declaration into any URI; a matching system type entry would take precedence over a public type entry regardless of the search mode strategy.) The two search modes are:
An application must provide some way (e.g., a runtime argument, environment variable, preference switch) that allows the user to specify which of these modes to use in the absence of any occurrence of the override attribute on the catalog entry. The override attribute can be used on catalog and group entry types to indicate for any set of catalog entries whether they should be able to be used in matches that may override an explicit system identifier. Each occurrence of an override attribute specifies the search strategy mode for entries contained within the catalog or group element on which it occurs. A public or delegate entry encountered when override is “yes” (corresponding to the mode where public identifiers are preferred) will be considered for possible matching whether or not the external identifier has an explicit system identifier. A public or delegate entry encountered when override is “no” (corresponding to the mode where system identifiers are preferred) will be ignored during lookups for which the external identifier has an explicit system identifier. No other entry types are affected by the override attribute. The initial search strategy in force at the beginning of each catalog entry file depends on the preference as determined by the application (possibly under user control).
When attempting matches for delegate type catalog entries, the entity's public identifier is compared to the public id start string of each delegate catalog entry, looking for start strings that are initial substring matches of the entity's public identifier. If this catalog entry file produces any such matches, the catalog attributes of all such matching entries are used, in order from longest partial public identifier match to shortest, to generate a new complete logical catalog (i.e., a newly specified list of catalog entry files) that replaces the current catalog.
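The override rule and the delegate ordering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the committee's algorithm: the function names `considered` and `delegated_catalogs`, the tuple representation of delegate entries, and the catalog file names are hypothetical, and keeping file order among equally long matches is an assumption the draft does not spell out.

```python
def considered(entry_type, override, has_system_id):
    """Decide whether an entry takes part in a lookup.

    Only public and delegate entries are affected: with override "no"
    they are ignored when the external identifier has an explicit
    system identifier. All other entry types are always considered.
    """
    if entry_type in ("public", "delegate") and override == "no" and has_system_id:
        return False
    return True


def delegated_catalogs(public_id, delegates):
    """Build the replacement catalog list for a public identifier.

    delegates is a list of (publicIdStartString, catalog) pairs in the
    order they appear in the catalog entry file (hypothetical shape).
    Returns the catalog values of all entries whose start string is an
    initial substring of public_id, longest start string first.
    """
    matches = [(start, cat) for start, cat in delegates if public_id.startswith(start)]
    # sorted() is stable, so equally long matches keep their file order (assumption).
    matches = sorted(matches, key=lambda m: len(m[0]), reverse=True)
    return [cat for _, cat in matches]


# Hypothetical delegate entries.
delegates = [
    ("-//ACME//", "acme.xml"),
    ("-//ACME//DTD ", "acme-dtds.xml"),
    ("-//OTHER//", "other.xml"),
]
print(delegated_catalogs("-//ACME//DTD Report//EN", delegates))
print(considered("public", "no", has_system_id=True))
```

An empty result from `delegated_catalogs` means no delegate entry matched, in which case the current catalog is not replaced.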
The catalog lookup process for this entity continues with this new (replacement) catalog, ignoring for the purposes of this entity any other entries in the current catalog entry file as well as any subsequent catalog entry files that may have been part of the previous list of catalog entry files. This newly defined catalog is then processed in much the same manner as if it had been the originally specified catalog; however, only the entity's public identifier is considered as the information available for lookup. Its entity name and system identifier (if any) are not available during lookup in any “delegated to” catalog. Lookup for subsequent public identifiers is unaffected by this process; that is, the effect of this replacement catalog holds only for the lookup of the current entity's public identifier.
The nextCatalog entry can be used to insert new catalog entry files into the current list of catalog entry files. The catalog attribute on a nextCatalog entry is used to locate another catalog entry file that is processed after the current catalog entry file if the current catalog entry file does not provide a match. Multiple nextCatalog entries are allowed, and the referenced catalog entry files will be inserted into the current catalog list in order. Note that the effect of any nextCatalog entry would occur only after all other entries in this catalog entry file have been considered.
Since this resolution pertains to public identifiers, it addresses one additional detail about public identifiers. ISO 8879 is inconsistent about the use of hyphens and colons in ISO owner identifiers. Clause 10.2.1.1 of 8879:1986 (unamended) has a note indicating that the ISO owner identifier for the SGML standard is “ISO 8879-1986”. Production [171] of clause 13 indicates that the minimum literal in the SGML declaration must be “ISO 8879-1986”. While Amendment 1 of 8879 does not alter clause 10.2.1.1, it does alter production [171] of clause 13 to say that the minimum literal in the SGML declaration should be “ISO 8879:1986”. This has led to the propagation of both the hyphen and the colon in ISO owner identifiers. In the interests of interoperability, this OASIS resolution requires that all products accept either form as a valid ISO owner identifier. Note, however, that this should not be construed to mean that a public identifier using one form should necessarily cause a catalog lookup match to succeed with a public identifier using the other form; while this resolution requires SGML systems to accept either form as valid, in practice, two entries (differing only by the single “:” or “-” character) may be needed in the catalog if both forms should refer to the same storage object identifier.
References
Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, editors. Extensible Markup Language (XML) 1.0. World Wide Web Consortium, 1998.
Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999.
IETF (Internet Engineering Task Force). RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, L. Masinter. 1998.
Jonathan Marsh, editor. XML Base. World Wide Web Consortium, 2000.
Henry S. Thompson, David Beech, Murray Maloney, et al., editors. XML Schema Part 1: Structures. World Wide Web Consortium, 2000.
Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2000.
[1] Any catalog file which is valid according to this DTD is valid according to this Specification. However, catalog files which make use of extension elements or attributes may be valid according to this Specification but invalid according to this DTD, due to the limits of DTD validation with respect to namespaces.
Here's a PDF that I made with XSL and PassiveTeX. The formatting needs work, but it's a start.
Be seeing you, norm -- Norman.Walsh@East.Sun.COM | Many ideas grow better when transplanted to XML Technology Center | another mind than in the one where they Sun Microsystems, Inc. | sprang up.--Oliver Wendell Holmes