Back to OASIS Member's-Only Registry and Repository Technical Committee Home Page
Copyright © 2000 by OASIS, the Organization for the Advancement of Structured Information Standards
12 March 2000
The Registry and Repository Technical Committee of OASIS, the Organization for the Advancement of Structured Information Standards (formerly SGML Open), seeks to specify a registry for some set of SGML- or XML-related entities, including but not limited to DTDs and schemas, with interfaces that enable searching on the contents of a repository of those entities. The registry and repository shall interoperate and cooperate with other registries and repositories compliant with this specification and respond to requests for entities by their identifiers. This document deals primarily with the registry, although some scenarios and requirements for the repository are included.
Comments welcome, to tallen[at]sonic.net.
This document was marked up in Norm Walsh's Simplified Docbook XML DTD, and converted to HTML with the aid of his XSL style sheet. Output in RTF and other formats can be obtained by using James Clark's XT and XP tools and Norm Walsh's DSSSL style sheets for Docbook.
Please be sure to cite the version of the specification along with relevant section title when making comments.
The words must, must not, shall, shall not, and may in this document are to be interpreted as described in RFC 2119.
As XML comes into use on the Web, DTDs, schemas, style sheets, and reusable public text will be referred to by identifier, rather than being packaged with actual documents. It is critically necessary to be able to retrieve the referred-to entities, and in the Web context, it is preferable to be able to do this automatically. It is also vital for users to be able to locate DTDs and schemas for the document types they want to create by consulting an interface to metadata about those DTDs and schemas.
Objective and Deliverables. The objective of the OASIS Registry and Repository Technical Committee is to develop a specification for interoperable registries and repositories for SGML- and XML-related entities, including but not limited to DTDs and schemas, with an interface that enables searching on the contents of a repository of those entities, and to construct a prototype registry and repository. The registry and repository are to be designed to interoperate and cooperate with other registries and repositories compliant with this specification. The prototype is intended to serve as a model for an extensible and distributed network of registries and repositories; the specification is viewed as the primary deliverable.
Contributors. The chairman of the OASIS Registry and Repository Technical Committee is Terry Allen of Commerce One, Inc. Norbert Mikula of DataChannel is the OASIS Chief Technical Officer. The members of the Technical Committee are: Nagwa Abdelghfour (Sun Microsystems), Terry Allen (Commerce One), Lisa Carnahan (U.S. NIST), Robin Cover (ISOGEN), Úna Kearns (Documentum), Norbert Mikula (DataChannel), Yutaka Yoshida (Sun Microsystems), and Priscilla Walmsley (XMLSolutions). Ron Daniel (Metacode) is an invited expert. Other individuals who have contributed to the development of the specification include Joe Alfonso (Sun Microsystems), Murray Altheim (Sun Microsystems), Bryan Caporlette (Sequoia Software), Len Gallagher (U.S. NIST), Eduardo Gutentag (Sun Microsystems), Michael Mealling (Network Solutions), Ron Schuldt (UDEF), and Norm Walsh (Arbortext).
The following design principles have been agreed to:
The Registry Technical Specification shall employ existing standards and specifications where possible, avoiding specifications that are not stable. OASIS must be prepared to track developments such as ANSI X3.285, which is the proposed revision of Part 3 of ISO/IEC 11179, and the W3C's XML Schema specification so that they can be considered for use when they are mature.
The normative part of the Registry Technical Specification shall be as small as reasonable.
The normative part of the Registry Technical Specification shall be complete enough that registries and repositories conformant to it can interoperate in an extensible and distributed network.
The normative part of the Registry Technical Specification shall be extensible; in particular, it shall be possible to extend the registration information schema or DTD without inhibiting interoperability among registries. (This point is called out because the registration information schema or DTD is likely to be a normative part of this specification).
Immediate needs should be satisfied first. A repository offers opportunities for the application of many kinds of technologies; OASIS should focus on providing DTDs and schemas, and an interface to their metadata, before proceeding to other matters.
The registry shall be user-friendly.
The Registry Technical Specification shall be vendor-neutral.
The Registry Technical Specification shall be as easy to implement as practicable.
The Registry Technical Specification shall use XML by preference for encoding of information and documents.
The first complete and finished version of the Registry Technical Specification shall be delivered quickly.
The Registry Technical Specification shall assume the use of HTTP.
The registry and repository shall be scalable.
The implementation of the registry and repository shall be testable against their design documents.
These scenarios involve both users retrieving something from the repository and contributors registering something in the registry, which may involve depositing something in the repository.
A user or user agent retrieves an XML-related entity such as a DTD automatically over the Web, as a result of some use of it in an XML context.
Motivation. Unless everything needed for parsing and displaying a document under all circumstances is packaged with the document itself, the document must refer to something (DTD, style sheet, public text) by identifier. It is necessary to be able to retrieve the referred-to entity, and in the Web context, it is preferable to be able to do this automatically.
Example A. A user is sent a document the DOCTYPE declaration of which refers to a DTD by unique identifier (URN, PI, or FPI). His parser tells him it can't find the DTD, so he goes out and retrieves it manually from a repository (he doesn't need the registry interface because he has a unique ID but he does need to know where to find the repository).
Example B. A user clicks on a link to the stockmarket news and his browser receives an XML document the DOCTYPE declaration of which refers to a DTD by unique identifier; his browser, which has no copy locally, retrieves it automatically from the repository.
A creator of an XML-related entity deposits it, possibly along with related data, for service to the public, at some range of accessibility from archival (retrieval rate could be slow) to utility (retrieval rate must be fast, large number of connections must be supported, round-the-clock uptime with failover, etc.).
Motivation. Many creators of XML entities lack the facilities to serve them reliably; even those that can do so may not wish to deal with the burden.
Example A. An IETF working group decides that a DTD that is part of their specification, but which the IETF has no facilities to serve, must be available from a public Web server with high bandwidth, and doesn't want to have to maintain the server. It sends the DTD to a repository and the repository serves it, as in the first scenario.
Example B. A consortium or consultancy wishes its DTDs to be available for inspection and display. It deposits the DTDs, along with their documentation and sample instances, in a repository and provides appropriate metadata for the repository's registry interface. The owner of the repository undertakes to make them available (but not with a high guaranteed quality of service).
Example C. Rosetta Net, a (real life) consortium of hardware vendors and suppliers, develops UML models, DTDs, and sets of text values used in their content, all expected to be in heavy demand, the text values to change frequently. It deposits the UML models (as XMI), DTDs, and the initial set of text values in a repository, contracts for a regular update schedule and the highest available quality of service, and the repository undertakes to serve them, update them as agreed, push updates to subscribers, and maintain high quality of service for retrieval requests. Rosetta Net doesn't need a registry interface for this purpose because everything is to happen automatically, but it provides appropriate registry metadata so that the DTDs can be browsed and searched.
Example D. The Air Transport Association, which maintains important DTDs but makes them available only to its members, wishes to offload the work of supplying those DTDs. It deposits the DTDs in a repository, contracts for service as in Example C, and in addition arranges that the DTDs are listed in the registry interface but are available only when an appropriate credential is presented in connection with a request for them. (This is an application of access control.)
The owner of an XML-related entity, or another repository, registers the entity in the registry, but does not deposit the entity itself.
Motivation. Registries can interoperate to increase usability, but the actual storage location of an entity alone must not restrict the content of a registry.
Example A. A company wishes to make its DTDs visible in the OASIS-sponsored registry, but prefers to serve them itself. It submits appropriate registry documents to the registry, including a pointer to the address from which it serves the DTDs, and agrees with the registry that it will supply timely update information and that the registry will update its records and interface in a timely manner.
Example B. A special-purpose registry wishes to make its content visible in the OASIS-sponsored registry, while maintaining that content in its own repository. It submits appropriate registry documents to the registry, including a pointer to its repository, and agrees with the registry that it will supply timely update information and that the registry will update its records and interface in a timely manner.
A user ready to compose an XML document searches for a DTD that covers the subject of the document.
Motivation. Every day in newsgroups and e-mail discussion lists such as comp.text.sgml, comp.text.xml, and xml-dev people ask whether there is a DTD for some subject area or functional purpose. The number of such queries will grow if XML is widely adopted. Somehow they have to be answered if wheel reinvention is to be minimized.
Example A. A user is about to write his resume, and wants to use XML. He goes to a registry and looks in a subject hierarchy (or taxonomy) to find a resume DTD (this is browsing, not searching). The subject hierarchy interface displays three appropriate listings, he chooses among them on the basis of their descriptions, downloads the DTD he chose from the repository, manually adds it to his SO catalog, and sets to work with vi and SP.
Example B. A user is about to write his resume, and wants to use XML. He goes to a registry and uses its search engine to find a resume DTD (this is searching, not browsing). The search interface returns three hits, he chooses among them on the basis of their descriptions, downloads from the repository the DTD he chose, and loads it into his XML writing tool. The interface also provides a time-to-live value, showing him how long he can expect his resume DTD to be served by the repository.
Example C. A homeowner is about to advertise his house for sale, and opens his verboprocessor. He says "take a memo: real estate for sale" and the verboprocessor automatically contacts a registry to find an appropriate XML DTD (there is one already for real estate listings). He dictates the text of his ad without knowing anything about XML, and the verboprocessor sends it to all real estate listing services it can locate. (In this scenario the verboprocessor uses a registry to find something in a repository.)
Example D. An XML application designer needs a component to represent the list of names of French provinces, so he consults a registry. The registry interface indicates that the list is available as a tab-delimited list in ASCII, as an XML schema datatype declaration, and as a parameter entity declaration in DTD syntax. He chooses the parameter entity declaration format by clicking something in the interface, and the repository returns it.
NOTE: while it does not seem too useful at this stage, attention may be paid to SC32 WG2's 1999-04-20 draft Metadata Query Service: An Object Technology Extension to the ISO/IEC 11179 Specification and Standardization of Data Elements, Part 3, Basic Attributes, which has both use cases and IDL for behavioral aspects of a data registry (p. v).
NOTE: There are additional scenarios in ISO/IEC 11179.
On the basis of the registry and repository scenarios, the following functional requirements have been identified:
Registration. A registry must support registration of the contents of the repository (and potentially other repositories) using standardized administrative metadata.
Classification. Metadata must support application of both controlled vocabulary (for taxonomic view) and uncontrolled vocabulary (for searching) for subject matter of registered entities.
Service. The registry must return a registered entity in response to a request by URN, URL, PI, or FPI. That is, a user shall be able to request an XML-related entity by PI, FPI, URL, or URN (note that some entities may have multiple unique identifiers) and get the entity as the response. (This requirement can be thought of as applying to the repository, but it is convenient to imagine all requests being channelled through the registry.)
Metadata. The registry must return metadata about a registered entity in response to a request in a specified format that uses a unique identifier.
Revision. It must be possible for the SO to request (and obtain) revision of information it provided, or that the RA assigned (or that is provided as added value) while not changing the registered entity.
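The Service, Metadata, and Revision requirements can be illustrated with a minimal sketch. It assumes a purely in-memory registry; the class, method, and field names are hypothetical and are not defined by this specification. The point it shows is that several unique identifiers may resolve to one registered entity, and that revising metadata leaves the entity itself unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredEntity:
    """A registered entity plus its administrative metadata (illustrative only)."""
    content: str
    metadata: dict = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry keyed by unique identifier."""

    def __init__(self):
        self._by_id = {}

    def register(self, entity, identifiers):
        # An entity may have several unique identifiers (URN, URL, PI, FPI);
        # each of them must resolve to the same registered entity.
        for ident in identifiers:
            if ident in self._by_id:
                raise ValueError(f"identifier already registered: {ident}")
            self._by_id[ident] = entity

    def get_entity(self, ident):
        # Service requirement: return the entity itself.
        return self._by_id[ident].content

    def get_metadata(self, ident):
        # Metadata requirement: return metadata about the entity.
        return self._by_id[ident].metadata

    def revise_metadata(self, ident, **changes):
        # Revision requirement: the metadata changes, the entity does not.
        self._by_id[ident].metadata.update(changes)
```

A real registry would of course persist its records and return metadata as an XML document; the dictionary here only stands in for that.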
The OASIS Registry and Repository Technical Committee has omitted to specify how to provide certain functionality that it is agreed is needed:
Submission. Submissions to a registry may be made by an application-to-application process or through a human-manipulable interface. Certain semantics related to the purpose of the submission should be specified, but it is not useful to specify the design of a human-manipulable interface.
Further, while it may prove to be useful to specify a common method of application-to-application submission, the method of packaging a submission package should not be specified until current work in the IETF on XML packaging has borne fruit.
Consequently, this specification does not prescribe any method for submitting items to a registry. Certain semantics related to submission are specified (below). This area will require further work in the future.
Service Description. While it is considered desirable that any registry provide a way to obtain information about the registration authorities it supports, and other services, the OASIS Registry and Repository Technical Committee has not specified a method of doing so.
Interoperability. Work on interoperability has been postponed to a projected second phase of work in order to concentrate on the specification of the registry itself.
PIs and FPIs. Support of the requirement to return registered entities in response to a request by PI or FPI has not been provided. It is envisioned that these unique identifiers should be conveyed as URNs, but construction of a URN name space for this purpose has not been essayed.
This specification is intended to support an XML-based implementation of ISO/IEC 11179, Metadata Registries. The DTDs for that implementation are provided in OASIS Registry and Repository DTD Documentation.
The data model of registry interfaces shall follow that laid out in ISO/IEC 11179 as closely as feasible. (Note that there is a pointer to ISO/IEC 11179 online in the appendix below.) The procedure for registration of data elements is described in Part 6 of ISO/IEC 11179, and for purposes of this specification any XML-related entity can be registered using the same procedure.
ISO/IEC 11179 can be used for description of data elements and data element dictionaries, and the DTDs for this specification can support the description of data element dictionaries without decomposing them into descriptions of individual data elements.
ISO/IEC 11179 describes the roles of the Registration Authority (RA), any organization authorized to register data elements (part 6, 3.13); Submitting Organization (SO), the organization or unit within an organization that has submitted the data element for addition, change, or cancellation/withdrawal in the data element dictionary (part 6, 3.16), and Responsible Organization (RO), the organization or unit within an organization that is responsible for the contents of the mandatory attributes by which the data element is specified (part 6, 3.15; this can be a maintenance agency). The consensus of the OASIS Registry and Repository Technical Committee is that in registry metadata it should not be necessary to name a person as Registrar.
Every entity registered (including everything deposited in the repository) must be provided with administrative metadata. It must be possible for a registry to make this metadata available as an XML document conformant to a DTD provided as part of this specification.
All parties assigning URNs or other identifiers must assign them only from name spaces they are authorized to use. URN resolver preferences may be expressed in the URN Resolver Preferences DTD, for which urnsamp.txt is a sample instance.
Viewed from the standpoint of an operating registry, and aside from semantics related to submission, the APIs specified here are:
Request for a registered entity by unique identifier
Request for metadata about a registered entity by unique identifier
Format of OASIS-conformant metadata about a registered entity
Metadata for entities registered in the registry and for entities deposited in the repository shall be made available in the form of XML documents. These documents need not be the storage format for the information they contain, but are the normative representation of that information; their semantics and syntax are the API to that information. The set of sample files with filenames beginning db can be taken as examples. Note that these point to the actual location of registered items elsewhere. The content of such documents would be constructed from information supplied by the SO and amplified by the RA.
If the SO submits keyword information according to a taxonomy unknown to the RA, the RA must ask the SO to supply the taxonomy or a resolvable reference to it. For this purpose as well as support of the interface, the registry must be able to list the taxonomies that it supports.
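As a hedged illustration of the taxonomy rule, the following sketch separates submitted keywords into those classified under a taxonomy the registry supports and those whose taxonomy the RA must ask the SO to supply. The function name and the (taxonomy, keyword) pair representation are assumptions made for the example.

```python
def check_keywords(submitted, known_taxonomies):
    """Split submitted (taxonomy, keyword) pairs into accepted pairs and
    the taxonomies the RA must ask the SO to supply (illustrative only)."""
    accepted = []
    unknown = set()
    for taxonomy, keyword in submitted:
        if taxonomy in known_taxonomies:
            accepted.append((taxonomy, keyword))
        else:
            # Unknown taxonomy: the RA must request it, or a resolvable
            # reference to it, from the SO.
            unknown.add(taxonomy)
    return accepted, sorted(unknown)


def list_taxonomies(known_taxonomies):
    # The registry must be able to list the taxonomies it supports.
    return sorted(known_taxonomies)
```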
The various categories of administrative and registration statuses in ISO/IEC 11179 are not aligned optimally and require adjustment. For example, retired must not be in the same list as other values such as certified, so that it can be determined what the status of a retired data element was when it was retired. However, it is recognized that the set supplied in ISO/IEC 11179 is not suitable for every registry, so no set is specified.
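One way to read the point about retired is sketched below: retirement is recorded separately from the registration status, so the status an item held when it was retired remains visible. The record type and field names are hypothetical, and because this specification fixes no set of status values, the values are left as free strings.

```python
from dataclasses import dataclass


@dataclass
class StatusRecord:
    """Illustrative status record; not a structure defined by this specification."""
    registration_status: str  # for example "certified"; no fixed set is specified
    retired: bool = False

    def retire(self):
        # Retirement sets a separate flag, so the registration status
        # held at the time of retirement is preserved.
        self.retired = True
```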
The human-readable interface to a registry shall be constructed from the content of the registration documents, augmented with whatever value-added information the registry provides.
Users shall be warned that links to entities made through the interface may be fragile, and that links to entities should be made only by means of unique identifiers.
Listed here are sample documents, DTDs, and entities that contain lists of values for certain attributes in the DTDs. The submission samples are actually full sets of composite metadata; they are what a registry should return in response to requests for registry metadata.
The documentation for the DTDs and entities may be found in OASIS Registry and Repository DTD Documentation.
Cover letter, a nonnormative example of semantics to be used in the context of submission
Contact information for an RA, SO, and RO and Document fragment used in contact information
Registration document samples for the Docbook DTD, one for each module, one for the entire distribution and one describing related data - these represent the RA's composite metadata for these components of Docbook.
Declarations of other entity files as XML parameter entities
List of administrative status values per ISO/IEC 11179, with some additions
Declaration of OASIS-specific values in a customization layer
Sample instance showing use of DTD for NAICS taxonomy
Document fragment used in contact information for this classification scheme.
Sample instance showing use of DTD for contents of a repository sorted according to NAICS taxonomy
Sample instance of URN resolver preferences file
Resolution of requests by URL and URN is discussed in RFC 2483, URI Resolution Services Necessary for URN Resolution. This experimental specification appears to be well suited to the purposes of the OASIS Registry and Repository Technical Committee. Its typology of requests, results, errors, and security considerations is well considered.
As the OASIS Registry and Repository Technical Committee is willing to limit the protocols supported to HTTP, the syntax proposed in RFC 2169, "A Trivial Convention for using HTTP in URN Resolution" (THTTP) is specified here, with revisions to bring it in line with the later RFC 2483, to wit, replacement of the L2* and N2* requests with a generic I2* request. Thus emended, section 2.0 of RFC 2169 reads:
The general approach used to encode resolution service requests in THTTP is quite simple:
    GET /uri-res/<service>?<uri> HTTP/1.0

For example, if we have the URN "urn:foo:12345-54321" and want a URL, we would send the request:

    GET /uri-res/I2L?urn:foo:12345-54321 HTTP/1.0

The request could also be encoded as an HTTP 1.1 request. This would look like:

    GET /uri-res/I2L?urn:foo:12345-54321 HTTP/1.1
    Host: <whatever host we are sending the request to>

Responses from the HTTP server follow standard HTTP practice. Status codes, such as 200 (OK) or 404 (Not Found) shall be returned. The normal rules for determining cachability, negotiating formats, etc. apply.
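The request encoding quoted from RFC 2169 can be sketched as a small formatting helper. The function name is hypothetical; it only assembles the request text and does not open a network connection.

```python
def thttp_request(service, uri, host=None):
    """Format a THTTP resolution request (illustrative only).

    Without a host, an HTTP/1.0 request is produced; with a host,
    an HTTP/1.1 request carrying a Host header.
    """
    if host is None:
        return f"GET /uri-res/{service}?{uri} HTTP/1.0\r\n\r\n"
    return f"GET /uri-res/{service}?{uri} HTTP/1.1\r\nHost: {host}\r\n\r\n"
```

For example, thttp_request("I2L", "urn:foo:12345-54321") yields the HTTP/1.0 request shown in the quoted text, terminated by the blank line that ends an HTTP request header.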
To use this syntax in general, one would follow the pattern (cast as a URL rather than a full HTTP request):
http://someregistry.org/<function>?argument
To obtain an entity such as the Docbook DTD (the URN is imaginary):
http://someregistry.org/I2R?urn:x-oasis:dtds:Docbook-v3.1
To obtain the composite metadata document for the Docbook DTD (the URN is again imaginary):
http://someregistry.org/I2C?urn:x-oasis:dtds:Docbook-v3.1
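The URL pattern above can be built and taken apart with the Python standard library, as in the following sketch. The two function names are hypothetical, and the URN is the same imaginary one used in the examples.

```python
from urllib.parse import urlsplit


def resolution_url(registry_base, function, identifier):
    """Build a URL of the form <registry>/<function>?<identifier>."""
    return f"{registry_base.rstrip('/')}/{function}?{identifier}"


def parse_resolution_url(url):
    """Return (function, identifier) from a URL built on the same pattern."""
    parts = urlsplit(url)
    # The function is the last path segment; the identifier is the whole query.
    function = parts.path.rsplit("/", 1)[-1]
    return function, parts.query
```

Taking the function from the last path segment means the same parser also accepts URLs that keep the /uri-res/ prefix of the full HTTP request form.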
RFC 2483 defines an I2C request (section 4.5), for resolution of a URL or URN to a description of a resource (URC). As this is a generic request, the OASIS Registry and Repository Technical Committee chooses not to require registries conformant with this specification to return an entity's registration document in response to this request; registries are free to supply whatever their preferred metadata is, which may extend that specified here. (An RA may set its own policy with respect to what metadata it will accept beyond that specified here.) Instead, an additional request, I2X, is specified as returning the entity's registration document as defined here. Some information, such as personal contact information, may be withheld by the RA, perhaps on the basis of the identity of the requestor, if such is its policy.
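A registry-side view of these requests might look like the sketch below. It is only an illustration under stated assumptions: the record layout, the key names ("entity", "description", "registration", "contact"), the trusted flag, and the use of status 400 for an unsupported request are all hypothetical; only the 200 and 404 codes and the I2R, I2C, and I2X request names come from the text.

```python
def handle_request(function, identifier, records, trusted=False):
    """Return (status_code, body) for a resolution request (illustrative only).

    `records` maps an identifier to a dict with the hypothetical keys
    "entity", "description", and "registration".
    """
    record = records.get(identifier)
    if record is None:
        return 404, None
    if function == "I2R":
        # Return the registered entity itself.
        return 200, record["entity"]
    if function == "I2C":
        # Generic request: the registry returns its own preferred metadata.
        return 200, record["description"]
    if function == "I2X":
        # The registration document; the RA may withhold contact details
        # depending on who is asking, if such is its policy.
        doc = dict(record["registration"])
        if not trusted:
            doc.pop("contact", None)
        return 200, doc
    return 400, None  # assumption: unsupported request type
```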
The I2CS request, section 4.6, allows a request for multiple documents; in the absence of agreement on XML packaging, it is not clear that it would be useful to implement it at the present time.
The intent of an SO's communication with an RA must be specified somehow. The OASIS Registry and Repository Technical Committee provides as nonnormative an element, cover-letter in admin.dtd, and a sample of its use, coverletter.txt.
Certain semantics related to the purpose of submission are specified as advisory and nonnormative in submission-purpose-list.ent, which provides the value of the data-element-association-type attribute on the data-element-association element. They must be bound to the identities of the SO and RA (as shown in the cover-letter element type declaration, see example coverletter.txt), and in the case of some of them, bound to the identity of a previously submitted (and, if applicable, registered) item through the use of a data-element-association element. The values are:
the submitted item(s) are new to the registry, and it is requested that they be registered in accordance with terms of business previously or to be established between the SO and the RA.
the submitted item(s) are revisions of items already submitted by the same SO and registered by the same RA, and it is requested that they be registered.
the submitted item(s) supersede items already registered; it is requested that those items have their status changed to "superseded" with a data element association to the superseding items, but that they be retained in the registry.
no item is submitted, but the indicated registered item is requested to be withdrawn from the registry, its metadata to be retained with the status changed to "retired".
no item is submitted, but the indicated submitted (not necessarily registered) item is requested to be suppressed without retention of its metadata.
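The effect of the five submission purposes on a registry's records can be sketched as follows. The enumeration labels, the "registered" status, and the "superseded_by" field are hypothetical names chosen for the example; the actual values are those defined in submission-purpose-list.ent and are not reproduced here. Only the "superseded" and "retired" statuses come from the descriptions above.

```python
from enum import Enum


class Purpose(Enum):
    # Hypothetical labels standing in for the values in
    # submission-purpose-list.ent.
    NEW = "new"
    REVISION = "revision"
    SUPERSEDE = "supersede"
    WITHDRAW = "withdraw"
    SUPPRESS = "suppress"


def apply_submission(registry, purpose, item_id=None, metadata=None, target_id=None):
    """Apply one submission to `registry`, a dict of item id -> metadata dict."""
    if purpose in (Purpose.NEW, Purpose.REVISION):
        registry[item_id] = dict(metadata or {}, status="registered")
    elif purpose is Purpose.SUPERSEDE:
        registry[item_id] = dict(metadata or {}, status="registered")
        # The old item is retained, marked superseded, and associated
        # with the item that supersedes it.
        registry[target_id]["status"] = "superseded"
        registry[target_id]["superseded_by"] = item_id
    elif purpose is Purpose.WITHDRAW:
        # Metadata is retained; only the status changes.
        registry[target_id]["status"] = "retired"
    elif purpose is Purpose.SUPPRESS:
        # Suppression removes the item without retaining its metadata.
        del registry[target_id]
```

Binding each request to the identities of the SO and RA, as the text requires, is left out of the sketch.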
Conformance requirements go here. As ISO/IEC 11179 is specified in ordinary language, conformance to it in its present form can be tested only by reading that standard in parallel with this specification.
This specification declares the need for certain policies, but does not specify their contents.
The registry and repository shall have published policies relating to their provision of intellectual property notices for entities in the repository; that is, whether the interface to the registry or repository warns of the existence of copyright notices, asserted licenses, or other intellectual property restrictions or encumbrances, or leaves it to the user to discover them.
The registry and repository shall have published policies relating to their use of methods to guarantee the integrity of entities in the repository and metadata in the registry; for example, does the repository employ digital signatures to ensure against corruption? If transformations of registered entities are served, are they signed as well?
The registry and repository shall have security policies sufficient to engender confidence in the registry and repository.
The complete content of both the registry and repository shall be backed up offsite, and the backup tested. Some plan shall be made for reconstituting the registry and repository from the backup should the original site be rendered inoperable.
The registry and repository shall have published policies relating to their plans for continuing in operation and the outcomes to be expected should they cease operation or should business relationships with the owners of their content change. A point of departure for describing archival longevity is the Reference Model for an Open Archival Information System (OAIS), which is a draft ISO standard.
It shall be possible for an SO to request that an entity be kept available for a given length of time, also that it be withdrawn after a given length of time. TO DO: devise the semantics for this, perhaps in a DTD fragment.
It shall be possible for an SO to request the retraction of an entity.
The registry and repository shall have published policies relating to the privacy of users and the sale or other distribution of usage information.
ISO/IEC 11179 defines a data element status value, certified (Part 6, p. 9) for a recorded data element [that] has met the quality requirements specified in this and other parts of ISO/IEC 11179.
If the registry provides this or other quality control checking, it shall provide metadata about what specifications an entity conforms to and who did the testing to determine that conformance. (XML validity vs. well-formedness falls under this heading.)
A registry shall have a statement of limitation of legal liability (disclaiming responsibility for the use of information in the repository, for example).
A registry shall have a statement of the quality of service it can be expected to provide.
Glossed here are relevant terms, including acronyms, with entries for some specifications relevant to the registry and the repository.
Formal Public Identifier, defined in the SGML Standard, ISO/IEC 8879, section 10.2, and further in ISO/IEC 9070.
The SGML Standard, Information processing - Text and office systems - Standard Generalized Markup Language (SGML).
Further specifies PIs and FPIs, first defined in the SGML Standard, Information technology - SGML support facilities - Registration procedures for public text owner identifiers.
ISO/IEC 11179 is online at http://www.sdct.itl.nist.gov/~ftp/l8/11179/ . The home page of the relevant committee is http://sdct-sunsrv1.ncsl.nist.gov/~ftp/l8/sc32wg2/projects/11179content/content-home.htm with a link to an HTML representation of the standard. It is proposed to replace Part 3 of 11179 with ANSI X3.285, Metamodel for the Management of Shareable Data, which you can find in HTML at http://www.lbl.gov/~olken/X3L8/drafts/Metamodel/MetaModel_ToC.html and in Word and PDF format (filenames beginning dpX3-285) at ftp://sdct-sunsrv1.ncsl.nist.gov/x3l8/x3l8docs/x3.285/docs/ .
Reference Model for an Open Archival Information System, a draft ISO standard.
Public Identifier, defined in the SGML Standard, ISO/IEC 8879, section 10.1.6, and further in ISO/IEC 9070.
Registration Authority (ISO/IEC 11179).
a location or set of distributed locations where documents pointed at by a registry reside, and from which they can be retrieved by conventional (http, ftp) means, perhaps with an additional authentication/permissions layer.
Responsible Organization (ISO/IEC 11179).
Submitting Organization (ISO/IEC 11179).
Unified Modelling Language, an Object Management Group specification for visual modelling of object-oriented systems, see UML Resource Page.
Uniform Resource Characteristics, a general term for any metadata about a resource identified by a URL or URN.
Uniform Resource Locator.
Uniform Resource Name. This is a list of IETF (and other) documents relating to URNs, originally drawn up by Murray Altheim of Sun Microsystems and updated by Terry Allen. The documents he thinks most important are marked with an asterisk.
Charter of the current IETF WG
Requests For Comments
*Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396)
*Resolution of Uniform Resource Identifiers using the Domain Name System (RFC 2168)
A Trivial Convention for using HTTP in URN Resolution (RFC 2169)
Architectural Principles of Uniform Resource Name Resolution (RFC 2276) (expresses the author's point of view, which is not consensus)
Using Existing Bibliographic Identifiers as Uniform Resource Names (RFC 2288)
Internationalized Uniform Resource Identifiers (IURI)
*URI Resolution Services Necessary for URN Resolution (RFC 2483)
*A URN Namespace for IETF Documents (RFC 2648)
Internet Drafts
*Resolution of Uniform Resource Identifiers using the Domain Name System
*The Naming Authority Pointer (NAPTR) DNS Resource Record
URN Namespace Definition Mechanisms
XML Metadata Interchange, an Object Management Group specification of XML formats for interchange of UML models, see UML Resource Page.
Extensible Markup Language, an application profile (that is, an application) of SGML, specified by the W3C, Extensible Markup Language (XML) 1.0.
In this document, XML-related entity means any XML or SGML entity necessary for the processing of an XML document, or documentation of such an entity.