Subject: Re: DOCBOOK: Q: how to store articles DOI numbers?
I attach an article on DOIs. It appears that we need to provide for multiple identifiers; we have ISBN and ISSN elements, but a better solution might be a single element whose type attribute names the scheme:

<BiblioIdentifier type="doi">afdsafdjsakf;jdsa</>
<BiblioIdentifier type="isbn">fdjsakfjd;lakfjd</>

and we can add to the list of types upon request. I feel sure we'll see more types in the future.

regards,
Terry
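The proposal above can be sketched in a few lines of Python. This is a minimal illustration and is not part of any DocBook release. The element name BiblioIdentifier and its type attribute come from the proposal. The KNOWN_TYPES registry and the biblio_identifier() helper are hypothetical, and the DOI and ISBN values used below are placeholders. The sketch writes a full end tag instead of the SGML short end tag </> used above, so its output is also well-formed XML.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical registry of identifier types; per the proposal, new types
# would be added to this set on request.
KNOWN_TYPES = {"doi", "isbn", "issn"}


def biblio_identifier(id_type, value):
    """Render one BiblioIdentifier element for a known identifier type."""
    if id_type not in KNOWN_TYPES:
        raise ValueError(f"unknown identifier type: {id_type!r}")
    # quoteattr() adds the surrounding quotes; escape() protects &, < and >.
    return (f"<BiblioIdentifier type={quoteattr(id_type)}>"
            f"{escape(value)}</BiblioIdentifier>")


print(biblio_identifier("doi", "10.9999/example"))  # placeholder DOI
print(biblio_identifier("isbn", "0-000-00000-0"))   # placeholder ISBN
```

The first call prints <BiblioIdentifier type="doi">10.9999/example</BiblioIdentifier>. The permitted types live in one set, so supporting a new scheme is a one-line change.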
http://www.elsevier.co.jp/inca/homepage/about/diginfo/Menu.shtml <html><head> <title>General Information</title> <style type="text/css"> <!-- BODY { font-family: Times,Helvetica; } TD { font-family: Times,Helvetica; } DL { font-family: Times,Helvetica; } P { font-family: Times,Helvetica; } UL { font-family: Times,Helvetica; } H1 { font-family: Times,Helvetica; } FORM { font-family: Times,Helvetica; } A { font-family: Times,Helvetica; text-decoration: none; } --> </style> </head><body bgcolor="#ffffff"> <a name="Go to top"> <p><font size="5"><b>Digital Information Objects and the STM Publisher</b></font></p> <p><i>Reproduced from STM Annual Report, 1997</i></p> <p><b>Introduction</b></p> <p>This review summarises activities during the past year (to August 1997) of relevance to STM publishers in defining <i>standards for identifying digital information objects</i>, and <i>applications of such standards in electronic publishing</i>. Additional background information is available in two other documents published this year:</p> <ul> <li>A brief introduction to the topic of identifiers, recently updated by the authors: <i>Unique Identifiers: a brief introduction, by Brian Green and Mark Bide. </i>[BIC; March 1997 <a href="http://www.bic.org.uk/bic/uniquid.html">http://www.bic.org.uk/bic/uniquid.html</a>]</li> <li>A more extensive review, expanded from the paper distributed earlier this year as an insert with STM Newsletter 101 and since published in both paper and electronic forms: this contains a full glossary of terms and detailed references: <i>Information Identifiers, by Norman Paskin</i>, [Learned Publishing, Vol. 10 No. 2, pp 135-156 (April 1997); also available at <a href="/locate/infoident">http://www.elsevier.nl/locate/infoident</a>].</li> </ul> <p><b>Identifiers, document computing and electronic commerce</b></p> <p>Information identifiers are of interest because of their potential applications. 
A core concept is the distinction between <i>"simple"</i> ("dumb", "unintelligent" or "meaningless") identifiers on the one hand, and <i>"compound"</i> ("intelligent" or "meaningful") identifiers on the other. A simple identifier is only a unique label for a digital object; a compound identifier also contains other information (<i>metadata</i>) conveying additional facts such as location, format or owner. Simple identifiers can supply such information too, indirectly, by pointing to repositories of metadata about the objects they identify. These additional pieces of information about a digital object act as hooks for other actions; in an electronic environment these typically include format and presentation instructions (<i>document computing</i>) and rights and sales transactions (<i>electronic commerce</i>).</p> <p>Whilst there continues to be active discussion of simple identifiers (in particular, PII and ISWC), much current activity concerns potential compound identifiers (DOI, URNs, etc.). The requirements imposed on a compound identifier for carrying metadata have consequences for the identifier itself: a complete understanding of identifiers therefore takes us into the areas of mark-up, multimedia rights clearance systems and electronic commerce. </p> <p><i>Mark-up</i> developments are covered here only briefly, and only insofar as they are relevant to identifiers. <i>Multimedia rights clearance systems</i> are the subject of a number of initiatives, including EC schemes such as Imprimatur [<a href="http://www.imprimatur.alcs.co.uk/expert.htm">http://www.imprimatur.alcs.co.uk/expert.htm</a>] and, more recently, the EC MMRCS project within Info 2000, managed by PIRA [<a href="http://www2.echo.lu/info2000/en/infowkpg.html">http://www2.echo.lu/info2000/en/infowkpg.html</a>]; they will not be discussed here.
</p> <p><i>Electronic commerce systems</i> are likely to be determined by banks and other institutions; publishers need not become involved in their development but will wish to use proven systems. A frontrunner is the VISA/MasterCard SET (Secure Electronic Transaction) proposal of 1996 [<a href="http://www.rsa.com/set/">http://www.rsa.com/set/</a>], which aims for system availability in 40 countries by the end of 1997. This now looks optimistic: in July 1997 a number of banks trialling SET 1.0 reported considerable problems with set-up complexity and transaction times. The World Wide Web Consortium (W3C) Joint Electronic Payment Initiative (JEPI) activity has been scaled down to an Interest Group for Electronic Commerce, which first met in April 1997 and is awaiting member input on next steps before a meeting in September 1997 [<a href="http://www.w3.org/Payments/Activity">http://www.w3.org/Payments/Activity</a>]. </p> <p><b>PII: Publisher Item Identifier</b></p> <p>PII, introduced in 1995 [<a href="/inca/homepage/about/pii/">http://www.elsevier.nl/inca/homepage/about/pii/</a>] by the STI group of publishers, remains in active use by the publishers who originated it and by others (e.g. the American Mathematical Society). Among related information users, ISI are actively considering the use of PII in their abstracting and indexing services. Several publishers adopting PII have stated that they intend to use it as the publisher-assigned portion of future schemes such as DOI. PII is an easy-to-use simple identifier that can be integrated into compound identifiers, and its ASCII alphanumeric syntax (e.g. S016538069600403) poses no problems for exchange protocols or naming conventions.</p> <p>It is worth recapping why the publishers who originated the PII continue to use and support it actively.
The PII originators required an identifier short enough to be useful in document ordering. Version 1 of the SICI, in effect when PII was established, was grounded in print (page numbers etc.), whereas an identifier was needed that worked for electronic information; and the latest SICI, DOI and URN developments had not yet been formally initiated (arguably PII activities spurred them on, as the PII participants intended). PII remains an effective, easy-to-implement simple identifier for use within a publisher's system or for exchange between defined parties; it also provides a very good basis for integration into the compound identifiers and systems now being considered for uses such as rights control and electronic commerce. The PII originators are currently considering whether to recommend extending PII to identify components at an arbitrary level of granularity, and if so how this might be done.</p> <p>The question has been raised whether the Year 2000 compliance issue (the Y2K or millennium problem of computer data systems) has any consequences for PII: it does not. A date cannot be derived from a PII, so the Y2K issue is irrelevant. A PII identifying a serial item may contain, as its ninth and tenth numerical characters, two digits derived from the year of publication; the PII originators recommended this simply as one way to derive a unique number for any serial item. However, because PII is a simple (meaningless) identifier, it cannot be reverse engineered: no meaning can be attributed to individual subsequences of the PII. This is clearest where a publisher opts for another convention to derive unique numbers, e.g. setting the ninth and tenth characters to 01 for the first year of PII usage, 02 for the next, and so on.
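</p>
<p>To illustrate the point above, the following Python sketch assembles a serial PII from its parts. It is a hypothetical helper and is not taken from the PII specification. The layout is read off the example S016538069600403 quoted earlier: an "S", the eight ISSN characters, a two-digit period code and a five-digit item number. Any check character the specification may define is not computed here. The period code is deliberately an opaque string, because the same two digits could come from the publication year or from a publisher's own 01, 02, ... sequence.</p>

```python
def serial_pii(issn, period_code, item_number):
    """Assemble a 16-character serial PII body without a check character.

    Hypothetical helper. Layout assumed from the example S016538069600403:
    "S" + ISSN without hyphen (8) + period code (2) + item number (5).
    """
    issn_chars = issn.replace("-", "").upper()
    if len(issn_chars) != 8:
        raise ValueError(f"ISSN must have 8 characters: {issn!r}")
    # The period code is opaque: year digits ("96") or a sequence ("01").
    if len(period_code) != 2 or not period_code.isdigit():
        raise ValueError(f"period code must be two digits: {period_code!r}")
    if item_number not in range(100000):
        raise ValueError(f"item number must fit in five digits: {item_number}")
    return f"S{issn_chars}{period_code}{item_number:05d}"


print(serial_pii("0165-3806", "96", 403))  # S016538069600403
print(serial_pii("0165-3806", "01", 403))  # S016538060100403
```

<p>Neither output records which convention produced its two period digits, which is why no date can be derived from a PII.</p>
<p>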
In theory, a publisher using such a sequence will meet an analogous problem after 99 years of PII usage, but it is assumed that other solutions will be available by then.</p> <p><b>ISWC: International Standard Work Code</b></p> <p>The International Standard Work Code (ISWC) is a proposal made by CISAC to ISO in September 1996. The ISWC is currently defined and in use within CISAC for musical works, but is not a formal ISO standard. The proposal is to extend the scope of the CIS (Common Information System) to works such as articles and documents, and to formalise this as a standard related to other ISO standards such as ISBN, ISSN and ISMN. ISWC is itself a simple identifier; it gains intelligence from its linkage to metadata held elsewhere in the CIS model, such as an author (composer) database [<a href="http://www.cisac.org/iswcfly.htm">http://www.cisac.org/iswcfly.htm</a>].</p> <p>As currently used (for musical works), each ISWC is made up of the letter "T" followed by nine digits and a check digit, e.g. ISWC T-034.524.680-1. The components of the ISWC have no meaning and the punctuation is for readability only. The proposal is to create ISWCs for other kinds of works with different letter prefixes: "L" for literary works and "S" for scientific works (neither category has yet been defined). L and S codes currently have no formal status beyond being items under discussion by ISO.</p> <p>In May 1997 ISO began to consider this proposal as Work Item 15707, "Information and documentation - International Standard Work Code (ISWC)", within ISO TC 46/SC 9, and established a Working Group. Information is available on the ISO web site [<a href="http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm">http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm</a>].
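</p>
<p>The check digit in the example ISWC T-034.524.680-1 can be reproduced with a short calculation. The Python sketch below is illustrative only and assumes the weighting later published in ISO 15707. Start from 1 for the letter T, add each of the nine digits multiplied by its position, then take the amount needed to reach the next multiple of ten. For the example, the weighted total is 179, which gives the check digit 1. This report does not give CISAC's 1997 rule, so treat this weighting as an assumption. The helper names are hypothetical.</p>

```python
def iswc_check_digit(nine_digits):
    """Check digit for the nine digits after the letter T (assumed rule)."""
    if len(nine_digits) != 9 or not nine_digits.isdigit():
        raise ValueError(f"expected nine digits: {nine_digits!r}")
    # 1 for the leading T, then position-weighted digits (weights 1..9).
    total = 1 + sum(pos * int(d) for pos, d in enumerate(nine_digits, start=1))
    return (10 - total % 10) % 10


def is_valid_iswc(text):
    """Validate an ISWC, ignoring the readability punctuation."""
    compact = text.replace("-", "").replace(".", "").upper()
    if len(compact) != 11 or compact[0] != "T" or not compact[1:].isdigit():
        return False
    return iswc_check_digit(compact[1:10]) == int(compact[10])


print(iswc_check_digit("034524680"))     # 1
print(is_valid_iswc("T-034.524.680-1"))  # True
print(is_valid_iswc("T-034.524.680-2"))  # False
```

<p>is_valid_iswc() strips hyphens and full stops before checking, so T0345246801 and T-034.524.680-1 are treated identically, consistent with the punctuation being for readability only.</p>
<p>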
The purpose of ISWC is "to provide a means of uniquely identifying intellectual properties, primarily for applications related to the administration of copyright and for use within computer databases and related documentation. The ISWC may be used in conjunction with existing international identification systems for published materials (e.g. ISBN, ISRC, etc.) but it is not intended to be an alternative nor a substitute for those identifiers". The stated target date for final publication of an approved standard is April 2000, although all affected parties would welcome attempts to speed up this timetable. </p> <p><b>SICI: Serial Item and Contribution Identifier</b></p> <p>Although approved in August 1996, the revised Serial Item and Contribution Identifier (SICI) was not published until April 1997 [ANSI/NISO Z39.56-1996 (Version 2) ISSN: 1041-5653]. The mechanisms this revision makes available in the CSI-3 format, for non-paginated items (or for other identifier systems), greatly enhance the usefulness of the SICI to the information industries.</p> <p>A complementary standard for book items using a similar methodology (BICI: Book Item and Contribution Identifier) was formally proposed in April 1997 and is under consideration by NISO for adoption as a standard [<a href="http://www.bic.org.uk/bic/bici.html">http://www.bic.org.uk/bic/bici.html</a>].</p> <p>The use of SICIs in Internet-based systems may be complicated by issues of character transmission: the standard naming conventions for internet objects and resources exclude or restrict the use of some characters (e.g. URN syntax excludes angle brackets, square brackets and the backslash). A typical SICI contains some of these (e.g. 0015-6914(19950605)+<>1.0.TX;2-8).
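</p>
<p>The character problem can be made concrete. The Python sketch below is a minimal illustration that assumes the URN syntax of RFC 2141 (May 1997). Under that syntax, ASCII letters, digits and the characters ( ) + , - . : = @ ; $ _ ! * ' may appear unescaped in a URN's namespace-specific string. Every other octet is written as % followed by two hexadecimal digits. The function name urn_escape() is hypothetical.</p>

```python
from urllib.parse import unquote

# Punctuation RFC 2141 allows unescaped in a URN namespace-specific string
# (besides ASCII letters and digits).
URN_OTHER_CHARS = set("()+,-.:=@;$_!*'")


def urn_escape(text):
    """Percent-encode every UTF-8 octet RFC 2141 does not allow as-is."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if (ch.isascii() and ch.isalnum()) or ch in URN_OTHER_CHARS:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


# The SICI quoted above; \x3c and \x3e are the two angle brackets.
SICI = "0015-6914(19950605)+\x3c\x3e1.0.TX;2-8"

encoded = urn_escape(SICI)
print(encoded)                   # 0015-6914(19950605)+%3C%3E1.0.TX;2-8
print(unquote(encoded) == SICI)  # True
```

<p>The encoded form survives transmission and decodes back to the original SICI. A reader, however, now sees %3C%3E in place of the angle brackets. This is the loss of transparency described in the text that follows.</p>
<p>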
Although there are work-arounds that enable such characters to be transmitted, they may cost the user some transparency. Issues such as these may well be dealt with as part of the DOI initiative, which encounters the same problem.</p> <p><b>DOI: Digital Object Identifier</b></p> <p>The Association of American Publishers has designed a system for marking digital objects in order to facilitate electronic commerce and enable copyright management systems. That system, the Digital Object Identifier System, is now under development in partnership with the Corporation for National Research Initiatives (using the CNRI-developed Internet Handle technology), and is expected to go live on a limited scale in August 1997. A web site is maintained with complete and up-to-date information about the initiative and directions for the future development of the DOI [<a href="http://www.doi.org">http://www.doi.org</a>]. </p> <p>An extensive prototype system has been developed using data from five publishers; it will be extended and demonstrated at Frankfurt in October 1997. Publishers participating in the prototype have easily assigned over 200,000 DOIs, and algorithms for automated DOI generation have been developed. Links to metadata (in Warwick Framework form) are under consideration, and guidelines for creators, publishers and information providers have been drafted [<a href="http://www.handle.net/doi-prototype">http://www.handle.net/doi-prototype</a>]. </p> <p>A DOI will consist of two portions: a <i>prefix</i> defining where to go for further information, and a <i>suffix</i> identifying a particular object. Viewed in this way, a DOI becomes a routing slip on the Internet, carrying a ticket that identifies a particular item at its destination.
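</p>
<p>The prefix/suffix division can be shown in a few lines of Python. This is a sketch under stated assumptions and is not part of the DOI prototype. It follows the syntax later standardised for DOIs (ANSI/NISO Z39.84). In that syntax, the prefix begins with the directory code "10." and is separated from the suffix by the first "/", so a suffix that reuses an existing identifier may itself contain further slashes. The function split_doi() and the 10.9999 prefix are hypothetical.</p>

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first "/"."""
    prefix, sep, suffix = doi.partition("/")
    # Assumed syntax: prefix starts with "10." and both parts are non-empty.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# Hypothetical DOIs whose suffixes reuse legacy identifiers.
print(split_doi("10.9999/S016538069600403"))  # ('10.9999', 'S016538069600403')
print(split_doi("10.9999/report/1997/7"))     # ('10.9999', 'report/1997/7')
```

<p>In the routing-slip picture above, only the prefix says where to go; the suffix is carried through unexamined. This is why PII, SICI or other existing identifiers can serve as suffixes.</p>
<p>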
The DOI suffix will probably be (wholly or in part) an existing identifier rather than a new scheme; in practice DOI should be able to accommodate any scheme already in use, becoming interoperable with "legacy" systems. Thus the DOI suffix will not have a single format but may be any of a number of alternatives, including PII, SICI, ISWC, ISRC, etc.</p> <p>A number of issues remain to be resolved, among them:</p> <p>- DOI interoperability with as wide a range of existing identifier schemes as possible. Among these, SICI is considered essential, yet the Handle technology is an application of the URN system and, as mentioned earlier, current URN definitions do not allow some characters used in SICIs. Representatives of W3C/IETF have been involved in this issue, which it is now believed can be readily resolved.</p> <p>- The governance and commercial control of such a scheme.</p> <p>- The funding of an operational scheme: suggestions include creating a body which would recover costs from DOI directory or number usage.</p> <p>- The operational issues of such a scheme, such as numbering agencies and directory services; an agency which assigns numbers and a directory manager which runs the routing system are separate functions, even if handled by the same organization.</p> <p>A recent development is the concept of an ISDI (<i>International Standard Document Identifier</i>), introduced by NISO at an informal working group convened in June 1997. It describes the "identification piece" (the suffix) of the proposed DOI system. (That meeting did not concern itself with the trading or registration aspect of the DOI initiative, i.e. the prefix.)
NISO uses ISDI for the "identification piece" as a generic descriptive term, not (as the name could imply) for another standard: ISDI currently has no formal status as a standard or proposed standard. A preliminary conclusion of the June 1997 meeting in Washington DC was that such an ISDI would need to carry at minimum:</p> <p>- an agency identifier (the agency/registry assigning or storing the object);</p> <p>- an identifier type (categories such as SICI, BICI, ISRC, etc.);</p> <p>- an indication of the name of the assigner of the identifier (i.e. the publisher);</p> <p>- the identifier itself;</p> <p>- a check digit (whether this is needed is still to be determined).</p> <p>NISO has recommended that only ISDIs be used in the identification piece (suffix) of the DOI.</p> <p>It is not yet clear whether an ISDI is anything more than a description of the DOI suffix syntax, and if it is, who should be the prescriptive authority. Discussions are continuing between NISO and those involved in the DOI and other activities; at the time of writing there is no formal position statement on ISDI.</p> <p>DOI promises to bring together work on internet routing of information (Uniform Resource addressing technology) and the practical assignment of information identifiers by publishers (PII, SICI, etc.) into a working model for publishers. </p> <p><b>STM activities</b></p> <p>STM and IPA have together convened an Information Identifiers Committee, chaired by Charles Ellis (Wiley), tasked with facilitating an international consensus within the publishing industry on a standard system (or systems) for the identification and application of digital information objects. The committee draws on a wide range of industry expertise, including individuals representing PII, DOI, SICI and ISWC activities.
</p> <p>An initial statement [<a href="http://ww.ipa-uie.org/ipa_iic.html">http://ww.ipa-uie.org/ipa_iic.html</a>] was issued by the STM/IPA Committee in May 1997, supporting the concept of the DOI and encouraging IPA and STM members and other organizations to support it and play an active role in its development. Further recommendations are expected following Frankfurt 1997.</p> <p><b>Uniform Resource addressing</b></p> <p>Internet technology is particularly relevant for electronic interchange of digital objects, as in the case of DOI. Work on extending the various definitions and standards for Uniform Resource addressing has recently been transferred from IETF to the W3C (World Wide Web Consortium) [<a href="http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794">http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794</a>].</p> <p>Unfortunately there is still much confusion caused by careless use or misunderstanding of various addressing terms, summarised in Table 1:</p> <p>Table 1: Uniform Resource Addressing</p> <table border="1" cellpadding="8" width="601" bordercolor="#000000"> <tr> <td width="33%"><p align="left"><font size="2">URI (Uniform Resource Identifier)</font></p> </td> <td width="67%"><p align="left"><font size="2">the generic set of all names/addresses that are short strings that refer to resources.</font></p> </td> </tr> <tr> <td width="33%"><p align="left"><font size="2">URL (Uniform Resource Locator)</font></p> </td> <td width="67%"><p align="left"><font size="2">the set of URI schemes that have explicit instructions on how to access the resource on the internet.</font></p> </td> </tr> <tr> <td width="33%"><p align="left"><font size="2">URN (Uniform Resource Name)</font></p> </td> <td width="67%"><p align="left"><font size="2">(1) a URI that has an institutional commitment to persistence, availability, etc. (may also be a URL, e.g.
PURL)</font></p> <p align="left"><font size="2">(2) A particular scheme, currently under development in the W3C and IETF, which should provide for the resolution, using internet protocols, of names with greater persistence than is currently associated with internet host names or organizations. When defined, a URN(2) will be an example of a URI. </font></p> </td> </tr> <tr> <td width="33%"><p align="left"><font size="2">URC (Uniform Resource Citation, or Uniform Resource Characteristics)</font></p> </td> <td width="67%"><p align="left"><font size="2">A set of attribute/value pairs describing a resource. Some of the values may be URIs of various kinds. Others may include, for example, authorship, publisher, datatype, date, copyright status and shoe size: a set of fields and values with some defined free formatting. </font></p> </td> </tr> </table> <p align="left"><font size="2"><i>Based on information from </i></font><a href="http://www.w3.org/pub/WWW/Addressing/Addressing.html"><font size="2"><i>http://www.w3.org/pub/WWW/Addressing/Addressing.html</i></font></a></p> <p align="left">An Internet Draft, "Using Existing Bibliographic Identifiers as Uniform Resource Names", which attempted to bring together the worlds of bibliographic standards and the internet, was issued for comment on 22 March 1997 (Internet Drafts expire after six months) [<a href="http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml">http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml</a>].</p> <p align="left">DOI uses CNRI's "Handle" technology, which is an application of a URN system. URNs are at present specified conceptually but not in final implemented form.
The W3C web site describes the current situation and future work on internet addressing as follows. Unlike the web's data formats and protocols, where HTML and HTTP are not the only options, there is only one web naming/addressing technology: URLs. URLs are stable, standard, and ubiquitous. But their popularity, combined with some design and implementation oversights, has led to overly fragile service and wasteful use of IP addresses. The wasteful use of IP addresses is addressed by a new specification of the transfer protocol, HTTP/1.1, whose deployment W3C consider critical. Work in the W3C <i>Activity on SGML, XML, and Structured Document Interchange</i> seeks to establish general mechanisms for addressing into structured documents. The URL specifications are being revised within the IETF, and W3C are considering how much staff resource to commit to this effort. W3C are also investigating the use of metadata to make links more robust. </p> <p align="left"><b>Metadata activities</b></p> <p align="left">Information identifiers either contain or can point to supplementary information ("metadata") enabling actions to be carried out; common agreement on the formats such metadata should follow will be essential. Prominent among continuing activities here are the "Dublin Core" (and its follow-up activities) and Internet developments for metadata coding such as MCF.</p> <p align="left">The Dublin Metadata Workshop of March 1995 and the Warwick Metadata Workshop of April 1996 aimed to develop consensus on network resource description across a broad spectrum of stakeholders, including the computer science community, the text-markup community and librarians.
The result was the Dublin Core Metadata Element Set: a simple resource description record providing a foundation for electronic bibliographic description, improving structured access to information on the Internet and interoperability among disparate description models. The Dublin Core has since been updated and, as of January 1997, specifies fifteen elements (Table 2); many of these elements and their contents should currently be considered experimental. The Warwick Metadata Workshop follow-on activity produced a proposed syntax for the Dublin Core, guidelines for applications, and the "Warwick Framework", which promotes modular, separately accessible and maintainable packages of metadata. Thus a Dublin Core package might be one of a number of packages, including packages for terms and conditions, archiving and preservation, content ratings, and others. A third workshop (CNI/OCLC Image Metadata, September 1996) addressed the application of the Dublin Core to visual resources and resulted in minor changes to the original element set. The fourth and most recent workshop (Canberra, March 1997) addressed issues concerning deployment of the Dublin Core, including extensibility, element structure and element refinement. <i>Extensibility</i> refers to making DC a minimum set on which others may build additional elements; <i>element structure</i> refers to the identification of default schemes and subelement conventions; <i>element refinement</i> refers to clearer definitions for certain elements (e.g. coverage, relation, and rights management).
[<a href="http://www.oclc.org:5046/research/dublin_core/">http://www.oclc.org:5046/research/dublin_core/</a>]</p> <p align="left">Table 2: Dublin Core Element Descriptions (latest update, January 1997)</p> <table border="1" cellpadding="8" width="601" bordercolor="#000000"> <tr> <td width="20%"><p align="left"><font size="2">TITLE </font></p> </td> <td width="80%"><p align="left"><font size="2">The name given to the resource by the CREATOR or PUBLISHER. </font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">CREATOR</font></p> </td> <td width="80%"><p align="left"><font size="2">The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">SUBJECT</font></p> </td> <td width="80%"><p align="left"><font size="2">The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies, keywords and classification data (e.g. Library of Congress Classification Numbers, Dewey Decimal numbers, Medical Subject Headings).</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">DESCRIPTION</font></p> </td> <td width="80%"><p align="left"><font size="2">Text description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of, e.g., visual resources.
Future metadata collections might include computational content description; this field might contain a link to such a description rather than the description itself.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">PUBLISHER</font></p> </td> <td width="80%"><p align="left"><font size="2">The entity that provides access to the resource, such as a publisher, a university department, or a corporate entity.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">CONTRIBUTORS </font></p> </td> <td width="80%"><p align="left"><font size="2">Person(s) or organization(s), in addition to those specified in the CREATOR element, who have made significant intellectual contributions (e.g. editors, transcribers, illustrators, and convenors).</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">DATE </font></p> </td> <td width="80%"><p align="left"><font size="2">The date the resource was made available in its present form; the recommended form is an 8-digit number, YYYYMMDD.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">TYPE </font></p> </td> <td width="80%"><p align="left"><font size="2">Category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. It is expected that this will be chosen from a specified list of types.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">FORMAT </font></p> </td> <td width="80%"><p align="left"><font size="2">Data representation of the resource, such as text/html, ASCII, PostScript file, executable application, or JPEG image. In principle, formats can include physical media such as books, serials, or other non-electronic media.
</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">IDENTIFIER </font></p> </td> <td width="80%"><p align="left"><font size="2">String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented), and other globally unique identifiers such as ISBN. </font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">SOURCE </font></p> </td> <td width="80%"><p align="left"><font size="2">Work from which this resource is derived, if applicable.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">LANGUAGE </font></p> </td> <td width="80%"><p align="left"><font size="2">Language(s) of the intellectual content of the resource.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">RELATION </font></p> </td> <td width="80%"><p align="left"><font size="2">Relationship to other resources: a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">COVERAGE </font></p> </td> <td width="80%"><p align="left"><font size="2">Spatial locations and temporal durations characteristic of the resource.</font></p> </td> </tr> <tr> <td width="20%"><p align="left"><font size="2">RIGHTS </font></p> </td> <td width="80%"><p align="left"><font size="2">Link to a copyright notice, rights-management statement, or server that would provide such information in a dynamic way.</font></p> </td> </tr> </table> <p align="left"><font size="2"><i>Adapted from </i></font><a href="http://purl.org/metadata/dublin_core_elements"><font size="2"><i>http://purl.org/metadata/dublin_core_elements</i></font></a></p> <p align="left">A Convention for Embedding Metadata in HTML (i.e.
tagging of meta information in HTML) was proposed, reflecting the consensus of a break-out group at the May 1996 W3C Distributed Indexing and Searching Workshop. This group included representatives of major players: the Dublin Core/Warwick Framework Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort, Verity Software, and the W3C. Tagging in HTML would enable Internet exchange of such metadata [<a href="http://www.oclc.org:5046/~weibel/html-meta.html">http://www.oclc.org:5046/~weibel/html-meta.html</a>]. Since then, in June 1997, Netscape submitted to W3C a proposal for an interchange format called Meta Content Framework (MCF), based on work initiated at Apple [<a href="http://mcf.research.apple.com/hs/mcf.html">http://mcf.research.apple.com/hs/mcf.html</a>], which provides a system for representing a wide range of information about content. MCF files contain descriptions of meta-content objects referred to as "units": a unit consists of a unit identifier (e.g. a URL) and some number of predicates ("slots"). MCF is not intended to be an extension of markup languages such as HTML; it provides a format for holding the metadata externally. MCF should be able to represent the metadata that proposals such as the Dublin Core aim to cover. In this way, metadata would become available to Internet search engines, and in effect every site using MCF would be able to provide categorisations of its material: the inventor of MCF R.V.
Guha has described the effect as "search engines on steroids".</p> <p align="left"><b>Mark Up Languages</b></p> <p align="left">A document placed in an electronic environment should be identifiable, either by containing mark-up tags for elements such as "identifier" (explicitly stating the identifier), or by enabling the identifier to be generated implicitly from internal document information ("affordance"), which must therefore also be made available in a standard format. Documents should also be "open" or "interoperable", i.e. readable (exchangeable) by any common software package through a commonly agreed standard. Some mark-up language developments of the past year assist both of these aims: the release of a new version of the standard Internet mark-up, HTML (HyperText Markup Language), and the proposal for XML (Extensible Markup Language), which is of particular interest to publishers already using SGML.</p> <p align="left">A major potential problem with Internet exchange of documents, especially for scientific material, is that the HTML standard used for mark-up (layout and formatting) is being outgrown by demands for complex document support. This has led to many extensions of HTML: around 90 exist, many of them proprietary and supported only by certain software or browsers. This problem is being addressed in two different ways.
One aims to widen the HTML standard to encompass known requirements: in July 1997, W3C released a draft of HTML 4.0, intended to provide new features without proprietary extensions, including greater control over forms, frames and tables, and the benefits of scripts, style sheets and objects. Of interest to STM publishers, the "Additional Named Entities" feature adds support for important symbols and glyphs used in mathematics, markup and internationalization [<a href="http://www.w3.org/Press/HTML4">http://www.w3.org/Press/HTML4</a>]. The difficulty with this approach is that such a standard may never be complete.</p> <p align="left">An alternative response is represented by Extensible Markup Language (XML), a subset of SGML (Standard Generalized Markup Language) designed for delivery on the Web; it was proposed at SGML 96 (November 1996) and resulted in a W3C working draft presented to the sixth WWW conference in April 1997. The XML approach is to provide a language which can make HTML self-extending in the true fashion of SGML, i.e. publishers can provide their own extensions and definitions akin to DTDs and define appropriate, readable tags. XML could also provide a framework for Java language applets to work in. [<a href="http://www.w3.org/pub/WWW/TR/WD-xml.html">http://www.w3.org/pub/WWW/TR/WD-xml.html</a>]; [<a href="http://www.w3.org/pub/WWW/XML/Activity.html">http://www.w3.org/pub/WWW/XML/Activity.html</a>]; [<i>Extensible MarkUp Language: SGML On-Ramp and Web Enabler.
Tim Bray</i>, The Information Interchange Report, Vol. 4 No. 2/3, Nov/Dec 1996, pp 1-6]</p> <p align="left">STM publishers are also interested in developments in mathematical mark-up: in July 1997, after more than a year of in-depth study and experimentation, the HTML Math working group released an updated working draft of MathML (Mathematical Mark-Up Language), a way of encoding both mathematical content and its visual presentation. [<a href="http://www.w3.org/pub/WWW/TR/WD-math/">http://www.w3.org/pub/WWW/TR/WD-math/</a>]</p> <p align="left">The Document Object Model [<a href="http://www.w3.org/MarkUp/DOM/">http://www.w3.org/MarkUp/DOM/</a>] is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents ("Dynamic HTML" is a term used by some vendors to describe the combination of HTML, style sheets and scripts). The document can be further processed and the results of that processing can be incorporated back into the presented page. Requirements are being gathered for a first release of "level one" (functionality equivalent to that currently exposed in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0) in the second half of 1997. Although such interactive documents are of great interest in the long term, it seems unlikely that they will be widely implemented in the STM world in the next year or so.</p> <p align="left"><b>The way forward</b></p> <p align="left">Internet standards are inescapably at the centre of likely future scenarios for our industry.
The pace of development in this area leads to some conflict; for example, both HTML 4.0 and XML arise within W3C, yet the two are in tension, even to the extent that Tim Berners-Lee (W3C's Director) stated in July 1997: <i>"It's no wonder consumers, buyers and IT managers are concerned. ... Extensible Markup Language (XML) naturally supports a variety of applications which could compromise the design of HTML".</i> [<a href="http://www.w3.org/Press/HTML4-pers.html">http://www.w3.org/Press/HTML4-pers.html</a>].</p> <p align="left">It is clear from recent activities such as MCF and other Netscape and Microsoft proposals to W3C that Internet standards (de facto or de jure) are now being heavily influenced by commercial technology players competing to provide better access tools for Internet and intranet applications in general (and by so doing to gain commercial advantage for their particular tools with a W3C imprimatur). Publishers will no doubt benefit from these activities but have little chance of influencing them. The World Wide Web Consortium has so far produced few actions of immediate concern to STM publishers; document identification, rights clearance mechanisms and similar issues appear to rank relatively low among its priorities compared with technical infrastructure and pressing matters such as "next generation" addressing protocols. All of this is understandable, and indeed inevitable, if one considers that most members of W3C are technology companies; few are electronic publishers, and only one company (Reed Elsevier) is a major publisher of both traditional paper and electronic information.
We cannot expect that special cases such as the presentation of STM material, which represents a tiny proportion of Internet traffic, will receive any favoured treatment; we can, however, hope that sufficiently open standards and technology will enable STM material and transactions to be satisfactorily accommodated in future Web standards. As W3C reaches the end of its first three-year funding period and considers how to renew its subscriber funding (and attract more subscribers), this emphasis may change (which suggests a possible action for those publishers interested in influencing such events).</p> <p align="left">STM publishers view the future scientific article as containing multimedia elements: <font face="Times New Roman">full text and abstract text; live "hot spot" references; video or audio clips; supplementary data tables; software linkages to, e.g., 3-D models; links to other Internet sites; forward links to comments, corrections, future papers, etc. How can identifiers and metadata assist us in developing such a rich system? A solution for the future digital object will need to address the following themes:</font></p> <p align="left"><font face="Times New Roman">- <i>Unique identification</i>: unambiguous identification of a defined piece of information, possibly with details of medium, version, format etc.;</font></p> <p align="left"><font face="Times New Roman">- <i>Multiple linkage</i>: by stating which naming convention is used, multiple naming or identification schemes should be possible (an idea adopted in SICI and DOI);</font></p> <p align="left"><font face="Times New Roman">- <i>Multiple (overlapping) identification</i> of content (e.g.
a sound clip within a digital object may be identified by a music identifier as well as being part of a document with another identifier; the Dublin Core concept of relation may prove useful here);</font></p> <p align="left"><font face="Times New Roman">- <i>Arbitrary granularity</i>: if a publisher wants to identify a paragraph or equation as a separate item, it can do so;</font></p> <p align="left"><font face="Times New Roman">- <i>Cascading responsibility</i>: below a certain level, no permission from a central agency is needed to assign unique numbers (sub-levels are assigned by the owner of the higher level);</font></p> <p align="left"><font face="Times New Roman">- <i>Links to metadata</i>: via simple identifiers pointing to specific repositories for different needs, e.g. copyright, trading, EDI;</font></p> <p align="left"><font face="Times New Roman">- <i>Open standards</i>: a technical architecture interoperable with standard software packages, making use of W3C-approved standards;</font></p> <p align="left"><font face="Times New Roman">- <i>Distributed data</i>: not all data and metadata held on one site; a virtual single network created from multiple interlinked servers;</font></p> <p align="left"><font face="Times New Roman">- "<i>Many but dumb</i>": a network of interconnected simple identifiers and links is preferable to an all-embracing single standard identifier which attempts to cover everything from a scientific article to a new music release.</font></p> <p align="left"><font face="Times New Roman">Once we have a recognised interoperable network in which to exchange information about digital information objects, we can begin to apply some of the emerging electronic commerce standards to carry out commercial transactions with them. 
</font>The demonstration of DOI at Frankfurt this year holds out the promise of one such workable system.</p> <p align="left"><b>Glossary of abbreviations used in this review</b></p> <table border="0"> <tr> <td>AAP</td> <td>Association of American Publishers</td> </tr> <tr> <td>ANSI</td> <td>American National Standards Institute</td> </tr> <tr> <td>ASCII</td> <td>7-bit American National Standard Code for Information Interchange, ANSI X3.4:1986</td> </tr> <tr> <td>BIC</td> <td>Book Industry Communication (UK organisation)</td> </tr> <tr> <td>BICI</td> <td>Book Item and Contribution Identifier (proposed NISO development)</td> </tr> <tr> <td>CIS</td> <td>Common Information System (CISAC)</td> </tr> <tr> <td>CISAC</td> <td>Confédération Internationale des Sociétés d'Auteurs et Compositeurs (International Confederation of Societies of Authors and Composers)</td> </tr> <tr> <td>DOI</td> <td>Digital Object Identifier (AAP)</td> </tr> <tr> <td>EC</td> <td>European Commission</td> </tr> <tr> <td>HTTP</td> <td>Hypertext Transfer Protocol</td> </tr> <tr> <td>IETF</td> <td>Internet Engineering Task Force</td> </tr> <tr> <td>IFPI</td> <td>International Federation of the Phonographic Industry (London)</td> </tr> <tr> <td>IPA</td> <td>International Publishers Association</td> </tr> <tr> <td>ISBN</td> <td>International Standard ISO 2108:1992 <br> Information and Documentation - International Standard Book Numbering (ISBN)</td> </tr> <tr> <td>ISDI</td> <td>International Standard Document Identifier (proposed term)</td> </tr> <tr> <td>ISI</td> <td>Institute for Scientific Information, Inc.</td> </tr> <tr> <td>ISMN</td> <td>International Standard ISO 10957:1993 <br> Information and Documentation - International Standard Music Number (ISMN)</td> </tr> <tr> <td>ISO</td> <td>International Organization for Standardization</td> </tr> <tr> <td>ISRC</td> <td>International Standard ISO 3901:1986 <br> Documentation - International Standard
Recording Code (ISRC): administered by IFPI</td> </tr> <tr> <td>ISSN</td> <td>International Standard ISO 3297:1986 <br> Documentation - International Standard Serial Numbering (ISSN)<br> US equivalent: ANSI Z39.9:1979 (R1984)</td> </tr> <tr> <td>ISWC</td> <td>International Standard Work Code (currently proposed to ISO TC 46)</td> </tr> <tr> <td>NISO</td> <td>National Information Standards Organization (USA)</td> </tr> <tr> <td>OCLC</td> <td>Online Computer Library Center, Inc.</td> </tr> <tr> <td>PII</td> <td>Publisher Item Identifier</td> </tr> <tr> <td>STI</td> <td>Scientific, Technical and Information publishers' group (ACS, AIP, APS, IEEE, Elsevier Science)</td> </tr> <tr> <td>STM</td> <td>International Association of Scientific, Technical and Medical Publishers</td> </tr> <tr> <td>URC</td> <td>(1) Uniform Resource Citation (IETF)<br> (2) Uniform Resource Characteristic (IETF)</td> </tr> <tr> <td>URI</td> <td>Uniform Resource Identifier (IETF)</td> </tr> <tr> <td>URL</td> <td>Uniform Resource Locator (IETF)</td> </tr> <tr> <td>URN</td> <td>Uniform Resource Name (IETF)</td> </tr> <tr> <td>W3C</td> <td>World Wide Web Consortium</td> </tr> <tr> <td>XML</td> <td>Extensible Markup Language (subset of SGML)</td> </tr> </table> <hr> <p align="left"><b>Dr.
Norman Paskin</b><br> Director, Information Technology Development<br> Elsevier Science<br> The Boulevard<br> Langford Lane<br> Kidlington<br> Oxford OX5 1GB, UK<i><br> Tel: (+44) (0) 1865 843798<br> Fax: (+44) (0) 1865 843967<br> E mail: </i><a href="mailto:n.paskin@elsevier.co.uk"><i>n.paskin@elsevier.co.uk</i></a></p> <hr> </body> </html> <p> Last update: 17 September 1997 <hr> © <a href = "/inca/homepage/about/c_right/">Copyright</a> 1997, Elsevier Science, All rights reserved.<br> </body></html>
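The "Multiple linkage" theme in the article (stating which naming convention an identifier belongs to, so that several schemes can coexist) is what the typed BiblioIdentifier element proposed at the top of this message would capture. The Python below is a minimal sketch of how a processing tool might read such markup, assuming the element and attribute names from that proposal and treating scheme names case-insensitively. The sample record, its placeholder DOI and ISBN values, and the function identifiers_by_type are hypothetical illustrations, not part of DocBook or of any identifier standard.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical record using the element proposed in the message above:
# one <BiblioIdentifier> per identifier, with the naming scheme stated in
# a "type" attribute. The DOI and ISBN values are placeholders, not real
# identifiers; the URL is the one given for the attached article.
SAMPLE = """<BiblioEntry>
  <Title>Digital Information Objects and the STM Publisher</Title>
  <BiblioIdentifier type="doi">10.1000/placeholder</BiblioIdentifier>
  <BiblioIdentifier type="ISBN">0-000-00000-0</BiblioIdentifier>
  <BiblioIdentifier type="url">http://www.elsevier.co.jp/inca/homepage/about/diginfo/Menu.shtml</BiblioIdentifier>
</BiblioEntry>"""


def identifiers_by_type(xml_text: str) -> dict[str, list[str]]:
    """Group every BiblioIdentifier value under its lower-cased scheme name.

    Values are kept in lists because one object may legitimately carry
    several identifiers in the same scheme.
    """
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for elem in root.iter("BiblioIdentifier"):
        # An element without a type attribute is filed under "other"
        # instead of being silently dropped.
        scheme = elem.get("type", "other").lower()
        grouped[scheme].append((elem.text or "").strip())
    return dict(grouped)


if __name__ == "__main__":
    for scheme, values in identifiers_by_type(SAMPLE).items():
        print(f"{scheme}: {', '.join(values)}")
```

Keeping a list of values per scheme lets one object carry, for example, separate ISBNs for hardback and paperback editions, and supporting a new scheme needs only a new type value rather than a new element, in line with the suggestion above of adding types on request.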