docbook message

Subject: Re: DOCBOOK: Q: how to store articles DOI numbers?

From: Terry Allen <tallen@sonic.net>
To: docbook@lists.oasis-open.org
Date: Fri, 24 Mar 2000 09:34:56 -0800 (PST)

I attach an article on DOIs.  It appears that we need to
provide for multiple identifiers; we have ISBN and ISSN elements, but
a better solution might be:

<BiblioIdentifier type="doi">afdsafdjsakf;jdsa</>
<BiblioIdentifier type="isbn">fdjsakfjd;lakfjd</>

and we can add to the list of types upon request.  I feel sure we'll
see more types in the future.

regards, Terry

http://www.elsevier.co.jp/inca/homepage/about/diginfo/Menu.shtml
<html><head>
<title>General Information</title>
<style type="text/css">
<!--
    BODY { font-family: Times,Helvetica; }
    TD { font-family: Times,Helvetica; }
    DL { font-family: Times,Helvetica; }
    P  { font-family: Times,Helvetica; }
    UL { font-family: Times,Helvetica; }
    H1 { font-family: Times,Helvetica; }
    FORM { font-family: Times,Helvetica; }
    A { font-family: Times,Helvetica; text-decoration: none; }
-->
</style>

</head><body bgcolor="#ffffff">
<a name="Go to top"></a>



<p><font size="5"><b>Digital Information Objects and the STM Publisher</b></font></p>



<p><i>Reproduced from STM Annual Report, 1997</i></p>



<p><b>Introduction</b></p>



<p>This review summarises activities during the past year (to

August 1997) of relevance to STM publishers in defining <i>standards

for identifying digital information objects</i>, and <i>applications

of such standards in electronic publishing</i>. Additional

background information is available in two other documents

published this year:</p>



<ul>

    <li>A brief introduction to the topic of identifiers,

        recently updated by the authors: <i>Unique Identifiers: a

        brief introduction, by Brian Green and Mark Bide. </i>[BIC;

        March 1997 <a

        href="http://www.bic.org.uk/bic/uniquid.html">http://www.bic.org.uk/bic/uniquid.html</a>]</li>

    <li>A more extensive review, expanded from the paper

        distributed earlier this year as an insert with STM

        Newsletter 101 and since published in both paper and

        electronic forms: this contains a full glossary of terms

        and detailed references: <i>Information Identifiers, by

        Norman Paskin</i>, [Learned Publishing, Vol. 10 No. 2, pp

        135-156 (April 1997); also available at <a

        href="/locate/infoident">http://www.elsevier.nl/locate/infoident</a>].</li>

</ul>



<p><b>Identifiers, document computing and electronic commerce</b></p>



<p>Information identifiers are of interest because of their

potential applications. A core concept is the distinction between

<i>&quot;simple&quot;</i> (&quot;dumb&quot;,

&quot;unintelligent&quot; or &quot;meaningless&quot;) identifiers

on the one hand, and &quot;compound&quot;

(&quot;intelligent&quot; or &quot;meaningful&quot;) identifiers

on the other. Simple identifiers are only a unique label for a

digital object; compound identifiers also contain other

information (<i>metadata</i>) which conveys some additional facts

such as location, format, owner, etc. Simple identifiers can also

be used to provide such information about the object they

identify, by using them to point to repositories of metadata.

These additional pieces of information about a digital object act

as hooks for other actions; in an electronic environment these

other actions typically include format and presentation

instructions (<i>document computing</i>) and rights and sales

transactions (<i>electronic commerce</i>).</p>



<p>Whilst there continues to be active discussion of simple

identifiers (in particular, PII and ISWC), much activity is

currently focused on potential compound identifiers (DOI, URNs, etc.).

The requirements imposed on a compound identifier for storing

metadata have consequences for the identifier itself: a complete

understanding of the topic of identifiers therefore takes us into

areas of mark-up, multimedia rights clearance systems, and

electronic commerce. </p>



<p><i>Mark-up </i>developments are briefly covered here only in

the context of relevance to identifiers. <i>Multimedia rights

clearance systems</i> are the subject of a number of initiatives,

including EC schemes such as Imprimatur [<a

href="http://www.imprimatur.alcs.co.uk/expert.htm">http://www.imprimatur.alcs.co.uk/expert.htm</a>]

and recently the EC MMRCS project within Info 2000 managed by

PIRA. [<a href="http://www2.echo.lu/info2000/en/infowkpg.html">http://www2.echo.lu/info2000/en/infowkpg.html</a>];

they will not be discussed here. </p>



<p><i>Electronic commerce systems</i> are likely to be determined

by banks and other institutions; publishers need not become

involved in their development but will wish to use proven

systems. A frontrunner is the VISA/MasterCard SET (Secure

Electronic Transaction) proposal of 1996 [<a

href="http://www.rsa.com/set/">http://www.rsa.com/set/</a>],

which aims to have system availability in 40 countries by the end

of 1997, although this now looks optimistic as considerable

problems (due to set-up complexity and transaction times) were

reported in July 1997 by a number of banks currently trialling

SET 1.0. The World Wide Web Consortium (W3C) activity Joint

Electronic Payment Initiative (JEPI) has now been scaled down to

an Interest Group For Electronic Commerce, which held its first

meeting in April 1997 and is currently awaiting member input

regarding next steps for a meeting in September 1997. [<a

href="http://www.w3.org/Payments/Activity">http://www.w3.org/Payments/Activity</a>]

</p>



<p><b>PII: Publisher Item Identifier</b></p>



<p>PII, introduced in 1995 [<a

href="/inca/homepage/about/pii/">http://www.elsevier.nl/inca/homepage/about/pii/</a>]

by the STI group of publishers, remains in active use by

publishers participating in its origination and others (e.g.

American Mathematical Society). Amongst related information

users, ISI are actively considering the use of PII in their

abstracting and indexing services. Several publishers adopting

PII have stated that they intend to use PII as the

publisher-assigned portion of future potential schemes such as

DOI. PII provides an easy-to-use simple identifier which can be

integrated into compound identifiers, and has the advantage of an

ASCII alphanumeric character syntax (e.g. S016538069600403) which

poses no problems for exchange protocols or naming conventions.</p>



<p>It is worth recapping why those publishers who originated the

PII continue to actively use and support it. The PII originators

required an identifier that was short enough to be useful in

document ordering; version 1 of SICI, in effect at

the time PII was established, was grounded in print (page number

etc.) whereas something was needed which worked for electronic

information; and the latest SICI, DOI and URN developments had

not been formally initiated (arguably PII activities spurred them

on, as intended by PII participants). PII remains an effective

and easy-to-implement simple identifier for use within a

publisher's system or for exchange between defined parties; it

also provides a very good basis for integration into the compound

identifiers and systems now being considered for usages such as

rights control and electronic commerce. The PII originators are

currently considering whether extensions to PII to allow for

specification of components to an arbitrary level of granularity

would be a useful recommendation, and if so how this might be

accomplished.</p>



<p>The question has been raised of whether the Year 2000

compliance issue (Y2K or millennium problem of computer data

systems) has any consequences for PII: it does not. A date cannot

be derived from a PII so the Y2K issue is irrelevant. PII, when

used to identify a serial item, may contain as its ninth and

tenth numerical characters two digits derived from the year of

publication (a recommendation made by the PII originators simply

as one way to derive a unique number for any serial item).

However because PII is a simple (meaningless) identifier, it

cannot be reverse engineered (i.e. meaning cannot be attributed

to individual subsequences from the PII). This is clear if a

publisher opts to use another convention to derive unique

numbers, e.g. assigning the ninth and tenth characters as 01 for

the first year of PII usage, 02 for the next and so on. In theory

there will be an analogous problem after 99 years of usage of the

PII, but it is assumed that by that time other solutions will be

available.</p>



<p><b>ISWC: International Standard Work Code</b></p>



<p>The International Standard Work Code (ISWC) is a proposal made

by CISAC to ISO in September 1996. The ISWC is currently defined

and in use within CISAC for musical works, but is not a formal

ISO standard. The proposal is to extend the scope of the CIS

(Common Information System) to works such as articles and

documents and formalise this as a standard related to other ISO

standards such as ISBN, ISSN, ISMN, etc. ISWC is itself a simple

identifier; it gains intelligence from its linkage to metadata

held elsewhere in the CIS model such as an author (composer)

database etc. [<a href="http://www.cisac.org/iswcfly.htm">http://www.cisac.org/iswcfly.htm</a>]</p>



<p>As used currently (for musical works) each ISWC is made up of

the letter &quot;T&quot; followed by nine digits and a check

digit e.g. ISWC T-034.524.680-1. The components of the ISWC do

not have meaning and the punctuation is for readability only. The

proposal is to create ISWCs for other kinds of works with a

different letter prefix - &quot;L&quot; for literary works and

&quot;S&quot; for scientific works

(definitions of which have not been given). L and S codes

currently have no formal status other than as items under

discussion by ISO.</p>
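<p>The layout described above (the letter "T", nine digits and a check digit, with punctuation for readability only) can be sketched in Python as follows. This is only an illustration: the function name <code>parse_iswc</code> is hypothetical, and the check digit is extracted but not verified, since its calculation is not given here.</p>

```python
import re

# Layout as described in the text: "T", nine digits, one check digit.
# Hyphens, dots and spaces are treated as optional readability punctuation.
_ISWC = re.compile(r"^T(\d{9})(\d)$")


def parse_iswc(text):
    """Return (prefix, nine_digits, check_digit) or raise ValueError."""
    compact = re.sub(r"[-.\s]", "", text.upper())
    if compact.startswith("ISWC"):
        compact = compact[4:]  # drop an optional leading "ISWC" label
    match = _ISWC.match(compact)
    if match is None:
        raise ValueError(f"not a musical-work ISWC: {text!r}")
    return "T", match.group(1), match.group(2)


print(parse_iswc("ISWC T-034.524.680-1"))
```

<p>Because the components carry no meaning, the sketch returns them only as opaque strings; the proposed "L" and "S" prefixes are rejected because they have no formal status yet.</p>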



<p>In May 1997 ISO began to consider this proposal as Work Item

15707: Information and documentation - International Standard

Work Code (ISWC) within ISO TC 46/SC 9 and established a Working

Group. Information is available on the ISO web site [<a

href="http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm">http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm</a>].

The purpose of ISWC is &quot;to

provide a means of uniquely identifying intellectual properties,

primarily for applications related to the administration of

copyright and for use within computer databases and related

documentation. The ISWC may be used in conjunction with existing

international identification systems for published materials

(e.g. ISBN, ISRC, etc.) but it is not intended to be an

alternative nor a substitute for those identifiers&quot;.

The stated target

date for final publication of an approved standard is April 2000,

although attempts to speed up this timetable would be welcomed by

all affected. </p>



<p><b>SICI: Serial Item and Contribution Identifier</b></p>



<p>Although approved in August 1996, the revised Serial Item and

Contribution Identifier (SICI) was not published until April 1997

[ANSI/NISO Z39.56-1996 (Version 2) ISSN: 1041-5653]. The new

availability in this standard of SICI mechanisms for

non-paginated items (or for other identifier systems) in the

CSI-3 format greatly enhances the usefulness of the SICI to the

information industries.</p>



<p>A complementary standard for book items using a similar

methodology (BICI: Book Item and Contribution Identifier) was

formally proposed in April 1997 and is under consideration by

NISO for adoption as a standard. [<a

href="http://www.bic.org.uk/bic/bici.html">http://www.bic.org.uk/bic/bici.html</a>]</p>



<p>The use of SICIs in Internet-based systems may be complicated

by issues of character transmission: the standard naming

conventions for internet objects and resources exclude or

restrict the use of some characters (e.g. URN syntax excludes

angle brackets, square brackets, back slash). A typical SICI

contains some of these, (e.g.

0015-6914(19950605)+&lt;&gt;1.0.TX;2-8). Although there are

work-arounds to enable the transmission of such characters there

may be a loss of transparency to the user. Issues such as

these may well be dealt with as part of the DOI initiative which

encounters the same problem.</p>
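<p>One possible work-around, shown here purely as an assumption since the text does not say which one is meant, is percent-encoding the excluded characters. The Python sketch below uses the standard library's <code>urllib.parse.quote</code>; the helper name <code>encode_for_uri</code> is hypothetical.</p>

```python
import html
from urllib.parse import quote, unquote

# The SICI quoted in the text, written with HTML entities as on this page;
# html.unescape restores the literal angle brackets.
sici = html.unescape("0015-6914(19950605)+&lt;&gt;1.0.TX;2-8")


def encode_for_uri(identifier):
    """Percent-encode every character outside letters, digits and "_.-~"."""
    return quote(identifier, safe="")


encoded = encode_for_uri(sici)
print(encoded)
print(unquote(encoded) == sici)  # decoding restores the original SICI
```

<p>The encoded form survives transmission and decodes back to the original, but it is harder to read, which illustrates the loss of transparency mentioned above.</p>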



<p><b>DOI: Digital Object Identifier</b></p>



<p>The Association of American Publishers has designed a system

for marking digital objects in order to facilitate electronic

commerce and enable copyright management systems. That system,

called the Digital Object Identifier System, is now under

development, in partnership with the Corporation for National

Research Initiatives (using the CNRI-developed Internet Handle

technology), and is expected to be live on a limited scale in

August, 1997. An internet web site is being maintained with

complete and up to date information about that initiative and

directions for further development of the DOI in the future [<a

href="http://www.doi.org">http://www.doi.org</a>]. </p>



<p>An extensive prototype system has been developed using data

from five publishers which will be extended and demonstrated in

Frankfurt in October 1997. Over 200,000 DOIs have been easily

assigned by publishers participating in the prototype, and

algorithms for automated DOI generation have been developed.

Links to metadata (in Warwick Framework form) are under

consideration; guidelines for creators, publishers and

information providers have been drafted [<a

href="http://www.handle.net/doi-prototype">http://www.handle.net/doi-prototype</a>].

</p>



<p>A DOI will consist of two portions: a <i>prefix</i>

defining where to go for further information, and a <i>suffix</i>

identifying a particular object. Viewed in this way, a DOI

becomes a routing slip on the Internet carrying a ticket

identifying a particular item at its destination. The DOI suffix

will probably be (wholly or in part) an existing identifier

rather than a new scheme; in practice DOI should be able to

accommodate any scheme already in use, becoming interoperable

with &quot;legacy&quot;

systems. Thus the DOI

suffix will not be a single format but any of a number of

alternative suffixes including PII, SICI, ISWC, ISRC, etc.</p>
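<p>The two-part structure can be sketched in Python as below. Several things here are assumptions rather than statements from this report: the "/" separator, the prefix value <code>10.9999</code> and the names <code>Doi</code> and <code>split_doi</code> are illustrative only.</p>

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # routing part: where to go for further information
    suffix: str  # identifies the object; may be an existing identifier


def split_doi(doi, separator="/"):
    """Split at the first separator only, so the suffix may itself contain it."""
    prefix, found, suffix = doi.partition(separator)
    if not found or not prefix or not suffix:
        raise ValueError(f"expected prefix{separator}suffix, got {doi!r}")
    return Doi(prefix, suffix)


# Hypothetical DOI whose suffix is the PII example quoted in the PII section.
print(split_doi("10.9999/S016538069600403"))
```

<p>Splitting only at the first separator leaves the suffix untouched, which matters if the suffix is a legacy identifier with its own punctuation.</p>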



<p>There are still a number of issues to be resolved, among which

are:</p>



<p>- DOI interoperability with as wide a range of existing

identifier schemes as possible. Among these SICI is considered

essential, yet the Handle technology is an application of the URN

system; as mentioned earlier, current concept definitions of URNs

do not allow use of some characters which are used in SICIs.

Representatives of W3C/IETF have been involved in this issue,

which it is now believed can be readily resolved.</p>



<p>- The governance and commercial control of such a scheme.</p>



<p>- The funding of an operational scheme: suggestions include

creating a body which would recover costs from DOI directory or

number usage.</p>



<p>- The operational issues of such a scheme, such as numbering

agencies, directory services, etc.; an agency which assigns a

number and a directory manager which runs the routing system are

separate functions, even if handled by the same organization.</p>



<p>A recent development is the concept of an ISDI (<i>International

Standard Document Identifier</i>) introduced by NISO at an

informal working group convened in June 1997. This describes the

&quot;identification piece&quot; (the suffix) of the

proposed DOI system. (That meeting did not concern itself with

the trading or registration aspects of the DOI initiative, the

prefix.) The &quot;identification

piece&quot; has been

referred to by NISO as ISDI as a generic descriptive term, not

(as the name could imply) another standard: ISDI currently has no

formal status as a standard or proposed standard. At the June

1997 meeting in Washington DC, a preliminary conclusion was that

such an ISDI would need to carry at minimum the following:</p>



<p>- an agency identifier (the agency/registry assigning or

storing the object);</p>



<p>- an identifier type (categories such as SICI, BICI, ISRC,

etc.);</p>



<p>- an indication of the name of the assigner of the identifier

(i.e. the publisher);</p>



<p>- the identifier itself;</p>



<p>- a check digit (to be determined if this is needed).</p>



<p>NISO has recommended that only ISDIs be used in the

identification piece (the suffix) of the DOI.</p>



<p>It is not yet clear whether an ISDI is anything more than a

description of the DOI suffix syntax, and if so who should be the

prescriptive authority. Discussions are continuing between NISO

and those involved in the DOI and other activities; at the time

of writing there is no formal position statement on ISDI.</p>



<p>DOI promises to bring together activities on internet routing

of information (Uniform Resource addressing technology) and

practical assignment by publishers of information identifiers

(PII, SICI, etc) into a working model for publishers. </p>



<p><b>STM activities</b></p>



<p>STM and IPA have together convened an Information Identifiers

Committee, chaired by Charles Ellis (Wiley), tasked with

facilitating an international consensus within the publishing

industry on a standard system (or systems) for identification and

application of digital information objects. The committee

includes a wide range of industry expertise, including

individuals representing PII, DOI, SICI and ISWC activities. </p>



<p>An initial statement [<a

href="http://ww.ipa-uie.org/ipa_iic.html">http://ww.ipa-uie.org/ipa_iic.html</a>]

was issued by the STM/IPA Committee in May 1997 supporting the

concept of the DOI, encouraging IPA and STM members and other

organizations to support and play an active role in its

development. Further recommendations are expected following

Frankfurt 1997.</p>



<p><b>Uniform Resource addressing</b></p>



<p>Internet technology is particularly relevant for electronic

interchange of digital objects, as in the case of DOI. Work on

extending the various definitions and standards for Uniform

Resource addressing has recently been transferred from IETF to

the W3C (World Wide Web Consortium): [<a

href="http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794">http://www.w3.org/pub/WWW/Addressing/Activity#as-h2-5794</a>]</p>



<p>Unfortunately there is still much confusion caused by careless

use or misunderstanding of various addressing terms, summarised

in Table 1:</p>



<p>Table 1: Uniform Resource Addressing</p>



<table border="1" cellpadding="8" width="601"

bordercolor="#000000">

    <tr>

        <td width="33%"><p align="left"><font size="2">URI

        (Uniform Resource Identifier)</font></p>

        </td>

        <td width="67%"><p align="left"><font size="2">the

        generic set of all names/addresses that are short strings

        that refer to resources.</font></p>

        </td>

    </tr>

    <tr>

        <td width="33%"><p align="left"><font size="2">URL

        (Uniform Resource Locator)</font></p>

        </td>

        <td width="67%"><p align="left"><font size="2">the set of

        URI schemes that have explicit instructions on how to

        access the resource on the internet.</font></p>

        </td>

    </tr>

    <tr>

        <td width="33%"><p align="left"><font size="2">URN

        (Uniform Resource Name)</font></p>

        </td>

        <td width="67%"><p align="left"><font size="2">(1) a URI

        that has an institutional commitment to persistence,

        availability, etc. (may also be a URL, e.g. PURL)</font></p>

        <p align="left"><font size="2">(2) A particular scheme

        which is currently under development in the W3C and IETF

        which should provide for the resolution using internet

        protocols of names which have a greater persistence than

        that currently associated with internet host names or

        organizations. When defined, a URN(2) will be an example

        of a URI. </font></p>

        </td>

    </tr>

    <tr>

        <td width="33%"><p align="left"><font size="2">URC

        (Uniform Resource Citation, or Uniform Resource

        Characteristics)</font></p>

        </td>

        <td width="67%"><p align="left"><font size="2">A set of

        attribute/value pairs describing a resource. Some of the

        values may be URIs of various kinds. Others may include,

        for example, authorship, publisher, datatype, date,

        copyright status and shoe size: a set of fields and

        values with some defined free formatting. </font></p>

        </td>

    </tr>

</table>



<p align="left"><font size="2"><i>Based on information from </i></font><a

href="http://www.w3.org/pub/WWW/Addressing/Addressing.html"><font

size="2"><i>http://www.w3.org/pub/WWW/Addressing/Addressing.html</i></font></a></p>



<p align="left">An Internet draft, &quot;Using Existing

Bibliographic Identifiers as Uniform Resource Names&quot;, was issued

for comment on 22 March 1997 (Internet drafts expire after six

months); it attempted to bring together the bibliographic standards

and Internet worlds [<a

href="http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml">http://globecom.net/(nobg)/ietf/draft/draft-ietf-urn-biblio-00.shtml</a>].</p>



<p align="left">DOI uses CNRI's

&quot;Handle&quot; technology, which is an

application of a URN system. URNs are at present specified

conceptually but not in final implemented form. The W3C web site

describes the current situation and future work on internet

addressing as follows: Unlike web data formats and protocols HTML

and HTTP, there is only one web naming/addressing technology:

URLs. URLs are stable, standard, and ubiquitous. But their

popularity, combined with some design and implementation

oversights, has led to overly fragile service and wasteful use of

IP addresses. The wasteful use of IP addresses has been addressed

by a new specification of the technical transfer protocol, HTTP

1.1, deployment of which W3C consider to be critical. Work in the

W3C<i> Activity on SGML, XML, and Structured Document Interchange</i>

seeks to establish mechanisms for addressing into structured

documents in a general way. The URL specifications are in

revision within the IETF. W3C are considering the issue of how

much staff resource to commit to this effort. W3C are also

investigating the use of metadata to enhance link robustness. </p>



<p align="left"><b>Metadata activities</b></p>



<p align="left">Information identifiers either contain or can

point to supplementary information (&quot;metadata&quot;) enabling actions to

be carried out; common agreement on what formats such metadata

should follow will be essential. Prominent among such continuing

activities are the &quot;Dublin

Core&quot; (and its

follow-up activities) and Internet developments for metadata

coding such as MCF.</p>



<p align="left">The Dublin Metadata workshop of March 1995 and

the Warwick Metadata Workshop of April 1996 aimed to develop

consensus on network resource description across a broad spectrum

of stakeholders: the computer science community, text markup, and

librarians among others. The result was the Dublin Core Metadata

Element Set - a simple resource description record providing a

foundation for electronic bibliographic description, improving

structured access to information on the Internet and

interoperability among disparate description models. The Dublin

Core has now been updated and as of January 1997 specifies

fifteen elements (table 2): currently many of the elements and

their contents should be considered experimental. The Warwick

Metadata Workshop follow-on activity produced a proposed syntax

for the Dublin Core, the development of guidelines for

applications, and the &quot;Warwick

Framework&quot; to

promote modular, separately accessible and maintainable packages

of metadata. Thus, a Dublin Core package might be one of a number

of other packages, including packages for terms and conditions,

archiving and preservation, content ratings, and others. A third

workshop (September, 1996: CNI/OCLC Image Metadata) addressed

application of the Dublin Core to visual resources and resulted

in minor changes to the original element set. The fourth and most

recent workshop (Canberra, March 1997) addressed issues

concerning deployment of the Dublin Core including extensibility,

element structure, and element refinement.<i> Extensibility</i>

refers to making DC a minimum set on which others may build

additional elements; <i>element structure</i> refers to

identification of default schemes and subelement conventions; <i>element

refinement</i> refers to clearer definitions for certain of the

elements (e.g. coverage, relation, and rights management). [<a

href="http://www.oclc.org:5046/research/dublin_core/">http://www.oclc.org:5046/research/dublin_core/</a>]</p>



<p align="left">Table 2: Dublin Core Element Descriptions (latest

update, January 1997)</p>



<table border="1" cellpadding="8" width="601"

bordercolor="#000000">

    <tr>

        <td width="20%"><p align="left"><font size="2">TITLE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">The name

        given to the resource by the CREATOR or PUBLISHER. </font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">CREATOR</font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">The

        person(s) or organization(s) primarily responsible for

        the intellectual content of the resource. For example,

        authors in the case of written documents.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">SUBJECT</font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">The topic

        of the resource, or keywords or phrases that describe the

        subject or content of the resource. The intent of the

        specification of this element is to promote the use of

        controlled vocabularies, keywords, classification data

        (e.g. Library of Congress Classification Numbers, Dewey

        Decimal numbers, Medical Subject Headings).</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">DESCRIPTION</font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Text

        description of the content of the resource, including

        abstracts in the case of document-like objects or

        content descriptions in the case of e.g. visual

        resources. Future metadata collections might include

        computational content description; this field might

        contain a link to such a description rather than the

        description itself.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">PUBLISHER</font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">The entity

        that provides access to the resource such as a publisher,

        a university department, or a corporate entity.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">CONTRIBUTORS

        </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Person(s)

        or organization(s) in addition to those specified in the

        CREATOR element who have made significant

        intellectual contributions (e.g. editors, transcribers,

        illustrators, and convenors).</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">DATE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">The date

        the resource was made available in its present form;

        the recommended form is an 8-digit number, YYYYMMDD.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">TYPE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Category

        of the resource, such as home page, novel, poem, working

        paper, preprint, technical report, essay,

        dictionary. It is expected that this will be chosen from

        a specified list of types.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">FORMAT </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Data

        representation of the resource, such as text/html, ASCII,

        Postscript file, executable application, or JPEG image.

        In principle, formats can include physical media such as

        books, serials, or other non-electronic media. </font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">IDENTIFIER

        </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">String or

        number used to uniquely identify the resource. Examples

        for networked resources include URLs and URNs (when

        implemented), other globally-unique identifiers such as

        ISBN, etc. </font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">SOURCE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Work from

        which this resource is derived, if applicable.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">LANGUAGE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Language(s)

        of the intellectual content of the resource.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">RELATION </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Relationship

        to other resources: a means to express relationships

        among resources that have formal relationships to others,

        but exist as discrete resources themselves. For example,

        images in a document, chapters in a book, or items in a

        collection.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">COVERAGE </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Spatial

        locations and temporal durations characteristic of the

        resource.</font></p>

        </td>

    </tr>

    <tr>

        <td width="20%"><p align="left"><font size="2">RIGHTS </font></p>

        </td>

        <td width="80%"><p align="left"><font size="2">Link to a

        copyright notice, rights-management statement, or server

        that would provide such information in a dynamic way.</font></p>

        </td>

    </tr>

</table>



<p align="left"><font size="2"><i>Adapted from </i></font><a

href="http://purl.org/metadata/dublin_core_elements"><font

size="2"><i>http://purl.org/metadata/dublin_core_elements</i></font></a></p>



<p align="left">A Convention for Embedding Metadata in
HTML (i.e. tagging of meta information in HTML) was proposed,

reflecting the consensus of a break-out group at a May 1996 W3C

Distributed Indexing and Searching Workshop. This group included

representatives of major players: the Dublin Core/Warwick

Framework Metadata meetings, Lycos, Microsoft, WebCrawler, the

IEEE metadata effort, Verity Software, and the W3C. Tagging in

HTML would enable Internet exchange of such metadata. [<a

href="http://www.oclc.org:5046/~weibel/html-meta.html">http://www.oclc.org:5046/~weibel/html-meta.html</a>].

Since then, in June 1997, Netscape tabled proposals at W3C
for an interchange format called Meta Content Framework
(MCF), based on work initiated at Apple [<a

href="http://mcf.research.apple.com/hs/mcf.html">http://mcf.research.apple.com/hs/mcf.html</a>]

which provides a system for representing a wide range of

information about content. MCF files contain descriptions of

meta-content objects referred to as &quot;units&quot;: a unit

consists of a unit identifier (e.g. URL) and some number of

predicates (&quot;slots&quot;). MCF is not intended

to be an extension of markup languages such as HTML; it provides

a format for holding the metadata externally. MCF should be able

to represent the metadata that proposals such as the Dublin Core

aim to cover. In this way, metadata would become available to

Internet search engines, and in effect all sites that make use of

MCF would have the ability to provide categorisations of their

material: the inventor of MCF, R.V. Guha, has described the effect
as &quot;search engines on steroids&quot;.</p>
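
<p align="left">As a rough illustration of the embedding convention described above, the sketch below shows how a few of the Dublin Core elements from the table might be carried as META tags in the head of an HTML page. It assumes the dotted <code>DC.</code> prefix commonly used for Dublin Core element names; the title, the language code and every <code>example.org</code> address are invented placeholders, not taken from any real record.</p>

<pre>
&lt;head&gt;
&lt;title&gt;An example article&lt;/title&gt;
&lt;!-- All values below are hypothetical placeholders --&gt;
&lt;meta name="DC.identifier" content="http://www.example.org/articles/12345"&gt;
&lt;meta name="DC.language" content="en"&gt;
&lt;meta name="DC.relation" content="http://www.example.org/journals/example-journal"&gt;
&lt;meta name="DC.rights" content="http://www.example.org/copyright.html"&gt;
&lt;/head&gt;
</pre>

<p align="left">Because the tags sit in the page head, they travel with the document itself, whereas a format such as MCF would hold equivalent statements in a separate file.</p>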



<p align="left"><b>Mark Up Languages</b></p>



<p align="left">A document placed in an electronic environment
should be identifiable, either by containing mark-up tags for
elements such as &quot;identifier&quot; (explicitly stating
the identifier) or, alternatively, by enabling the identifier to be
generated implicitly from internal document information
(&quot;affordance&quot;), which must therefore
also be made available in a standard format. Documents should
also be &quot;open&quot; or &quot;interoperable&quot;, i.e. readable
(exchangeable) via any common software package through a
commonly agreed standard. Some developments in the past year with
mark-up languages assist both of these aims: the release of a new
version of the standard Internet mark-up, HTML (HyperText Markup
Language); and the proposal for XML (Extensible Markup Language), of
particular interest to publishers already using SGML.</p>



<p align="left">A major potential problem with Internet exchange

of documents, especially for scientific material, is that the

HTML standard used for mark-up (layout and formatting) is being

outgrown by demands for complex document support; this has led to

many extensions of HTML - around 90 exist, many of which are

proprietary and supported only by certain software or browsers.

This problem is being resolved in two different ways. One aims to

widen the HTML standard to encompass known requirements; in July

1997, W3C released a draft of the latest HTML 4.0 intended to

exploit new features without proprietary extensions, including

greater control over forms, frames and tables, and all the

benefits of scripts, style sheets and objects. Of interest to STM

publishers, the feature of &quot;Additional Named Entities&quot;

adds support for important symbols and glyphs used in

mathematics, markup and internationalization. [<a

href="http://www.w3.org/Press/HTML4">http://www.w3.org/Press/HTML4</a>].

The difficulty in this approach is that such a standard may never

be complete. </p>



<p align="left">An alternative response is represented by

Extensible Markup Language (XML), a subset of SGML (Standard

Generalized Markup Language) designed for delivery on the Web,

proposed at SGML 96 (November 1996) and resulting in a W3C

working draft proposal to the sixth WWW conference in April 1997.

The XML approach is to provide a language which can make HTML

self-extending in the true fashion of SGML, i.e. publishers can

provide their own extensions and definitions akin to DTDs and

define appropriate, readable, tags. XML could also provide a

framework for Java language applets to work in. [<a

href="http://www.w3.org/pub/WWW/TR/WD-xml.html">http://www.w3.org/pub/WWW/TR/WD-xml.html</a>];

[<a href="http://www.w3.org/pub/WWW/XML/Activity.html">http://www.w3.org/pub/WWW/XML/Activity.html</a>]; [Tim Bray, <i>Extensible
MarkUp Language: SGML On-Ramp and Web Enabler</i>, The
Information Interchange Report, Vol. 4, no. 2/3, Nov/Dec 1996, pp. 1-6]</p>



<p align="left">STM publishers are also interested in

developments with mathematical mark-up; after more than a year of

in-depth study and experimentation, the HTML Math working group

released an updated working draft of MathML (Mathematical Mark-Up

Language), a way of encoding both mathematical content and visual

presentation, in July 1997. [<a

href="http://www.w3.org/pub/WWW/TR/WD-math/">http://www.w3.org/pub/WWW/TR/WD-math/</a>]</p>



<p align="left">The Document Object Model [<a

href="http://www.w3.org/MarkUp/DOM/">http://www.w3.org/MarkUp/DOM/</a>]

is a platform- and language-neutral interface that will allow

programs and scripts to dynamically access and update the

content, structure and style of documents (&quot;Dynamic

HTML&quot; is a term used by some vendors to describe the

combination of HTML, style sheets and scripts). The document can

be further processed and the results of that processing can be

incorporated back into the presented page. Requirements are being

gathered for a first release of &quot;level one&quot; (functionality

equivalent to that currently exposed in Netscape Navigator 3.0

and Microsoft Internet Explorer 3.0) in the second half of 1997.

While of great interest in the long term, it seems unlikely that

such interactive documents will be widely implemented in the STM

world in the next year or so.</p>



<p align="left"><b>The way forward</b></p>



<p align="left">Internet standards are inescapably at the centre

of likely future scenarios for our industry. The pace of

development in this area leads to some conflict; for example,

both HTML 4.0 and XML arise within W3C, yet the two are in

tension, even to the extent that Tim Berners-Lee (W3C's Director)

stated in July 1997: <i>&quot;It's
no wonder consumers, buyers and IT managers are concerned...
Extensible Markup Language (XML) naturally supports a variety of
applications which could compromise the design of HTML&quot;.</i> [<a

href="http://www.w3.org/Press/HTML4-pers.html">http://www.w3.org/Press/HTML4-pers.html</a>].</p>



<p align="left">It is clear from recent activities such as MCF

and other Netscape and Microsoft proposals to W3C that Internet

standards (de facto or de jure) are now being heavily influenced

by commercial technology players fighting to provide better

access tools for internet and intranet applications in general

(and by so doing to gain commercial advantage for their

particular tools with a W3C imprimatur). Publishers will no doubt

benefit from these activities but have little chance of

influencing them. The World-Wide Web Consortium has so far not

produced many actions of immediate specific concern to STM

publishers; document identification, rights clearance mechanisms

and so on appear to be taking a relatively minor position in its

priorities compared to technical infrastructure issues and

pressing matters such as &quot;next generation&quot;

addressing protocols. All of this is understandable but also

inevitable if one considers that most members of the W3C are

technology companies; few are electronic publishers, and only one

company (Reed-Elsevier) is a major publisher of both traditional

paper and electronic information. We cannot expect that special

cases such as STM material presentation, representing a tiny

proportion of internet traffic, will receive any favoured

treatment; we can however hope that the generation of

sufficiently open standards and technology will enable STM

material and transactions to be satisfactorily accommodated in
future web standards. As W3C reaches the end of its first
three-year funding period and considers how to retain its funding
subscribers (and attract more), this emphasis may change, which
suggests a possible course of action for publishers interested in
influencing such events.</p>



<p align="left">STM publishers view the future scientific article
as containing multimedia elements: <font face="Times New Roman">full
text and abstract text; live &quot;hot spot&quot; references; video or audio clips;
supplementary data tables; software linkages to e.g. 3-D models;
links to other internet sites; forward links to comments,
corrections, future papers, etc. How can identifiers and metadata
assist us in developing such a rich system? A solution for the
future digital object will need to address the following themes:</font></p>



<p align="left"><font face="Times New Roman">- <i>Unique

identification</i>: unambiguous identification of a defined piece

of information, possibly with details of medium, version, format

etc.;</font></p>



<p align="left"><font face="Times New Roman">-<i> Multiple

linkage</i>: by stating which naming convention is used, multiple

naming or identification schemes should be possible (an idea

adopted in SICI and DOI).</font></p>



<p align="left"><font face="Times New Roman">- <i>Multiple

(overlapping) identification</i> of content (e.g. a sound clip

within a digital object may be identified by a music identifier

as well as being part of a document with another identifier; the

Dublin Core concept of relation may prove useful here);</font></p>



<p align="left"><font face="Times New Roman">- <i>Arbitrary

granularity</i>: if a publisher wants to identify a paragraph or

equation as a separate item he can do so; </font></p>



<p align="left"><font face="Times New Roman">- <i>Cascading

responsibility</i>: once below a certain level, no central agency
permission is needed to assign unique numbers (sub-levels are assigned
by the owner of the higher level);</font></p>



<p align="left"><font face="Times New Roman">- <i>Links to

metadata</i>: via simple identifiers pointing to specific

repositories for different needs, e.g. copyright, trading, EDI;</font></p>



<p align="left"><font face="Times New Roman">- <i>Open standards</i>:

technical architecture interoperable with standard software

packages, making use of W3C approved standards.</font></p>



<p align="left"><font face="Times New Roman">- <i>Distributed

data</i>: not all data and metadata held on one site; a virtual

single network created from multiple interlinked servers.</font></p>



<p align="left"><font face="Times New Roman">- &quot;<i>Many but dumb</i>&quot;:
a network of interconnected simple
identifiers and links is preferable to an all-embracing single
standard identifier which attempts to cover everything from a
scientific article to a new music release.</font></p>
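
<p align="left"><font face="Times New Roman">As a hedged illustration of the multiple linkage theme, the sketch below labels each identifier with the naming convention it belongs to, using the <code>scheme</code> attribute that the HTML 4 specification defines for META tags. The scheme labels and all identifier values are invented placeholders, and how (or whether) a given search engine or resolver would interpret them is an assumption.</font></p>

<pre>
&lt;!-- Hypothetical values: one resource, three identifiers, each naming its scheme --&gt;
&lt;meta name="DC.identifier" scheme="DOI" content="10.9999/example.12345"&gt;
&lt;meta name="DC.identifier" scheme="ISSN" content="0000-0000"&gt;
&lt;meta name="DC.identifier" scheme="URL" content="http://www.example.org/articles/12345"&gt;
</pre>

<p align="left"><font face="Times New Roman">Stating the scheme alongside each value lets several simple identifiers coexist on one object, in the spirit of the &quot;many but dumb&quot; approach.</font></p>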



<p align="left"><font face="Times New Roman">Once we have a

recognised interoperable network in which to exchange information

about digital information objects, we can begin to apply some of

the emerging electronic commerce standards to carry out

commercial transactions with them. </font>The demonstration of

DOI at Frankfurt this year holds out the promise of one such

workable system.</p>



<p align="left"><b>Glossary of abbreviations used in this review</b></p>



<table border="0">

    <tr>

        <td>AAP</td>

        <td>Association of American Publishers</td>

    </tr>

    <tr>

        <td>ANSI</td>

        <td>American National Standards Institute</td>

    </tr>

    <tr>

        <td>ASCII</td>

        <td>7-bit American National Standard Code for Information

        Interchange, ANSI X3.4:1986</td>

    </tr>

    <tr>

        <td>BIC</td>

        <td>Book Industry Communication (UK organisation)</td>

    </tr>

    <tr>

        <td>BICI</td>

        <td>Book Item and Contribution Identifier (proposed NISO

        development)</td>

    </tr>

    <tr>

        <td>CIS</td>

        <td>Common Information System (CISAC)</td>

    </tr>

    <tr>

        <td>CISAC</td>

        <td>Conf&eacute;d&eacute;ration Internationale des Soci&eacute;t&eacute;s
        d'Auteurs et Compositeurs = International Confederation of
        Societies of Authors and Composers</td>

    </tr>

    <tr>

        <td>DOI</td>

        <td>Digital Object Identifier (AAP)</td>

    </tr>

    <tr>

        <td>EC</td>

        <td>European Commission</td>

    </tr>

    <tr>

        <td>HTTP</td>

        <td>Hyper Text Transfer Protocol</td>

    </tr>

    <tr>

        <td>IETF</td>

        <td>Internet Engineering Task Force</td>

    </tr>

    <tr>

        <td>IFPI</td>

        <td>International Federation of the Phonographic Industry

        (London)</td>

    </tr>

    <tr>

        <td>IPA</td>

        <td>International Publishers Association</td>

    </tr>

    <tr>

        <td>ISBN</td>

        <td>International Standard ISO 2108:1992 <br>

        Information and Documentation - International Standard

        Book Numbering (ISBN)</td>

    </tr>

    <tr>

        <td>ISDI</td>

        <td>International Standard Document Identifier (proposed

        term)</td>

    </tr>

    <tr>

        <td>ISI</td>

        <td>Institute for Scientific Information, Inc.</td>

    </tr>

    <tr>

        <td>ISMN</td>

        <td>International Standard ISO 10957:1993 <br>

        Information and Documentation - International Standard

        Music Number (ISMN)</td>

    </tr>

    <tr>

        <td>ISO</td>

        <td>International Organization for Standardization </td>

    </tr>

    <tr>

        <td>ISRC</td>

        <td>International Standard ISO 3901:1986 <br>

        Documentation - International Standard Recording

        Code (ISRC): administered by IFPI</td>

    </tr>

    <tr>

        <td>ISSN</td>

        <td>International Standard ISO 3297:1986 <br>

        Documentation - International Standard Serial Numbering

        (ISSN)<br>

        US equivalent: ANSI Z39.9:1979 (R1984)</td>

    </tr>

    <tr>

        <td>ISWC</td>

        <td>International Standard Work Code (currently proposed

        to ISO TC 46)</td>

    </tr>

    <tr>

        <td>NISO</td>

        <td>National Information Standards Organization (USA)</td>

    </tr>

    <tr>

        <td>OCLC</td>

        <td>Online Computer Library Center Inc.</td>

    </tr>

    <tr>

        <td>PII</td>

        <td>Publisher Item Identifier</td>

    </tr>

    <tr>

        <td>STI</td>

        <td>Scientific, Technical and Information publishers' group (ACS, AIP,

        APS, IEEE, Elsevier Science)</td>

    </tr>

    <tr>

        <td>STM</td>

        <td>International Association of Scientific, Technical

        and Medical Publishers </td>

    </tr>

    <tr>

        <td>URC</td>

        <td>(1) Uniform Resource Citation (IETF)<br>

        (2) Uniform Resource Characteristic (IETF)</td>

    </tr>

    <tr>

        <td>URI</td>

        <td>Uniform Resource Identifier (IETF)</td>

    </tr>

    <tr>

        <td>URL</td>

        <td>Uniform Resource Locator (IETF)</td>

    </tr>

    <tr>

        <td>URN</td>

        <td>Uniform Resource Name (IETF)</td>

    </tr>

    <tr>

        <td>W3C</td>

        <td>World Wide Web Consortium</td>

    </tr>

    <tr>

        <td>XML</td>

        <td>Extensible Markup Language (subset of SGML)</td>

    </tr>

</table>



<hr>



<p align="left"><b>Dr. Norman Paskin</b><br>

Director, Information Technology Development<br>

Elsevier Science<br>

The Boulevard<br>

Langford Lane<br>

Kidlington<br>

Oxford OX5 1GB, UK<i><br>

Tel: (+44) (0) 1865 843798<br>

Fax: (+44) (0) 1865 843967<br>

E mail: </i><a href="mailto:n.paskin@elsevier.co.uk"><i>n.paskin@elsevier.co.uk</i></a></p>



<hr>

</body>

</html>

<p>Last update: 17 September 1997</p>
<hr>
&#169; <a href = "/inca/homepage/about/c_right/">Copyright</a> 1997, Elsevier Science, All rights reserved.<br>



</body></html>
