
Subject: Re: Fwd: Review of OASIS Artifact Identification Requirements (includes URN, namespace updates)
From: Norman Walsh <ndw@nwalsh.com>
To: entity-resolution@lists.oasis-open.org
Date: Tue, 28 Jun 2005 16:39:33 -0400

/ Mary McRae <marypmcrae@gmail.com> was heard to say:
|   This was supposed to have been forwarded to the TC for review; I
| didn't see it come across the email list so I'm (re)posting.

I had the impression that some private review was being solicited
before this document was surfaced in public, but I am delighted to
see that I was mistaken.

The entity resolution list is public, so in the interest of making my
comments public, I am reposting them here:

I have a number of technical comments about the draft, but before I go
into that level of detail, I want to begin with a painful, high-level
observation: I think this document approaches the metadata problem it
seeks to solve in fundamentally the wrong way. I say that with
trepidation and a horrible sense of responsibility since it clearly
draws some of its inspiration from the work that Karl Best and I did
in 2001. In the intervening years, I have changed my position[1] on
the central question of "names and addresses" such that I no longer
support URNs at all for the sorts of artifacts Karl and I were
attempting to name.

The URN registration process requires that a new URN scheme describe
its purpose and the procedure used to construct a URN in that scheme
in exact detail. That is why RFC 3121 describes an ontology of
artifacts and a procedure for constructing compound names (URNs) from
the artifacts, their titles, types, versions, and other metadata.

I think adapting that methodology for naming artifacts with http: URIs
is a recipe for disaster. The process is rigid, brittle, confusing,
and unnecessary. The public will not understand it, TCs will find it
crushingly burdensome, and it will not scale. Please don't do it.

At the end of the day, a member of the public, looking at a document
produced by an OASIS TC, needs to be able to quickly, easily, and
unambiguously answer some simple questions such as:

  1. What is this?
  2. Who produced it?
  3. Is it the most recent version?
  4. If it's not the most recent, where is the most recent?
  5. What is its status?
  6. When was it produced?
  7. What has the TC done since it produced this?

I propose that you adopt a much simpler alternative. It is not without
points over which there may be controversy, but it has proven to be
robust and scalable.

First, decide what metadata you will require every TC to associate
with every document that it publishes officially. (As a corollary, you
want to make sure that every draft that is distributed unofficially is
distinct from all the official drafts.)

The current draft lays out most of this metadata (and some other
elements that I don't think are necessary under this alternative
proposal):

  1. A TC Name
  2. A Title
  3. A Version, if appropriate
  4. A Revision, if appropriate
  5. A Stage
  6. An Abstract
  7. A Language

I think it would make sense to include a few more things:

  8. An Editor's (or Editors') Name (or Names)
  9. A Date
 10. A Copyright
 11. Some sort of IPR statement
 12. Links to any appropriate feedback URIs or email addresses

It may be valuable to associate some boilerplate with the document
as well.

Second, require that every TC provide all of this data in a machine
readable form. I propose that the most straightforward way to do this
is to require that the normative version of each specification be
expressed in XHTML and to require that the "title page" of that XHTML
document contain this metadata in a form that is both visible to the
reader and can be extracted by a tool.
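
As a rough illustration of "visible to the reader and extractable by a
tool", here is a minimal sketch in Python. It assumes, purely for the
sake of the example, that each piece of title-page metadata is wrapped
in an XHTML element whose class attribute names the field. The class
names (tc-name, title, stage, and so on) and the sample document are
hypothetical; nothing in the draft or in this proposal fixes them.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical class names standing in for the required metadata fields.
FIELDS = {"tc-name", "title", "version", "revision", "stage", "date", "editor"}


def extract_metadata(xhtml_text):
    """Return {field: [values]} for every XHTML element tagged with a known class."""
    root = ET.fromstring(xhtml_text)
    metadata = {}
    for elem in root.iter():
        if not elem.tag.startswith(XHTML_NS):
            continue  # skip anything outside the XHTML namespace
        for cls in elem.get("class", "").split():
            if cls in FIELDS:
                # Collapse whitespace so line-wrapped text reads as one value.
                text = " ".join("".join(elem.itertext()).split())
                metadata.setdefault(cls, []).append(text)
    return metadata


# An invented title page; a reader sees the same text the tool extracts.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example Product</title></head>
  <body>
    <div class="titlepage">
      <h1 class="title">Example Product</h1>
      <p>Produced by the <span class="tc-name">Example TC</span></p>
      <p>Stage: <span class="stage">Committee Draft</span></p>
      <p>Date: <span class="date">2005-06-28</span></p>
      <p>Editors: <span class="editor">A. Editor</span>,
         <span class="editor">B. Editor</span></p>
    </div>
  </body>
</html>"""

if __name__ == "__main__":
    for field, values in sorted(extract_metadata(SAMPLE).items()):
        print(field, "=", "; ".join(values))
```

Because the document must be well-formed XML to be XHTML at all, an
ordinary XML parser is enough; no special-purpose format is needed.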

This may be a point of controversy since the current process allows
the normative version of artifacts to be published in other formats.
Briefly, I think that's a mistake too. The web is how documents are
distributed in the modern world and (X)HTML is the lingua franca of
the web. (The fact that the current process allows the normative
version of a specification to be published in *both* (X)HTML and PDF
is totally unacceptable as it will eventually result in two putatively
normative specifications that do not agree with each other.)

As to the URI used for these specifications, I think it would be
sufficient to use the short, administrator-approved product name to
construct the URI as follows:

  http://docs.oasis-open.org/name-of-tc/name-of-product/

The administrator can make sure that no product name is ever reused by
a single TC. I think it would make sense if the URI above identified
the "current version" of the product specification. It would also make
sense to publish a dated URI that points to a specific version:

  http://docs.oasis-open.org/name-of-tc/name-of-product-YYYY-MM-DD/
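
The two URI patterns above, together with the rule that a TC never
reuses a product name, can be sketched in a few lines of Python. The
function names, the in-memory registry, and the "example-tc" and
"example-product" values are assumptions made for illustration; only
the URI layout comes from the proposal itself.

```python
from datetime import date

BASE = "http://docs.oasis-open.org"


def register_product(registry, tc, product):
    """Record a product name for a TC, refusing any name that TC already used."""
    names = registry.setdefault(tc, set())
    if product in names:
        raise ValueError(f"{tc} has already used the product name {product!r}")
    names.add(product)


def current_uri(tc, product):
    """URI that identifies the current version of the product specification."""
    return f"{BASE}/{tc}/{product}/"


def dated_uri(tc, product, published):
    """URI that identifies the version published on a specific date."""
    return f"{BASE}/{tc}/{product}-{published.isoformat()}/"


if __name__ == "__main__":
    registry = {}  # hypothetical stand-in for the administrator's records
    register_product(registry, "example-tc", "example-product")
    print(current_uri("example-tc", "example-product"))
    print(dated_uri("example-tc", "example-product", date(2005, 6, 28)))
```

The same product name may still be registered by a different TC, since
the TC name is part of the URI path.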

As to the naming of various other artifacts associated with the
product (schemas, images, stylesheets, etc.), I think it's probably
sufficient to say that every one of them must be reachable (at least
indirectly) through links from the normative specification.
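
One way to check the reachability rule mechanically is a breadth-first
walk over the links, starting at the normative specification. The
sketch below assumes the links have already been collected into a
dictionary that maps each artifact to the artifacts it links to; how
that map is built, and the file names used, are hypothetical.

```python
from collections import deque


def unreachable_artifacts(links, spec, artifacts):
    """Return the artifacts that cannot be reached by following links from spec."""
    seen = {spec}
    queue = deque([spec])
    while queue:
        current = queue.popleft()
        for target in links.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(artifacts) - seen)


if __name__ == "__main__":
    # Invented example: the spec links to a schema and a stylesheet,
    # and the schema in turn links to a second schema.
    links = {
        "spec.html": ["schema.xsd", "style.css"],
        "schema.xsd": ["types.xsd"],
    }
    published = ["schema.xsd", "types.xsd", "style.css", "logo.png"]
    print(unreachable_artifacts(links, "spec.html", published))
```

Here types.xsd counts as reachable because it is linked indirectly,
through schema.xsd, while logo.png is reported because nothing links
to it.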

I hope that these comments are helpful and I do regret that they
are probably quite radically different from what you had
expected. I think if you'd done more of the work on this document in
public, it would have been possible to make these suggestions earlier
when it might have been less difficult to make the necessary changes.

                                        Be seeing you,
                                          norm

[1] http://norman.walsh.name/2004/03/03/266NorthPleasant

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.