
Subject: Re: [regrep] [Namespace Manager Proposal] FW: XMLNDR [Broken URL] RE: XMLNDR PURLs, Handles, URLs, URNs & Namespaces
From: Robin Cover <robin@oasis-open.org>
To: Chiusano Joseph <chiusano_joseph@bah.com>
Date: Sat, 3 Sep 2005 21:08:46 -0400 (EDT)
On Fri, 2 Sep 2005, Chiusano Joseph wrote:

> Back in January 2002 I submitted a proposal to our list for a "Namespace
> Manager" function. The archive link is forever gone, and Robin Cover
> asked me to send him a summary of the highlights of that proposal. I am
> sending this to refresh our collective memories of this proposal, as we
> look toward version 4.0 (hint hint;)
> 
> Ironically (as you can see from the right side of the e-mail subject) I
> sent the archive URL in support of a discussion on persistent
> identifiers on the "U.S. Federal XML Naming and Design Rules" listserv,
> and subsequently discovered that it was a broken link.:p
> 
> To those in the U.S.: Happy Labor Day weekend.
> 
> Joe
> 
> Joseph Chiusano
> Booz Allen Hamilton
> O: 703-902-6923
> C: 202-251-0731
> Visit us online@ http://www.boozallen.com
[... see http://lists.oasis-open.org/archives/regrep/200509/msg00010.html ]

Joe and TC,

1) An estimated 2,000-2,500 messages from 2002 and earlier now
   appear to be missing from the TC's email list archive.

2) I have communicated my concern to the OASIS administration, hopeful
   that we can have the resources restored to their canonical URIs
   within a short time.  Since posting the note referenced in "3)",
   I have found that at least some of the messages are stored on
   the Wayback Machine, including Joe's lost posting from the regrep
   archive.

3) I have conveyed an [unofficial] apology to the FED-XML-NDR
   community, where these AWOL resources were discovered:

   https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=449

4) In principle, I think OASIS is committed to the practice of
   "not breaking links" to resources for which the organization has
   published dereferenceable URIs.  What may be needed is a policy
   which implements, in tangible business terms, a strategy to
   ensure that links are not broken, to monitor the websites for
   resources gone AWOL, and to repair inadvertently broken links.

5) The larger discussion about "broken links" on the FED-XML-NDR list
   prompted me to write up a renewed appeal for technologists to
   insist upon institutional commitments to the preservation of
   public resources and the preservation of links to resources
   identified with dereferenceable URIs.

   I append a portion of this note below and ask members of the
   ebXML Regrep TC to help, in any forum where you participate,
   raise awareness of the role (I believe) we play in correcting
   this widespread problem of resource deterioration.  The
   deplorable situation will change, it seems to me, only when all
   of us demand commitment, policy, and vigilance on the part of
   the jurisdictions and institutions responsible for persistent
   resources and links.

   If the perspective adopted in the note below is defensible (you
   be the judge! - but it's optional reading), there is fundamentally
   no excuse, in mid-2005, for broken links.  Radically revising our
   common expectations and calling public institutions to account for
   data loss are probably prerequisites for substantial change.
   Shrugging our shoulders and saying "That's the Web" is no solution.

Thanks,

Robin Cover

================= Note on resource "persistence" follows:

[...]

As a matter of personal judgment, I believe there is now a technical
consensus that accessible Web technologies, informed by sound
principles of Web Architecture [2], are *more than adequate* to
support both resource and link persistence.  Institutions (as URI
owners) whose public behavior exhibits doubt about the adequacy of
this available technology are apparently:

a) uninformed [e.g., unaware that HTTP and free server software
   provide the means to manage resource/link persistence]
b) unwilling to commit (axiologically) to core principles and
   values about resource/link persistence
c) unwilling to commit resources to ensure resource/link
   persistence
d) incompetent in one or more ways possibly related to "a)" - "c)"
e) did I miss anything in a-d?  Maybe also "inattentive",
   as a sub-species of a-c

Although the terminology used to describe URI persistence is sometimes
a bit problematic [3], I believe there is now general agreement
with principles articulated in the W3C Web Architecture document,
viz.,

  [We share a] "social expectation that once a URI identifies
  a particular resource, it should continue indefinitely to
  refer to that resource."

combined with resource persistence (a dereferenced URI gets you
the resource consistently and predictably):

  "A URI owner SHOULD provide representations of the identified
  resource consistently and predictably."

These values and principles were expressed in a poorly-titled [4]
document "Cool URIs don't change", written in 1998 [5].

The core assertion is captured in the first forty-eight words of
that 1998 TBL memo:

  "What makes a cool URI?
  A cool URI is one which does not change.
  What sorts of URI change?
  URIs don't change: people change them.
  There are no reasons at all in theory for people to change
  URIs (or stop maintaining documents), but millions of
  reasons in practice."

Excuses for breaking links that sounded plausible in 1998 are, in
2005, simply lame:

* "We just reorganized our website to make it better."
* "We have so much material that we can't keep track of what is
   out of date and what is confidential and what is valid and
   so we thought we'd better just turn the whole lot off."
* "Well, we found we had to move the files..."
* "We used to use a cgi script for this and now we use a binary
   program."
* "I didn't think URLs have to be persistent - that was URNs."
* "We would like to, but we just don't have the right tools."

Since 1998, web resources, tools, and technologies have improved
dramatically, making it easier than ever to support the
preservation and integrity of Web-accessible resources.

* the cost of disk storage has plummeted even faster than that of
  semiconductors, according to Kryder's Law ("Moore's Law for
  Storage"): the "doubling of processor speed every 18 months
  is a snail's pace compared with rising hard-disk capacity."
  One can get a 300-gigabyte Barracuda for just $150 [6].

* HTTP (1.0, 1.1) [7] is now widely supported by open source (free,
  publicly available) server software which provides web site
  administrators (as URI owners) with rich tools for managing
  predictable, stable, and "indefinite/persistent" support
  for published URIs, even in the face of moving web sites
  or otherwise re-architecting enterprise web server
  topologies.  See for example the free Apache Server [8], which
  has binaries for UNIXes and Win32, with support for rule-based
  URL rewriting, mapping URLs to filesystem locations, content
  negotiation, use of virtual hosts, etc. [9]

* generic web analysis and monitoring tools now make it easy to
  detect the existence of broken links; their statistical modules
  help interpret the results for web site administrators, providing
  early notification when something has gone wrong
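The kind of monitoring described in the last bullet can be sketched with the Python standard library alone. This is a minimal illustration, not one of the analysis tools referred to above: it assumes the list of URLs to check is already known (for example, collected from a site's own pages), sends an HTTP HEAD request to each, and reports every URL that does not end in a 2xx status. The function names `http_status` and `find_broken_links` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if no response was received.

    urlopen follows the common 3xx redirects, so a URI that was moved with a
    301 still counts as working; only the status at the end of the chain is
    reported.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the status code is what we want.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, etc. (URLError is an OSError).
        return 0


def find_broken_links(urls: Iterable[str],
                      status_of: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Return {url: status} for every URL that did not end in a 2xx response."""
    broken: Dict[str, int] = {}
    for url in urls:
        status = status_of(url)
        if not 200 <= status < 300:
            broken[url] = status
    return broken
```

Passing a different `status_of` function (a dictionary lookup, say) keeps the reporting logic testable without touching the network. Some servers answer HEAD with 405, so a production checker would probably fall back to GET in that case.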

In the context of this FED-XML-NDR discussion, within earshot of
some very important (government) policy-makers, I am thrilled to
read statements of general concern about "Broken URLs."  It's
a plague and a scourge upon the Web, as is commonly lamented.
Part of the solution, I believe, is to reshape public expectations
based upon the conviction that there are NO (or few) valid
excuses for breaking links.  Institutions that elect to use
BAD software ('broken as designed') which cannot even be configured
to maintain persistent links/resources should be called to
public account.  Institutions (jurisdictions) that demonstrate
negligible commitment to preservation of resources and links
should likewise be called to public account.  IMO.
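Software that *can* be configured to keep old links alive need not be elaborate. In Apache this job falls to mod_rewrite rules [9]; the sketch below shows the same idea in plain Python so the logic is visible. It is a minimal illustration under stated assumptions, not Apache's implementation: the archive paths, the single rewrite rule, and the names `LIVE_PATHS`, `REDIRECT_RULES`, `resolve`, `RedirectingHandler`, and `serve` are all hypothetical. A request for a retired path gets a permanent (301) redirect to the resource's new URI instead of a 404.

```python
import re
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical: paths the reorganized site still serves directly.
LIVE_PATHS = {"/archives/regrep/2002/01/msg00042.html"}

# Hypothetical rewrite rules, tried in order: (old-path pattern, new-path template).
# Assumed reorganization: /archives/<list>/<YYYYMM>/msgN.html moved to
# /archives/<list>/<YYYY>/<MM>/msgN.html.
REDIRECT_RULES = [
    (re.compile(r"^/archives/([a-z-]+)/(\d{4})(\d{2})/(msg\d+\.html)$"),
     r"/archives/\1/\2/\3/\4"),
]


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Map a request path to (HTTP status, Location header value or None)."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK, None
    for pattern, template in REDIRECT_RULES:
        if pattern.match(path):
            # 301 tells browsers and crawlers the old URI now lives elsewhere.
            return HTTPStatus.MOVED_PERMANENTLY, pattern.sub(template, path)
    return HTTPStatus.NOT_FOUND, None


class RedirectingHandler(BaseHTTPRequestHandler):
    """Answers GET with a status line and headers only (no body), to keep it short."""

    def do_GET(self) -> None:
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the hypothetical redirecting server locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectingHandler).serve_forever()
```

Under these assumptions, a request for `/archives/regrep/200201/msg00042.html` receives a 301 pointing at `/archives/regrep/2002/01/msg00042.html`. In Apache the equivalent would be a `RewriteRule` carrying the `[R=301]` flag; a production version would also need to handle query strings, which this sketch does not strip before matching.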

In addition to individual and institutional commitment to resource
persistence, it will help to design architectures within which
the preservation of resources and links is consciously planned
for, and optimized, rather than ignored.  I'm grateful for Owen
Ambur's statements of support, and for Todd Vincent's words, "The
system I have been describing accomplishes this requirement." [10]

As an aside, I have noted the increasing popularity of RDDL
(Resource Directory Description Language) as a way to provide
something informative when designers create http (scheme) URI
namespace names that do not directly resolve under HTTP/DNS to the
XML Schema (resource).  The RDDL "namespace document" is a simple
[enhanced] XHTML document that lives at the end of a dereferenced
http URI namespace, conveying information about the nature and
purpose(s) of the resource associated with the namespace URI,
typically including (URI-reference) hyperlinks to schemas,
specifications, editor contacts, etc.
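As a rough illustration of how software, not just a human reader, can use such a namespace document, the sketch below pulls the resource links out of one with Python's standard `xml.etree.ElementTree`. It assumes the RDDL 1.0 markup convention of `rddl:resource` elements in the `http://www.rddl.org/` namespace carrying XLink attributes (`xlink:href` for the target, `xlink:role` for its nature, `xlink:arcrole` for its purpose). The sample document, its `example.org` URIs, and the function name `rddl_resources` are hypothetical, and the nature/purpose URIs in the sample are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

RDDL_NS = "http://www.rddl.org/"           # assumed RDDL 1.0 namespace name
XLINK_NS = "http://www.w3.org/1999/xlink"  # XLink attributes carry the links


def rddl_resources(document: str) -> List[Dict[str, Optional[str]]]:
    """List href/nature/purpose for each rddl:resource in a namespace document."""
    root = ET.fromstring(document)
    resources = []
    for element in root.iter(f"{{{RDDL_NS}}}resource"):
        resources.append({
            "href": element.get(f"{{{XLINK_NS}}}href"),
            "nature": element.get(f"{{{XLINK_NS}}}role"),      # what kind of resource
            "purpose": element.get(f"{{{XLINK_NS}}}arcrole"),  # what it is used for
        })
    return resources


# Hypothetical namespace document for a made-up namespace at example.org.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head><title>Example namespace</title></head>
  <body>
    <h1>Example namespace</h1>
    <rddl:resource xlink:type="simple"
        xlink:role="http://www.w3.org/2001/XMLSchema"
        xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
        xlink:href="http://example.org/ns/example.xsd">
      <p>W3C XML Schema for this namespace.</p>
    </rddl:resource>
  </body>
</html>"""
```

A tool could call `rddl_resources(SAMPLE)` and pick the entry whose purpose is schema validation to find the schema for the namespace. Real RDDL documents often use XHTML entities such as `&nbsp;`, which ElementTree cannot resolve without the DTD, so a robust reader would need to handle those.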

See http://xml.coverpages.org/rddl.html

Thanks,

Robin Cover
[speaking only personally and unofficially]
XML Cover Pages
http://xml.coverpages.org/

* please send corrections to this message via email:
  robin@oasis-open.org

========= References:

[1] https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=448
    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=445
    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=442

    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=443

    "I just found out that the URL I provided is now broken;
    I will try and find out from OASIS how to retrieve it from
    their architectures (it's broken in the OASIS ebXML
    Registry listserv public archives as well). Sorry for the
    inconvenience."

[2] URI persistence
    Architecture of the World Wide Web, Volume One
    W3C Recommendation 15 December 2004
    http://www.w3.org/TR/webarch/#URI-persistence

[3] "persistence" in connection with "link", "URI", "resource",
    and "representation" is potentially a confusing term because
    it does not clearly specify what "persists."  The terms
    "stability," "predictability," and "consistent" are used
    in the W3C Web Architecture document to address the notion
    of persistence. "Policy and commitment on the part of the
    URI owner" are also foundational concepts.  Excerpt:
    
    "confidence in interactions via the Web depends on stability
    and predictability. For an information resource, persistence
    depends on the consistency of representations...Although
    persistence in this case [Representation Management] is
    observable as a result of representation retrieval, the term
    URI persistence is used to describe the desirable property
    that, once associated with a resource, a URI should continue
    indefinitely to refer to that resource...

    Good practice: Consistent representation. A URI owner SHOULD
    provide representations of the identified resource
    consistently and predictably.
    
    URI persistence is a matter of policy and commitment on the
    part of the URI owner...
    
    HTTP [RFC 2616] has been designed to help manage URI persistence.
    For example, HTTP redirection (using the 3xx response codes)
    permits servers to tell an agent that further action needs to
    be taken by the agent in order to fulfill the request
    (for example, a new URI is associated with the resource)...
    In addition, content negotiation also promotes consistency, as
    a site manager is not required to define new URIs when adding
    support for a new format specification

    For more discussion about URI persistence, see "Cool URIs
    don't change" 1998, by Tim BL [at]
    http://www.w3.org/Provider/Style/URI.html 

    A "consistent representation" that provides stability and
    predictability for a URI-dereferenceable resource means
    not only indefinite association of a URI with a resource,
    but indefinite support for retrieval via dereferencing.

    In this context we are not concerned with URIs that are
    not initially intended by URI owners as identifiers for
    dereferenceable resources, per:
    http://www.w3.org/TR/webarch/#representation-management
    Representation Management: "Just because representations
    are available does not mean that it is always desirable
    to retrieve them..."

[4] the authors of the W3C Web Architecture document
    acknowledge that the title "Cool URIs don't change"
    is infelicitous: "Note that the title is somewhat misleading.
    It is not the URIs that change, it is what they identify."
    http://www.w3.org/TR/webarch/#Cool

[5] "Cool URIs don't change" 1998, by Tim BL
    http://www.w3.org/Provider/Style/URI.html

[6] cost of disk/mass storage
    http://en.wikipedia.org/wiki/Kryder%27s_Law
    http://en.wikipedia.org/wiki/Moore's_law
    http://www.storagereview.com/

    ST3300831AS 300GB Barracuda  $150 [2005-09-03, online]
    8-gigabyte 1-inch microdrives
    60-gigabyte 1.8-inch Slim Bling hard drive

    http://tinyurl.com/extmh
    Kryder's Law, by Chip Walter
    The doubling of processor speed every 18 months is a snail's pace
    compared with rising hard-disk capacity, and Mark Kryder plans
    to squeeze in even more bits... By 1998, when Kryder joined Seagate
    to form its advanced research center, the DSSC had set an even
    loftier target: crowd 100 gigabits into a square inch by the
    early 21st century. In 2005, just seven years later, Seagate
    began shipping 110-gigabit drives. Inside of a decade and a half,
    hard disks had increased their capacity 1,000-fold, a rate that
    Intel founder Gordon Moore himself has called "flabbergasting."

[7] http://www.ietf.org/rfc/rfc2616.txt
    http://www.w3.org/1999/07/HTTP-PressRelease  July 07, 1999
    "World Wide Web Consortium Supports HTTP/1.1 Reaching IETF Draft Standard"

[8] http://httpd.apache.org/
    Apache HTTP Server Project "The Number One HTTP Server on the Internet"

    "The Apache HTTP Server Project is a collaborative software
    development effort aimed at creating a robust,
    commercial-grade, featureful, and freely-available source
    code implementation of an HTTP (Web) server... Apache has
    been the most popular web server on the Internet since
    April of 1996. The February 2005 Netcraft Web Server Survey
    found that more than 68% of the web sites on the Internet
    are using Apache, thus making it more widely used than all
    other web servers combined."

[9] http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
    http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
    URL Rewriting Guide
    
    A versatile and powerful set of tools for: forcing the use of
    canonical hostnames; creating a homogeneous and consistent
    URL layout over all WWW servers on an intranet webcluster;
    redirecting *just* all homedirs on one webserver to another
    webserver; redirecting homedir URLs to another webserver when
    the requesting user does not stay in the local domain;
    filesystem reorganization; redirecting failing requests on
    webserver A to webserver B; extended redirection supporting
    character escaping mechanisms; time-dependent rewriting;
    load balancing; on-the-fly content-regeneration; seamless
    transformation from static to dynamic; mass virtual hosting;
    use of an external rewriting engine; etc
    
    http://httpd.apache.org/docs/2.0/content-negotiation.html
    Content Negotiation
    
    http://httpd.apache.org/docs/2.0/urlmapping.html
    Mapping URLs to Filesystem Locations
    
    http://httpd.apache.org/docs/2.0/vhosts/
    Apache Virtual Host documentation

[10] statements from Owen Ambur and Todd Vincent
     https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=444
     https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=446