Subject: Re: [regrep] [Namespace Manager Proposal] FW: XMLNDR [Broken URL]RE: XMLNDR PURLs, Handles, URLs, URNs & Namespaces
On Fri, 2 Sep 2005, Chiusano Joseph wrote:

> Back in January 2002 I submitted a proposal to our list for a "Namespace
> Manager" function. The archive link is forever gone, and Robin Cover
> asked me to send him a summary of the highlights of that proposal. I am
> sending this to refresh our collective memories of this proposal, as we
> look toward version 4.0 (hint hint;)
>
> Ironically (as you can see from the right side of the e-mail subject) I
> sent the archive URL in support of a discussion on persistent
> identifiers on the "U.S. Federal XML Naming and Design Rules" listserv,
> and subsequently discovered that it was a broken link.:p
>
> To those in the U.S.: Happy Labor Day weekend.
>
> Joe
>
> Joseph Chiusano
> Booz Allen Hamilton
> O: 703-902-6923
> C: 202-251-0731
> Visit us online@ http://www.boozallen.com

[... see http://lists.oasis-open.org/archives/regrep/200509/msg00010.html ]

Joe and TC,

1) It does appear that some 2000-2500 messages (est.) are now missing
   from the TC's email list archive, for years 2002 and earlier.

2) I have communicated my concern to the OASIS administration, hopeful
   that we can have the resources restored to their canonical URIs
   within a short time. Subsequent to posting my note referenced in
   "3)", I observed that at least some of the messages are stored on
   the WayBack machine, including Joe's lost posting from the regrep
   archive.

3) I have conveyed a message of [unofficial] apology to the FED-XML-NDR
   community, within which these AWOL resources were discovered:

   https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=449

4) In principle, I think OASIS is committed to the practice of "not
   breaking links" to resources for which the organization has
   published dereferenceable URIs. What may be needed is a policy which
   implements, in tangible business terms, a strategy to ensure that
   links are not broken, to monitor the websites for resources gone
   AWOL, and to repair inadvertently broken links.

5) The larger discussion about "broken links" on the FED-XML-NDR list
   prompted me to write up a renewed appeal for technologists to insist
   upon institutional commitments to the preservation of public
   resources and the preservation of links to resources identified with
   dereferenceable URIs. I append a portion of this note below, asking
   for the participation of members on the ebXML Regrep TC, in any
   forum where you participate, in raising consciousness about the role
   (I believe) we play in correcting this widespread problem of
   resource deterioration.

The deplorable situation will change, it seems to me, only when all of
us demand commitment, policy, and vigilance on the part of jurisdictions
and institutions responsible for persistent resources and links. If the
perspective adopted in the note below is defensible (you be the judge! -
but it's optional reading), there is fundamentally no excuse, in mid
2005, for broken links. Radically revising our common expectations and
calling public institutions to account for data loss are probably
prerequisite to seeing substantial changes. Shrugging our shoulders and
saying "That's the Web" is no solution.

Thanks,

Robin Cover

=================
Note on resource "persistence" follows:

[...]

As a matter of personal judgment, I believe there is now consensus
technical opinion that accessible Web technologies, informed by sound
principles of Web Architecture [2], are now *more than adequate* to
support both resource and link persistence: institutions (as URI owners)
that exhibit, through public behavior, doubt about the adequacy of this
available technology, are apparently:

  a) uninformed [e.g., unaware that HTTP and free server software
     provide the means to manage resource/link persistence]
  b) unwilling to commit (axiologically) to core principles and values
     about resource/link persistence
  c) unwilling to commit resources to ensure resource/link persistence
  d) incompetent in one or more ways possibly related to "a)" - "c)"
  e) did I miss anything in a-e ? Maybe also: "inattentive" as a
     sub-species of a-c

Although the terminology used to describe URI persistence is a bit
problematic sometimes [3], I believe there is now general agreement with
principles articulated in the W3C Web Architecture document, viz.,

  [We share a] "social expectation that once a URI identifies a
  particular resource, it should continue indefinitely to refer to
  that resource."

combined with resource persistence (a dereferenced URI gets you the
resource consistently and predictably):

  "A URI owner SHOULD provide representations of the identified
  resource consistently and predictably."

These values and principles were expressed in a poorly-titled [4]
document "Cool URIs don't change", written in 1998 [5]. The core
assertion is captured in the first forty-eight words of that 1998 TBL
memo:

  "What makes a cool URI? A cool URI is one which does not change.
  What sorts of URI change? URIs don't change: people change them.
  There are no reasons at all in theory for people to change URIs
  (or stop maintaining documents), but millions of reasons in
  practice."

The plausible excuses for breaking links commonly offered by people in
1998 are now, in 2005, simply lame excuses:

  * "We just reorganized our website to make it better."
  * "We have so much material that we can't keep track of what is out
    of date and what is confidential and what is valid and so we
    thought we'd better just turn the whole lot off."
  * "Well, we found we had to move the files..."
  * "We used to use a cgi script for this and now we use a binary
    program."
  * "I didn't think URLs have to be persistent - that was URNs."
  * "We would like to, but we just don't have the right tools."

Since 1998, web resources, tools, and technologies have improved
dramatically, making it easier than ever to support the preservation and
integrity of Web-accessible resources.

  * the cost of disk storage has plummeted faster than for
    semiconductors, according to Kryder's Law ("Moore's Law for
    Storage"): the "doubling of processor speed every 18 months is a
    snail's pace compared with rising hard-disk capacity." One can get
    a 300 gigabyte Barracuda for just $150 [6].

  * HTTP (1.0, 1.1) [7] is now widely supported by open source (free,
    publicly available) server software which provides web site
    administrators (as URI owners) with rich tools for managing
    predictable, stable, and "indefinite/persistent" support for
    published URIs, even in the face of moving web sites or otherwise
    re-architecting enterprise web server topologies. See for example
    the free Apache Server [8], which has binaries for UNIXes and
    Win32, with support for rule-based URL rewriting, mapping URLs to
    filesystem locations, content negotiation, use of virtual hosts,
    etc. [9]

  * generic web analysis and monitoring tools now make it easy to
    detect the existence of broken links; their statistical modules
    help interpret the results for web site administrators, providing
    early notification when something has gone wrong

In the context of this FED-XML-NDR discussion, within earshot of some
very important (government) policy-makers, I am thrilled to read
statements of general concern about "Broken URLs." It's a plague and a
scourge upon the Web, as is commonly lamented.

Part of the solution, I believe, is to reshape public expectations based
upon the conviction that there are NO (or few) valid excuses for
breaking links. Institutions that elect to use BAD software ('broken as
designed') which cannot even be configured to maintain persistent
links/resources should be called to public account. Institutions
(jurisdictions) that demonstrate negligible commitment to preservation
of resources and links should likewise be called to public account. IMO.

In addition to individual and institutional commitment to resource
persistence, it will help to design architectures within which the
preservation of resources and links is consciously planned for, and
optimized, rather than ignored. I'm happy for Owen Ambur's statements of
support, and for Todd Vincent's words, "The system I have been
describing accomplishes this requirement." [10]

As an aside, I have noted the increasing popularity of RDDL (Resource
Directory Description Language) as a method of providing something
informative when designers want to create http (scheme) URI namespace
names that do not directly resolve under HTTP/DNS to the XML Schema
(resource). The RDDL "namespace document" is a simple [enhanced] XHTML
document that lives at the end of a dereferenced http URI namespace,
conveying information about the nature and purpose(s) of the resource
associated with the namespace URI, including typically, (URI-reference)
hyperlinks to schemas, specifications, editor contacts, etc.
See http://xml.coverpages.org/rddl.html

Thanks,

Robin Cover [speaking only personally and unofficially]
XML Cover Pages
http://xml.coverpages.org/

* please send corrections to this message via email:
  robin@oasis-open.org

=========
References:

[1] https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=448
    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=445
    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=442
    https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=443

    "I just found out that the URL I provided is now broken; I will
    try and find out from OASIS how to retrieve it from their
    architectures (it's broken in the OASIS ebXML Registry listserv
    public archives as well). Sorry for the inconvenience."

[2] URI persistence
    Architecture of the World Wide Web, Volume One
    W3C Recommendation 15 December 2004
    http://www.w3.org/TR/webarch/#URI-persistence

[3] "persistence" in connection with "link", "URI", "resource", and
    "representation" is potentially a confusing term because it does
    not clearly specify what "persists." The phrases "stability,"
    "predictability," and "consistent" are used in the W3C Web
    Architecture document to address the notion of persistence.
    "Policy and commitment on the part of the URI owner" are also
    foundational concepts.

    Excerpt: "confidence in interactions via the Web depends on
    stability and predictability. For an information resource,
    persistence depends on the consistency of
    representations...Although persistence in this case
    [Representation Management] is observable as a result of
    representation retrieval, the term URI persistence is used to
    describe the desirable property that, once associated with a
    resource, a URI should continue indefinitely to refer to that
    resource...

    Good practice: Consistent representation. A URI owner SHOULD
    provide representations of the identified resource consistently
    and predictably.

    URI persistence is a matter of policy and commitment on the part
    of the URI owner... HTTP [RFC 2616] has been designed to help
    manage URI persistence. For example, HTTP redirection (using the
    3xx response codes) permits servers to tell an agent that further
    action needs to be taken by the agent in order to fulfill the
    request (for example, a new URI is associated with the
    resource)... In addition, content negotiation also promotes
    consistency, as a site manager is not required to define new URIs
    when adding support for a new format specification."

    For more discussion about URI persistence, see "Cool URIs don't
    change" 1998, by Tim BL [at]
    http://www.w3.org/Provider/Style/URI.html

    A "consistent representation" that provides stability and
    predictability for a URI-dereferenceable resource means not only
    indefinite association of a URI with a resource, but indefinite
    support for retrieval via dereferencing. In this context we are
    not concerned with URIs that are not initially intended by URI
    owners as identifiers for dereferenceable resources, per:

    http://www.w3.org/TR/webarch/#representation-management
    Representation Management: "Just because representations are
    available does not mean that it is always desirable to retrieve
    them..."

[4] the authors of the W3C Web Architecture document acknowledge that
    the title "Cool URIs don't change" is infelicitous: "Note that
    the title is somewhat misleading. It is not the URIs that change,
    it is what they identify."
    http://www.w3.org/TR/webarch/#Cool

[5] "Cool URIs don't change" 1998, by Tim BL
    http://www.w3.org/Provider/Style/URI.html

[6] cost of disk/mass storage
    http://en.wikipedia.org/wiki/Kryder%27s_Law
    http://en.wikipedia.org/wiki/Moore's_law
    http://www.storagereview.com/
    ST3300831AS 300GB Barracuda $150 [2005-09-03, online]
    8-gigabyte 1-inch microdrives
    60-gigabyte 1.8-inch Slim Bling hard drive

    http://tinyurl.com/extmh
    Kryder's Law, by Chip Walter

    The doubling of processor speed every 18 months is a snail's pace
    compared with rising hard-disk capacity, and Mark Kryder plans to
    squeeze in even more bits... By 1998, when Kryder joined Seagate
    to form its advanced research center, the DSSC had set an even
    loftier target: crowd 100 gigabits into a square inch by the early
    21st century. In 2005, just seven years later, Seagate began
    shipping 110-gigabit drives. Inside of a decade and a half, hard
    disks had increased their capacity 1,000-fold, a rate that Intel
    founder Gordon Moore himself has called "flabbergasting."

[7] http://www.ietf.org/rfc/rfc2616.txt
    http://www.w3.org/1999/07/HTTP-PressRelease
    July 07, 1999
    "World Wide Web Consortium Supports HTTP/1.1 Reaching IETF Draft
    Standard"

[8] http://httpd.apache.org/
    Apache HTTP Server Project
    "The Number One HTTP Server on the Internet"

    "The Apache HTTP Server Project is a collaborative software
    development effort aimed at creating a robust, commercial-grade,
    featureful, and freely-available source code implementation of an
    HTTP (Web) server... Apache has been the most popular web server
    on the Internet since April of 1996. The February 2005 Netcraft
    Web Server Survey found that more than 68% of the web sites on the
    Internet are using Apache, thus making it more widely used than
    all other web servers combined."

[9] http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
    http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
    URL Rewriting Guide

    A versatile and powerful set of tools for: forcing the use of
    canonical hostnames; creating a homogeneous and consistent URL
    layout over all WWW servers on a Intranet webcluster; redirecting
    *just* all homedirs on one webserver to another webserver;
    redirect homedir URLs to another webserver when the requesting
    user does not stay in the local domain; filesystem reorganization;
    redirecting failing requests on webserver A to webserver B;
    extended redirection supporting character escaping mechanisms;
    time-dependent rewriting; load balancing; on-the-fly
    content-regeneration; seamless transformation from static to
    dynamic; mass virtual hosting; use of an external rewriting
    engine; etc

    http://httpd.apache.org/docs/2.0/content-negotiation.html
    Content Negotiation

    http://httpd.apache.org/docs/2.0/urlmapping.html
    Mapping URLs to Filesystem Locations

    http://httpd.apache.org/docs/2.0/vhosts/
    Apache Virtual Host documentation

[10] statements from Owen Ambur and Todd Vincent
     https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=444
     https://fed-xml-ndr.core.gov/servlets/ReadMsg?list=listserv&msgNo=446
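
The redirection and monitoring points in the note (see [3] and [9]) can
be sketched in a few lines of Python. This is only an illustration,
assuming a site keeps a table of every path it has ever moved. The
paths, the MOVED and LIVE tables, and the resolve and audit functions
are hypothetical names invented here; in practice the same mapping
would more likely live in server configuration, such as the Apache
rewriting rules cited in [9].

```python
from http import HTTPStatus
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical data, invented for illustration only.
# MOVED maps each retired path to the path that replaced it.
MOVED: Dict[str, str] = {
    "/old/archive/msg0001.html": "/archive/2002/msg0001.html",
    "/archive/2002/msg0001.html": "/lists/2002/01/msg0001.html",
}
# LIVE is the set of paths the server can actually serve today.
LIVE: Set[str] = {"/lists/2002/01/msg0001.html"}


def resolve(
    path: str,
    moved: Dict[str, str] = MOVED,
    live: Set[str] = LIVE,
) -> Tuple[HTTPStatus, Optional[str]]:
    """Decide how to answer a request for a published path.

    Follows the chain of recorded moves so that a client asking for
    any historical path gets a single permanent redirect (301) to the
    current location, rather than a 404.
    """
    target = path
    seen = set()
    while target in moved:
        if target in seen:
            # A cycle in the table is a configuration error.
            raise ValueError(f"redirect loop involving {target!r}")
        seen.add(target)
        target = moved[target]
    if target not in live:
        return HTTPStatus.NOT_FOUND, None
    if target == path:
        return HTTPStatus.OK, None
    return HTTPStatus.MOVED_PERMANENTLY, target


def audit(
    published: Iterable[str],
    moved: Dict[str, str] = MOVED,
    live: Set[str] = LIVE,
) -> List[str]:
    """Return the published paths that no longer resolve to anything."""
    return [
        p for p in published
        if resolve(p, moved, live)[0] == HTTPStatus.NOT_FOUND
    ]
```

With the sample tables above, resolve("/old/archive/msg0001.html")
yields a 301 pointing at "/lists/2002/01/msg0001.html", even though the
resource moved twice. Running audit over the full list of URIs an
organization has ever published is one simple way to get the early
notification of broken links that the note asks for.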