xdi message

Subject: RE: [xdi] Rationale for pursuing Dataweb architecture
From: "Sakimura, Nat" <n-sakimura@nri.co.jp>
To: "Drummond Reed" <drummond.reed@cordance.net>, <xdi@lists.oasis-open.org>
Date: Sun, 14 Nov 2004 02:29:16 +0900
There are a couple of points/questions that I would like to raise here: 

(1)  Business Card Examples: 
Looking at the Business Card Examples, I see that they have come 
very close to what we talked over at the f2f. The only deviations 
from what we discussed there are: 

(a) Tag names: What was called <data> or <body> at the f2f has 
     been changed to <complex> etc. 
(b) Introduction of the <Instance> tag: What is it used for? 
      I would appreciate an explanation of it. 
(c) The metadata parts are not enclosed in a <meta> or <header> tag. 

(2) Dictionary
Google does not use a dictionary. It does string indexing. 
Neither does OpenText. If it relied on a dictionary, it would not 
be able to search for new concepts, new words, or new languages. 
Not relying on a dictionary was the strength of Google. 

I do not want XDI to rely on a dictionary in the sense 
it is referred to here. It would be a huge hindrance. We could 
use a dictionary, but should not require one. 
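
As a rough illustration of string indexing without a dictionary, 
here is a minimal Python sketch. It is only an assumption-laden toy, 
not a description of how Google or OpenText actually work; the names 
build_index and search and the sample documents are hypothetical. 
The point it shows is that the index vocabulary comes from the 
documents themselves, so a newly coined word is searchable as soon 
as it appears in a document. 

```python
from collections import defaultdict


def build_index(docs):
    """Map every token seen in the documents to the ids of the
    documents containing it. No predefined vocabulary is consulted."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of the documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample documents; "dataweb" stands in for a new word.
docs = {
    1: "XDI business card example",
    2: "a newly coined word such as dataweb",
}
index = build_index(docs)
print(search(index, "dataweb"))
```

A term that never appeared in any document simply returns no 
results; nothing has to be registered anywhere beforehand. 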

(3) Proliferation of HTML. 
One of the biggest factors in HTML's proliferation was its 
astounding simplicity at the outset. One could trivially 
write a page with merely a notepad. Subsequently, HTML 
gained substantial complexity, but I believe that the 
simplicity at the outset was the key. The same is true for 
HTTP. One could write an HTTPD trivially. 

Nat


> -----Original Message-----
> From: Drummond Reed [mailto:drummond.reed@cordance.net] 
> Sent: Thursday, November 11, 2004 9:16 AM
> To: xdi@lists.oasis-open.org
> Subject: [xdi] Rationale for pursuing Dataweb architecture
> 
> XDI TC Members and Observers,
> 
> As published today in the draft minutes of the F2F meeting 
> two weeks ago in Denver (see 
> http://www.oasis-open.org/apps/org/workgroup/xdi/download.php/10001/MINUTES%20OF%2010-28-29-04%20XDI%20TC%20FACE%20TO%20FACE%20MEETING%20%28Official%29.txt),
> the core topic discussed at the meeting was the two 
> potential architectural models the XDI TC could follow.
> 
> These can be loosely summarized as the "data envelope" or 
> "SOAP-for-data" model and the "Dataweb" or "HTML-for-data" 
> model.
> 
> While most of you know I am a strong Dataweb architecture 
> advocate, some of the concepts from the data envelope model 
> are very attractive, and they have very much influenced my 
> thinking about the Dataweb model. This is reflected in a new 
> schema proposal and several example documents using this 
> schema that I posted last night:
> 
> * New schema proposal:
> http://www.oasis-open.org/committees/download.php/9988/draft-xdi-dataweb-schema-v1.xsd
> 
> * Simple XDI business card (w/all data referenced):
> http://www.oasis-open.org/committees/download.php/9989/draft-example-dataweb-bizcard-short-v1.xml
> 
> * Long-form XDI business card (w/all references resolved):
> http://www.oasis-open.org/committees/download.php/9990/draft-example-dataweb-bizcard-long-v1.xml
> 
> * Example of XDI Descriptor in this XDI format:
> http://www.oasis-open.org/committees/download.php/9991/draft-example-dataweb-XRID-v1.xml
> 
> However, in going through this work, and after another good 
> conversation with Dave last Friday, I have become more deeply 
> convinced about the Dataweb model. This email summarizes my 
> rationale in preparation for further discussion on today's TC 
> call. It breaks into three parts:
> 
> * Value proposition for the Dataweb
> * The role of XDI dictionaries
> * The need for an XDI Logical Data Object Model (LDOM)
> 
> VALUE PROPOSITION FOR THE DATAWEB
> 
> The root of my rationale is the core value proposition that 
> "XDI can do for global data sharing what the Web did for 
> global content sharing." Here's a more detailed way of 
> framing that value proposition that Dave and I discussed last 
> Friday. It starts with the value proposition for the Web:
> 
> ***Value Proposition for the Web***
> 
> With the Web, we wanted to create a single presentation 
> engine (browser) for all content without knowing anything 
> directly about the content. Besides the visualization markup, 
> the presentation engine doesn't need to know anything about 
> the content.
> 
> Although this presented potentially a huge barrier to 
> adoption - the need for every content publisher to mark up 
> their content in this new markup format - there was a value 
> proposition that successfully drove millions of content 
> publishers to do just that:
> 
> 	"If you put your content into this format, it can be: 
> a) rendered on every desktop in the world, and b) referenced 
> and linked to/from any other content in the world, and c) 
> searched and indexed by any content search engine in the world."
> 
> *****
> 
> Bingo! The result is history. The greatest transformation of 
> global information infrastructure ever.
> 
> The core concept of the Dataweb is to do the same thing for 
> machine-readable data that the Web did for human-readable 
> content. In fact, we can express this as literally a 
> word-for-word transposition of the above value
> proposition:
> 
> ***Value Proposition for the Dataweb***
> 
> With the Dataweb, we want to create a single data interchange engine
> (i-broker) for all data without knowing anything directly 
> about the data.
> Besides the data control markup, the data interchange engine 
> doesn't need to know anything about the data.
> 
> Although this presents potentially a huge barrier to adoption 
> - the need for every data publisher to mark up their data in 
> this new markup format - there is a value proposition that 
> can successfully drive millions of data publishers to do just that:
> 
> 	"If you put your data into this format, it can be: a) 
> interchanged with every system in the world, and b) 
> referenced and linked to/from any other data in the world, 
> and c) searched and indexed by any database search engine in 
> the world."
> 
> *****
> To me, this perfectly describes the goal of XDI: a common 
> data interchange format (represented by a single common XML 
> schema) together with a common data interchange service for 
> adding, modifying, deleting, and processing XDI documents.
> 
> DATAWEB DICTIONARIES
> 
> What's more, when we're operating at the level of 
> machine-readable data vs. human-readable content, I believe 
> there is another major element to the Dataweb value 
> proposition that is missing (in a direct way) from the Web 
> value proposition: Dataweb dictionaries. Again, this is 
> probably best described via analogy to the Web.
> 
> Arguably the single most valuable aspect to the Web is the 
> ability to locate desired content almost instantly, using 
> search engines such as Google.
> However this only works because of a simple fact: human 
> languages inherently consist of shared dictionaries of 
> concepts ("keywords") with which the search engines can 
> create their indexes. It is only due to our common knowledge 
> of these dictionaries (the copies we all carry around in our own
> heads) that search engines can do their magic. Otherwise they 
> wouldn't know how to index and we wouldn't know what to enter 
> as search criteria.
> 
> When it comes to the Dataweb, and we move from the sphere of 
> human-readable content to machine-readable data, this problem 
> is magnified immensely. The biggest single problem with 
> sharing machine-readable data across systems is that there 
> are no humans in the loop to do the "fuzzy matching" that 
> humans are so good at (and that search engines like Google 
> can help so much with).
> In order to actually share data across systems, machines need 
> to be able to do *exact bit-for-bit matching*. No ambiguity.
> 
> The problem gets even worse when we consider that today there 
> does not exist anything close to a universal data dictionary 
> from which such matching could be done. In other words, it's 
> not like the Web, where all the dictionaries (common 
> vocabularies of human language) already existed, and we just 
> needed to find a common way to represent them. With the 
> Dataweb, the dictionaries don't even exist yet.
> 
> In fact, the closest thing to those dictionaries are the 
> existing XML schemas or RDF vocabularies that have been 
> created in order to establish common semantics for data interchange.
> 
> So I would argue that, just as it became a fundamental design 
> goal of XML to make XML schemas expressible in XML itself 
> (thus leading to the W3C XML Schemas specification), it must 
> be a fundamental design goal of XDI to make XDI dictionaries 
> expressible in XDI itself. Because unlike XML, which had DTDs 
> to turn to, XDI implementations will have no practical way of 
> interoperating without XDI dictionaries. XDI dictionaries are 
> the only way to get the direct bit-for-bit data matching 
> necessary for true interoperability.
> 
> THE NEED FOR AN XDI DATA OBJECT MODEL
> 
> As discussed above, the Web solved the problem of content 
> interoperability by adopting a single markup format, HTML, 
> which any rendering engine
> (browser) could display. This common format, which later led 
> to the development of XML, also led to a common object model 
> for parsing and manipulating "document objects". This was the 
> Document Object Model (DOM).
> 
> It follows that if data-oriented systems are to adopt a 
> common model for data interchange, and if this model is to be 
> based on a common XML data format, this format must reflect a 
> common logical data object model, or LDOM.
> 
> To be universal, the LDOM must be very simple and capable of 
> expressing fundamental relationships between data elements 
> the same way XML expresses fundamental relationships between 
> content elements. In the work over the past six months, we 
> have been looking at XDI schema proposals that boiled this 
> down to just two types of relationships: 1) hierarchical 
> relationships, and 2) peer-to-peer, or "web" relationships.
> 
> The other key requirement of an LDOM is that every data 
> element be uniquely addressable (just as it is in a 
> database). Thus the requirement in the schema proposals so 
> far that every resource be addressable via at least one XRI.
> 
> A successful LDOM, then, would be representable in a single 
> XML schema that, while capable of carrying existing XML data 
> as a "payload", would inherently require markup of some 
> metadata into this new format, just as HTML was capable of 
> carrying existing text and graphics but required at least 
> some markup in HTML format.
> 
> That, in a nutshell, is what I believe we should be driving 
> for with the XDI schema.
> 
> ***EOM***