office-collab message



Subject: RE: [office-collab] Using RDF for Change Tracking serialization?


SHORT ANSWER

It is unacceptable to use the existing RDF provisions of ODF to incorporate the data structures involved in change tracking.

 - Dennis

LONG ANSWER

There are those of us who believe that RDF is meant to be descriptive and to provide metadata that can be used with regard to the *semantics* of documents.  In that case, RDF is unidirectional.  It expresses things about what the document (not so much the XML or the format) expresses but it does not impact what that expression is in any way.

Since the way RDF is expressed requires pointing at aspects of a document, this does mean, for mechanical reasons, that it does so by pointing at fragments in the document format.   Presumably these fragments serve as sufficient connection to the document content (when rendered for human consumption) that is what the RDF is interpreted as carrying assertions about.

This does leave a rather interesting conundrum.

(1) It should be possible to consume and present a document without consideration of the RDF whatsoever, whether it is carried in the package as RDF/XML files, including manifest.rdf, or embedded in the content.xml file as RDFa.  (There is an ODF element that breaks this case, but that is a minor blemish so far.)

(2) Producing an ODF document is complicated by the fact that there is no way of determining, in a more-or-less pure producer, whether changes to the document's content as represented in ODF format cause the RDFa or the separately packaged RDF/XML to become inconsistent.  (Think maintenance of software that does not change the relevant comments.)

While it is recommended that RDF be retained, there is no reliable way of knowing whether or not there is now an inconsistency.  All one can really do is attempt to preserve any xml:id already in the content.xml and not generate any duplicates. (This preserves reference to fragments of the document by RDF IRIs but won't assure much if the access is by XPath or some other means of reference.)  If some action has the effect of deleting an element having an xml:id, that's just unfortunate.  Clearly, RDFa can be deleted along with the elements to which it is attached.  It is less clear what can be done if the element is modified and there is no means to determine whether the result remains consistent with any associated RDFa.
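To make the "preserve xml:id and generate no duplicates" rule concrete, here is a minimal Python sketch of the only check a producer can really perform mechanically.  The function names and the tiny sample documents are my own illustrations, not anything defined by ODF.  Note that it can report only that an xml:id was lost or duplicated; it says nothing about whether surviving elements are still consistent with the RDF that points at them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the predeclared xml: prefix under this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_xml_ids(xml_text):
    """Count every xml:id value that occurs in the given XML text."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )


def audit_xml_ids(original_xml, edited_xml):
    """Return (lost, duplicated) xml:id values after an edit.

    lost:       ids present before the edit and absent afterwards
    duplicated: ids that occur more than once in the edited document
    """
    before = collect_xml_ids(original_xml)
    after = collect_xml_ids(edited_xml)
    lost = sorted(set(before) - set(after))
    duplicated = sorted(i for i, n in after.items() if n > 1)
    return lost, duplicated


if __name__ == "__main__":
    ns = 'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    original = (
        f'<doc {ns}><text:p xml:id="p1">A</text:p>'
        f'<text:p xml:id="p2">B</text:p></doc>'
    )
    # Hypothetical edit: p2 was deleted and p1 was copied and pasted.
    edited = (
        f'<doc {ns}><text:p xml:id="p1">A</text:p>'
        f'<text:p xml:id="p1">C</text:p></doc>'
    )
    print(audit_xml_ids(original, edited))
```

A modified element that keeps its xml:id passes this audit untouched, which is exactly the case for which there is no mechanical answer.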

So there is no good way to assure consistency, nor is there any provision whatsoever in the ODF specification for how consistency is maintainable.  The automatic retention of RDF found in the consumption of a document leads to some serious consequences when we consider digital signatures, if the RDF is not disclosed to and considered by the signer.  I have observed no such provision of any current implementation of ODF digital signatures.  Furthermore, the blind retention of material that is not understood is not particularly welcome in the context of document security and privacy.  It also provides one rather easy means for establishing a covert or private channel in an otherwise innocent-seeming document, and that is also frowned upon in some circles.

Notice that I haven't even mentioned change tracking yet.  However, it seems to me that assuring RDF consistency is considerably more difficult than change tracking.  To presume this problem solved in order to have a platform for change-tracking seems like a trip down the wrong tunnel.

Furthermore, RDF is a poor vehicle for implementation of change-tracking because it breaks the abstraction that the RDF is presumably dealing with.  Using RDF to inject behavior into the document format is a sin because now some RDF has an active role in consumption and presentation of the document -- it has become essential to the interpretation of the format.  This now makes ignoring the RDF a difficult problem, and it is more difficult because there is no good way to know which RDF to pay attention to.  (There is essentially no constraint on the RDF that is carried with an ODF document.)

Secondly, we have the problem, already, of how to track changed elements to which RDF relates and, for that matter, does that mean RDF is change-tracked too?  (I say it is madness to have that answer be anything but "no.")

The problem of maintaining consistency is also acute.  If an element having an xml:id is swept up into a tracked deletion, we have to presume that the xml:id value is potentially referred to by RDF that we might have no easy means to identify and we certainly might have no idea how to adjust it.  We might track the impact on RDFa, but basically there is no good mechanical procedure for comprehending what modifications of any RDF/XML part are related to what the deleted/modified text expresses.
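As a rough illustration of the deletion case, the following Python sketch looks for RDF/XML references whose target xml:id no longer exists in content.xml.  It rests on assumptions of mine: that the RDF/XML names its targets with relative IRIs of the form content.xml#some-id in rdf:about or rdf:resource attributes, and that the function name and the sample data are purely hypothetical.  It finds only dangling references inside one RDF/XML part that we happen to have in hand.  It cannot tell what the remaining statements should now say, and it cannot see RDF held anywhere outside the package.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = "{%s}about" % RDF_NS
RDF_RESOURCE = "{%s}resource" % RDF_NS


def dangling_references(content_xml, rdf_xml, content_path="content.xml"):
    """List IRIs in rdf_xml that point at xml:id values missing from content_xml.

    Assumption: targets are written as '<content_path>#<xml:id>'.
    """
    ids = {
        el.get(XML_ID)
        for el in ET.fromstring(content_xml).iter()
        if el.get(XML_ID) is not None
    }
    dangling = set()
    for el in ET.fromstring(rdf_xml).iter():
        for attr in (RDF_ABOUT, RDF_RESOURCE):
            iri = el.get(attr)
            if iri is None:
                continue
            path, fragment = urldefrag(iri)
            # Only IRIs that address fragments of this content file are checked.
            if path == content_path and fragment and fragment not in ids:
                dangling.add(iri)
    return sorted(dangling)


if __name__ == "__main__":
    # Hypothetical state after a deletion removed the element with xml:id="p2".
    content = '<doc><p xml:id="p1">kept</p></doc>'
    rdf = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:ex="http://example.org/ns#">'
        '<rdf:Description rdf:about="content.xml#p1"><ex:topic>a</ex:topic>'
        '</rdf:Description>'
        '<rdf:Description rdf:about="content.xml#p2"><ex:topic>b</ex:topic>'
        '</rdf:Description>'
        '</rdf:RDF>'
    )
    print(dangling_references(content, rdf))
```

Even when a dangling reference is found this way, the sketch offers no guidance on whether the statement should be dropped, retargeted, or kept for a deletion that may later be rejected.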

Finally, the fact that there is RDFa in content.xml and RDF/XML in the package does not mean we have before us all of the RDF that refers to the document we have the file for.  (Technically, there should be a way to transform any of the RDF in an OpenDocument file or package into a free-standing RDF collection that exists apart from the document itself.  That is, after all, part of how the Semantic Web is intended to work.  This extracted RDF and other RDF of any origin whatsoever can be dispersed around the World Wide Web and compiled into RDF collections in arbitrary places.)

If one were to make a custom use of RDF, in its own parts and under its own root element, to implement change tracking, this seems like a waste of time.  Making a custom XML component for that purpose is more direct and does not require extraordinary tooling.  Also, once RDF is used, there is always the problem of the admission of RDF which is not understood by the consumer.  So we end up reverting to the previously unsolved problem.  Not to mention the prolonged learning curve that one will have had to follow in order to understand that the trip is not worth it.

-----Original Message-----
From: Robin LaFontaine [mailto:robin.lafontaine@deltaxml.com] 
Sent: Thursday, May 12, 2011 02:40
To: office-collab@lists.oasis-open.org
Subject: Re: [office-collab] Using RDF for Change Tracking serialization?

I do not think this has been considered. Interesting idea.

Please clarify:

- you have shown how it applies to ac:change attributes, but presumably it could also be applied to the other GCT attributes as well? Then there would just be an ID and RDF referencing this ID and containing all the CT information

- presumably also RDF could be used to represent CT Sets and Stacks?

- you gain ability to query with SPARQL but the original XML could be queried with XQuery and XPath. I do not know the relative merits of these in this situation - any comments?

- if we want to define constraints, e.g. what constitutes a valid delete column change, would this be easier with CT in RDF or as XML?

- presumably some XML infrastructure in content.xml is still needed, for example markers for deleted items and the deleted item itself somewhere else in the document

Regarding your first aside about xml:id attributes - this is a big problem and the only practical solution I have seen is the simple one that requires applications to keep the IDs where possible (cut and paste does, as you say, require new IDs to be generated). Applications don't want to do that but the problem of matching up changed IDs is very complex and computationally expensive, so IMHO it is best to require that they are preserved. After all, the rest of the XML needs to be retained, so why not the ID values? Perhaps the RDF itself could be used to preserve them??

Robin

[ ... ] 


