office-collab message

Subject: RE: [office-collab] Using RDF for Change Tracking serialization?
From: monkeyiq <monkeyiq@gmail.com>
To: dennis.hamilton@acm.org
Date: Sun, 15 May 2011 14:32:24 +1000

Thanks for your detailed reply! I tend to disagree with many of your
points though :(

On Thu, 2011-05-12 at 10:33 -0700, Dennis E. Hamilton wrote:
> SHORT ANSWER
> 
> It is unacceptable to use the existing RDF provisions of ODF to
> incorporate the data structures involved in change tracking.

Does everybody on the list agree on this point? What is the procedure
for ruling it out or not?

> 
>  - Dennis
> 
> LONG ANSWER
> 
> There are those of us who believe that RDF is meant to be descriptive
> and to provide metadata that can be used with regard to the
> *semantics* of documents.  In that case, RDF is unidirectional.  It
> expresses things about what the document (not so much the XML or the
> format) expresses but it does not impact what that expression is in
> any way.
> 
> Since the way RDF is expressed requires pointing at aspects of a
> document, this does mean, for mechanical reasons, that it does so by
> pointing at fragments in the document format.   Presumably these
> fragments serve as sufficient connection to the document content (when
> rendered for human consumption) that is what the RDF is interpreted as
> carrying assertions about.
> 
> This does leave a rather interesting conundrum.
> 
> (1) It should be possible to consume and present a document without
> consideration of the RDF whatsoever, whether in the package-carried
> RDF/XML files and some files called manifest.rdf or embedded in the
> content.xml file as RDFa.  (There is an ODF element that breaks this
> case, but that is a minor blemish so far.)
> 
> (2) Producing an ODF document is complicated by the fact that there is
> no way of determining, in a more-or-less pure producer, whether
> changes to the document's content as represented in ODF format
> cause the RDFa or the separately packaged RDF/XML to now be
> inconsistent.  (Think maintenance of software that does not change the
> relevant comments.)
> 
> While it is recommended that RDF be retained, there is no reliable way
> of knowing whether or not there is now an inconsistency.  All one can
> really do is attempt to preserve any xml:id already in the content.xml
> and not generate any duplicates. (This preserves reference to
> fragments of the document by RDF IRIs but won't assure much if the
> access is by XPATH or some other means of reference.)  If some action
> has the effect of deleting an element having an xml:id, that's just
> unfortunate.  Clearly, RDFa can be deleted along with the elements to
> which it is attached.  It is less clear what can be done if the
> element is modified and there is no means to determine whether the
> result remains consistent with any associated RDFa. 
> 
> So there is no good way to assure consistency, nor is there any
> provision whatsoever in the ODF specification for how consistency is
> maintainable.  The automatic retention of RDF found in the consumption
> of a document leads to some serious consequences when we consider
> digital signatures, if the RDF is not disclosed to and considered by
> the signer.  I have observed no such provision of any current
> implementation of ODF digital signatures.  Furthermore, the blind
> retention of material that is not understood is not particularly
> welcome in the context of document security and privacy.  It also
> provides one rather easy means for establishing a covert or private
> channel in an otherwise innocent-seeming document, and that is also
> frowned upon in some circles.
> 
> Notice that I haven't even mentioned change tracking yet.  However, it
> seems to me that assuring RDF consistency is considerably more
> difficult than change tracking.  To presume this problem solved in
> order to have a platform for change-tracking seems like a trip down
> the wrong tunnel.
> 
> Furthermore, RDF is a poor vehicle for implementation of
> change-tracking because it breaks the abstraction that the RDF is
> presumably dealing with.  Using RDF to inject behavior into the
> document format is a sin because now some RDF has an active role in
> consumption and presentation of the document -- it has become
> essential to the interpretation of the format.  This now makes
> ignoring the RDF a difficult problem, and it is more difficult because
> there is no good way to know which RDF to pay attention to.  (There is
> essentially no constraint on the RDF that is carried with an ODF
> document.) Secondly, we have the problem, already, of how to track
> changed elements to which RDF relates and, for that matter, does that
> mean RDF is change-tracked too?  (I say it is madness to have that
> answer be anything but "no.")  

I say it is madness to say anything other than absolutely yes. It seems
like a weakness to track only the content and style of a document but
not its semantics.

I had in mind to add provisions for RDF change tracking as an unofficial
extension in one or two implementations at some point. But since I'm in
this group now it seems like a wise move to try to table it here.
Perhaps I'm wrong or just an optimist to consider it. 

Thinking a bit laterally, it seems that fields like deductive databases
already have to deal with things that feel much like revisioning of
triples. For example, a deductive database (ddb) performing forward
chaining must track which assertions and which inference rule led to
each generated triple. If it doesn't do this then retraction becomes an
almost intractable problem (save for wiping all inferred triples and
restarting forward inference from the base triples).

A few obvious ideas come to mind for implementing change tracking (CT)
of RDF: (a) using the context node, (b) using a separate RDF/XML file
per revision as an incremental file, or (c) reifying the document RDF
and associating triples with their introduction / retraction revisions.

For (b), revision 7 might have a file rdf7.rdf, and to load revision n
one would call
load( n ) =
  if( n == 0 ) return {}
  else read-triples( rdf<n>.rdf ) union-with-retraction-handling load( n-1 )

Of course this is just a rough, high-level thought. The downside to (a)
is that you then remove the ability for applications to use the RDF
context node themselves, which is far from optimal IMHO.
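
As a minimal sketch of the recursive load for (b): the Python below
keeps each revision's delta in an in-memory dict instead of reading
rdfN.rdf files, and assumes each incremental file can be reduced to a
set of added and a set of retracted triples. The names Delta and
deltas are hypothetical and not part of ODF or any implementation.

```python
from typing import Dict, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Delta(NamedTuple):
    """Hypothetical stand-in for the contents of one rdfN.rdf file."""
    added: Set[Triple]
    retracted: Set[Triple]


def load(n: int, deltas: Dict[int, Delta]) -> Set[Triple]:
    """Return the graph at revision n by replaying revisions 1..n."""
    if n == 0:
        return set()  # revision 0 is the empty graph
    graph = load(n - 1, deltas)
    delta = deltas[n]
    # "union with retraction handling": drop retracted triples,
    # then add the triples introduced in revision n.
    return (graph - delta.retracted) | delta.added
```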

The downside to (c) is that it is a bit bloaty. Note that for (c) I
wouldn't expect RDF producers / consumers to reify everything. I have in
mind that the RDF handling code in the application itself would do this
behind the scenes. The graph offered by the application doesn't have to
be exactly the graph on disk.
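
One way to picture (c), as a sketch only: the stored form tags each
triple with the revision that introduced it and, optionally, the
revision that retracted it, while the application hands out a plain
set of triples for whichever revision is asked for. TrackedTriple and
graph_at are hypothetical names, and a real store would serialize the
tags as reified RDF statements rather than Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class TrackedTriple:
    """On-disk view: a triple plus the revisions that bound its lifetime."""
    triple: Triple
    introduced: int                  # revision in which the triple appeared
    retracted: Optional[int] = None  # revision that removed it, if any


def graph_at(store: List[TrackedTriple], revision: int) -> Set[Triple]:
    """Application view: the plain triples that are live at `revision`."""
    return {
        t.triple
        for t in store
        if t.introduced <= revision
        and (t.retracted is None or revision < t.retracted)
    }
```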

Obviously it would be nice to be able to quickly deserialize the RDF
graph (with retractions applied) in bulk for the last X versions. For
that to work, (b) would need non-incremental as well as incremental
rdfN files.
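
A rough sketch of mixing full and incremental files for (b): start
from the newest full snapshot at or before the wanted revision and
replay only the increments after it. The snapshots and deltas dicts
are hypothetical in-memory stand-ins for full and incremental rdfN
files; each delta is assumed to be an (added, retracted) pair of
triple sets.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Delta = Tuple[Set[Triple], Set[Triple]]  # (added, retracted)


def load_with_snapshots(
    n: int,
    snapshots: Dict[int, Set[Triple]],
    deltas: Dict[int, Delta],
) -> Set[Triple]:
    """Return the graph at revision n using the nearest earlier snapshot."""
    # Newest full snapshot not later than n; revision 0 is the empty graph.
    base = max((r for r in snapshots if r <= n), default=0)
    graph = set(snapshots.get(base, set()))
    # Replay only the increments between the snapshot and revision n.
    for rev in range(base + 1, n + 1):
        added, retracted = deltas[rev]
        graph = (graph - retracted) | added
    return graph
```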

Does anyone else think CT on the RDF is an interesting and valuable
idea?

> The problem of maintaining consistency is also acute.  If an element
> having an xml:id is swept up into a tracked deletion, we have to
> presume that the xml:id value is potentially referred to by RDF that
> we might have no easy means to identify and we certainly might have no
> idea how to adjust it.   We might track the impact on RDFa, but
> basically there is no good mechanical procedure for comprehending what
> modifications of any RDF/XML part are related to what the
> deleted/modified text expresses.  
> 
> Finally, because there is RDFa in content.xml and RDF/XML in the
> package, that does not mean we have before us all of the RDF that
> refers to the document we have the file for.  (Technically, there
> should be a way to transform any of the RDF in an OpenDocument file or
> package into a free-standing RDF collection that exists apart from the
> document itself.  That is, after all, part of how the Semantic Web is
> intended to work.  This extracted RDF and other RDF of any origin
> whatsoever can be dispersed around the Worldwide Web and compiled into
> RDF collections in arbitrary places.)
> 
> If one were to make a custom use of RDF, in its own parts and under
> its own root element, to implement change tracking, this seems like a
> waste of time.  Making a custom XML component for that purpose is more
> direct and does not require extraordinary tooling.  Also, once RDF is
> used, there is always the problem of the admission of RDF which is not
> understood by the consumer.  So we end up reverting to the previously
> unsolved problem.  Not to mention the prolonged learning curve that
> one will have had to follow in order to understand that the trip is
> not worth it.
> 
> -----Original Message-----
> From: Robin LaFontaine [mailto:robin.lafontaine@deltaxml.com] 
> Sent: Thursday, May 12, 2011 02:40
> To: office-collab@lists.oasis-open.org
> Subject: Re: [office-collab] Using RDF for Change Tracking
> serialization?
> 
> I do not think this has been considered. Interesting idea.
> 
> Please clarify:
> 
> - you have shown how it applies to ac:change attributes, but
> presumably it could also be applied to the other GCT attributes as
> well? Then there would just be an ID and RDF referencing this ID and
> containing all the CT information
> 
> - presumably also RDF could be used to represent CT Sets and Stacks?
> 
> - you gain ability to query with SPARQL but the original XML could be
> queried with XQuery and XPath. I do not know the relative merits of
> these in this situation - any comments?
> 
> - if we want to define constraints, e.g. what constitutes a valid
> delete column change, would this be easier with CT in RDF or as XML?
> 
> - presumably some XML infrastructure in content.xml is still needed,
> for example markers for deleted items and the deleted item itself
> somewhere else in the document
> 
> Regarding your first aside about xml:id attributes - this is a big
> problem and the only practical solution I have seen is the simple one
> that requires applications to keep the IDs where possible (cut and
> paste does as you say require new IDs to be generated). Applications
> don't want to do that but the problem of matching up changed IDs is
> very complex and computationally expensive, so IMHO it is best to
> require that they are preserved. After all the rest of the XML needs
> to be retained, so why not the ID values? Perhaps the RDF itself could
> be used to preserve them??
> 
> Robin
> 
> [ ... ] 
> 
> 
Follow-Ups:
- RE: [office-collab] Using RDF for Change Tracking serialization?
  - From: robert_weir@us.ibm.com
References:
- Using RDF for Change Tracking serialization?
  - From: monkeyiq <monkeyiq@gmail.com>
- Re: [office-collab] Using RDF for Change Tracking serialization?
  - From: Robin LaFontaine <robin.lafontaine@deltaxml.com>
- RE: [office-collab] Using RDF for Change Tracking serialization?
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>