office-collab message

Subject: Re: [office-collab] Using RDF for Change Tracking serialization?

From: monkeyiq <monkeyiq@gmail.com>
To: Robin LaFontaine <robin.lafontaine@deltaxml.com>
Date: Fri, 13 May 2011 18:00:18 +1000

On Thu, 2011-05-12 at 10:39 +0100, Robin LaFontaine wrote:

I do not think this has been considered. Interesting idea.

Please clarify:

- you have shown how it applies to ac:change attributes, but presumably it could also be applied to the other GCT attributes as well? Then there would just be an ID and RDF referencing this ID and containing all the CT information

Looking over a few other examples, one might get the following. There may be mistakes in them, as I sort of winged it creating them, and I still need to think about the examples that move or delete content. The final example does a bit of that, and one may end up explicitly citing the ODF XML types per revision in RDF.

6.1.2
ORIGINAL
<text:p delta:insertion-type="insert-with-content"
delta:insertion-change-idref='ct1234'>
This paragraph is inserted.</text:p>

WITH RDF
The content.xml
<text:p xml:id="n001">This paragraph is inserted.</text:p>
The RDF
n001 delta:insertion-type          insert-with-content
n001 delta:insertion-change-idref ct1234

6.3.2
ORIGINAL
<text:p>
This text will be made
<text:span
delta:insertion-type='insert-around-content'
delta:insertion-change-idref='ct1234'
text:style-name="bold-style">bold</text:span>.
</text:p>

WITH RDF
The content.xml
<text:p>
This text will be made
<text:span xml:id="n002" text:style-name="bold-style">bold</text:span>.
</text:p>
The RDF
n002 delta:insertion-type         insert-around-content
n002 delta:insertion-change-idref ct1234

6.5.2
ORIGINAL
<text:p split:split01='sp1'>
This paragraph will be split into two.
</text:p>
<text:p delta:insertion-type='split'
        delta:insertion-change-idref='ct1'
        delta:split-id='sp1'>
This will be in the second paragraph.
</text:p>

WITH RDF
The content.xml
<text:p xml:id="n005">
This paragraph will be split into two.
</text:p>
<text:p xml:id="n006">
This will be in the second paragraph.
</text:p>
The RDF
n005 split:split01                 sp1
n006 delta:insertion-type          split
n006 delta:insertion-change-idref ct1
n006 delta:split-id                sp1

6.11.2
ORIGINAL
<text:p>
How text is <delta:inserted-text-start delta:inserted-text-id="it632507360"
delta:insertion-change-idref="ct1"/>very easily
<delta:inserted-text-end delta:inserted-text-idref="it632507360"/>added.
</text:p>

WITH RDF
The content.xml
<text:p>
How text is <delta:inserted-text-start xml:id="n007"/>very easily
<delta:inserted-text-end xml:id="n008"/>added.
</text:p>

The RDF
n007 delta:inserted-text-id       it632507360
n007 delta:insertion-change-idref ct1
n007 ends-at                      n008

6.13.2
ORIGINAL
<delta:remove-leaving-content-start delta:removal-change-idref='ct1234'
delta:end-element-idref='ee888'>
    <text:p text:style-name="Text_20_body"/>
</delta:remove-leaving-content-start>
<text:h text:style-name="Heading_20_1"
        text:outline-level="1"
        delta:insertion-type='insert-around-content'
        delta:insertion-change-idref='ct1234'>
What are the ground rules?
</text:h>
<delta:remove-leaving-content-end delta:end-element-id='ee888'/>

WITH RDF
The content.xml
<text:h xml:id="n010"
        text:style-name="Heading_20_1"
        text:outline-level="1">
What are the ground rules?
</text:h>

The RDF
n010   has-revision                       ct1234
ct1234 element-type                       text:p
ct1234 text:style-name                    Text_20_body
n010   delta:insertion-type               insert-around-content
n010   delta:insertion-change-idref       ct1234
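
The simple insertion cases above all follow one pattern: give the element an xml:id, strip its delta:* attributes, and restate each attribute as a triple on that id. A rough sketch of that step, using only the Python standard library, might look like the following. The function name, the `doc` wrapper element, and the `urn:example:delta` namespace URI are placeholders of mine and not part of any proposal; the sketch does not attempt the removal or deletion cases.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DELTA_NS = "urn:example:delta"  # placeholder URI, an assumption for this sketch
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def externalize_delta(root, start=1):
    """Move delta:* attributes off elements and return them as triples."""
    prefix = "{" + DELTA_NS + "}"
    triples = []
    counter = start
    for elem in root.iter():
        delta_attrs = [(k, v) for k, v in elem.attrib.items()
                       if k.startswith(prefix)]
        if not delta_attrs:
            continue
        # Reuse an existing xml:id, otherwise mint one in the n001 style.
        node_id = elem.get(XML_ID)
        if node_id is None:
            node_id = "n%03d" % counter
            counter += 1
            elem.set(XML_ID, node_id)
        for key, value in delta_attrs:
            del elem.attrib[key]
            triples.append((node_id, "delta:" + key[len(prefix):], value))
    return triples


SAMPLE = (
    '<doc xmlns:text="%s" xmlns:delta="%s">'
    '<text:p delta:insertion-type="insert-with-content" '
    'delta:insertion-change-idref="ct1234">'
    'This paragraph is inserted.</text:p>'
    '</doc>' % (TEXT_NS, DELTA_NS)
)

root = ET.fromstring(SAMPLE)
triples = externalize_delta(root)
for s, p, o in triples:
    print(s, p, o)
```

After the call, the text:p in `root` carries only the xml:id, and `triples` holds what used to be inline.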

- presumably also RDF could be used to represent CT Sets and Stacks?

I think this would be a really great use for RDF. The delta:tracked-changes tree could be made RDF, and then possibilities like location, foaf etc. immediately present themselves. For example, the triples below cite a person and that person's usual location for a change, and also allow the software to explicitly cite another location where the change was actually made. This sort of thing could be extremely useful for companies where it might be desired to know if a change was performed while travelling, at home, or at the office. Perhaps such information would be used when accepting or assessing changes, folks tending to be less alert while hacking text over the Pacific.

If nothing else, I think putting this data into RDF/XML would be a wonderful thing. Links are made via the ct1-style identifiers, so they should not suffer the same issues as xml:id values. Copy and paste of a delta:change-transaction shouldn't happen via the office app either, as it might for a text:p with an xml:id.

OLD
<delta:change-transaction delta:change-id="ct1">
   <delta:change-info>
     <dc:creator>Robin</dc:creator>
     <dc:date>2010-06-02T15:48:00</dc:date>
   </delta:change-info>
</delta:change-transaction>

NEW
<uri:robin> <http://xmlns.com/foaf/0.1/name> "Robin"
<uri:robin> <http://xmlns.com/foaf/0.1/homepage> <http://robin.deltaxml.com/>
<uri:robin> <http://xmlns.com/foaf/0.1/based_near> _:genid1
<uri:robin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
_:genid1 <http://www.w3.org/2003/01/geo/wgs84_pos#lat>   "51.47026"
_:genid1 <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-2.59466"
delta:change-transaction   delta:change-id   ct1
ct1                        dc:creator        uri:robin
ct1                        dc:date           2010-06-02T15:48:00
ct1                        performed-at      _:genid2
_:genid2 <http://www.w3.org/2003/01/geo/wgs84_pos#lat>   "15.47026"
_:genid2 <http://www.w3.org/2003/01/geo/wgs84_pos#long> "12.59466"
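
To make the OLD to NEW step a little more concrete, here is a minimal sketch that reads a delta:change-transaction and emits the ct1 triples. It covers only dc:creator and dc:date; the foaf and location triples would have to come from somewhere else. The `PEOPLE` lookup table, the function name, and the `urn:example:delta` namespace URI are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DELTA_NS = "urn:example:delta"  # placeholder URI, an assumption for this sketch
DC_NS = "http://purl.org/dc/elements/1.1/"

OLD = (
    '<delta:change-transaction xmlns:delta="%s" xmlns:dc="%s" '
    'delta:change-id="ct1">'
    '<delta:change-info>'
    '<dc:creator>Robin</dc:creator>'
    '<dc:date>2010-06-02T15:48:00</dc:date>'
    '</delta:change-info>'
    '</delta:change-transaction>' % (DELTA_NS, DC_NS)
)

# Hypothetical mapping from the dc:creator text to a person URI.
PEOPLE = {"Robin": "uri:robin"}


def transaction_to_triples(elem, people):
    """Turn the dc:* children of delta:change-info into triples on the change id."""
    change_id = elem.get("{%s}change-id" % DELTA_NS)
    info = elem.find("{%s}change-info" % DELTA_NS)
    triples = []
    if info is None:
        return triples
    for child in info:
        # Tags arrive in Clark notation: {namespace}localname.
        ns, _, local = child.tag[1:].partition("}")
        if ns != DC_NS:
            continue
        value = (child.text or "").strip()
        if local == "creator":
            # Fall back to the literal name when no URI is known.
            value = people.get(value, value)
        triples.append((change_id, "dc:" + local, value))
    return triples


triples = transaction_to_triples(ET.fromstring(OLD), PEOPLE)
for s, p, o in triples:
    print(s, p, o)
```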

- you gain ability to query with SPARQL but the original XML could be queried with XQuery and XPath. I do not know the relative merits of these in this situation - any comments?

I can't really say either way. One thing that comes to mind is that when an application wants to get value out of RDF, it really wants to have SPARQL capabilities. Activities such as finding foaf data linked to a text:p are much simpler as a SPARQL query. On the other hand, an office app might not link to xqilla etc. because it focuses more on load/save of ODF than on runtime queries of it.

But I've not looked at which of XQuery / SPARQL would give better value when querying this sort of data.
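
To illustrate why the "foaf data linked to a text:p" case reads naturally as a graph query, here is a toy matcher for SPARQL-style triple patterns in plain Python. It is a stand-in for a real SPARQL engine, not a comparison with XQuery, and the triples are made up for the example (in particular the dc:creator of ct1234 is my assumption, as are the `match` function and the shortened prefixed names).

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern (a naive join)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    head, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Illustrative data only.
TRIPLES = [
    ("n001", "delta:insertion-change-idref", "ct1234"),
    ("n002", "delta:insertion-change-idref", "ct1234"),
    ("n006", "delta:insertion-change-idref", "ct1"),
    ("ct1234", "dc:creator", "uri:robin"),
    ("ct1", "dc:creator", "uri:robin"),
    ("uri:robin", "foaf:name", "Robin"),
]

# Roughly: SELECT ?node ?name WHERE { ?node ... ?change . ?change ... ?person . ... }
QUERY = [
    ("?node", "delta:insertion-change-idref", "?change"),
    ("?change", "dc:creator", "?person"),
    ("?person", "foaf:name", "?name"),
]

results = [(b["?node"], b["?name"]) for b in match(QUERY, TRIPLES)]
print(results)
```

The three patterns walk from the content node to the change, to the person, to the name in one declarative query; the XML-side equivalent would need to hop between content.xml and the tracked-changes tree by id.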

- if we want to define constraints, e.g. what constitutes a valid delete column change, would this be easier with CT in RDF or as XML?

I'll leave this one for now. An RDFS/OWL vs XSD/RelaxNG comparison would be interesting...but perhaps more of a whole arvo activity.

- presumably some XML infrastructure in content.xml is still needed, for example markers for deleted items and the deleted item itself somewhere else in the document

Yes, the keen eye will easily notice that I left the deletion examples alone :/

Regarding your first aside about xml:id attributes - this is a big problem and the only practical solution I have seen is the simple one that requires applications to keep the IDs where possible (cut and paste does as you say require new IDs to be generated). Applications don't want to do that but the problem of matching up changed IDs is very complex and computationally expensive, so IMHO it is best to require that they are preserved. After all the rest of the XML needs to be retained, so why not the ID values? Perhaps the RDF itself could be used to preserve them??

Unfortunately the RDF can't really preserve the xml:id values because they are the link from the RDF graph to the content.xml. The RDF could remember what the xml:id was, but if an app then writes a text:p with a new xml:id there isn't really a way to know that the new value replaces the old.
I guess one could try to infer it from the context, but that would indeed be hideously complex.
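
A small sketch of what an application can and cannot detect here, reusing the ids from the 6.5.2 example and assuming a re-save that wrote the second paragraph with a new id (`p42` is made up, as are the function name and the `doc` wrapper element):

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Triples written before the document was re-saved.
OLD_TRIPLES = [
    ("n005", "split:split01", "sp1"),
    ("n006", "delta:insertion-type", "split"),
    ("n006", "delta:insertion-change-idref", "ct1"),
    ("n006", "delta:split-id", "sp1"),
]

# Hypothetical re-save in which the second paragraph received a fresh xml:id.
RESAVED = (
    '<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:p xml:id="n005">This paragraph will be split into two.</text:p>'
    '<text:p xml:id="p42">This will be in the second paragraph.</text:p>'
    '</doc>'
)


def audit_links(content_root, triples):
    """Return (subjects with no matching element, ids with no triples)."""
    live = {e.get(XML_ID) for e in content_root.iter()
            if e.get(XML_ID) is not None}
    subjects = {s for s, _, _ in triples}
    return sorted(subjects - live), sorted(live - subjects)


dangling, unlinked = audit_links(ET.fromstring(RESAVED), OLD_TRIPLES)
print(dangling, unlinked)
```

The audit can report that n006 now dangles and that p42 has no triples, but nothing in either file says that p42 replaced n006; that pairing is the part that would have to be inferred from context.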
