Subject: Re: [office-collab] Re: Counting in Access Paths
From: Jos van den Oever <jos@vandenoever.info>
To: office-collab@lists.oasis-open.org
Date: Mon, 22 Oct 2012 17:55:12 +0200

On Sunday 21 October 2012 20:24:36 PM Dennis E. Hamilton wrote:
> Let's get back to the subject at hand.
> 
>   1. The use of one-way absolute (i.e., position-rigid) hierarchical
> references are brittle and cannot survive processing that alters such
> paths without requiring any processing of the markup (at the markup level)
> to find and adjust the references, delete the references, or leave the
> document in a damaged condition.
> 
>   2. The use of unique ID values in a bidirectionally-verifiable linkage
> does not have those problems and is a technique already required of ODF
> processors.
> 
>   3. I don't believe (1) is *essential* to MCT.  If it is essential, I
> would say that MCT is fatally flawed.
> 
>   4. What is the technical barrier to using a mechanism like (2) for MCT? 
> Is it so objectionable that there are empty elements that serve as markers
> in the texts that have tracked changes?
> 
>   5. The making of untracked changes in content that is subject to a
> tracked change is a problem in all approaches.  It seems that (2) is safer
> and there may be further safeguards so that a change is not misattributed.
>  That's different than having the document be broken, as would happen with
> (1).
> 
> I am unclear what the attachment to absolute positional offsets comes from.
>  Is it really essential?

So far I have been partial to absolute positioning. This partiality probably 
originates in my reading of the Google Wave documentation and in my tendency 
to avoid verbose serialization. I am open to the idea of using xml:id, 
though. It certainly could make CT more robust.

The absolute positions would indeed require updating when inserting nodes 
higher in the hierarchy, when moving nodes and when removing nodes if these 
addresses were used in places other than change tracking. The references to 
labelled nodes would only need updating when nodes are removed, at which point 
there should not be any references left anyway.

The references as used in change tracking only apply to one version of the 
document. To have them point to the same node in a different document, the 
change operation would have to be updated (transformed) to point to the 
correct node.

The addressing should support very easy updating of addresses in change 
operation transformations. In Google Wave, the addresses are one-dimensional, 
i.e. not hierarchical, and are easy to update. The hierarchical addresses in 
the current MCT seem more laborious at first glance. Care should be taken that 
the addresses can be transformed without information about the document.

The rule to update the addresses in any of the operations seems often to only 
involve changing the last number if both change operations point to an equally 
deeply nested address. (The hierarchical addressing rules depend on the @type 
attribute which I have not fully grasped yet. For example how would a 
paragraph in a cell in a table in a frame in a paragraph be addressed?)
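
As a rough illustration of the "change the last number" rule, here is a 
minimal Python sketch. It assumes addresses are tuples of 1-based integers, 
that both operations sit under the same parent at the same depth, and that an 
insert at an occupied position goes before the existing node. The function 
name is made up for this mail and nothing here is taken from the MCT documents.

```python
def shift_for_sibling_insert(path, inserted):
    """Adjust `path` for a node inserted at `inserted`.

    Both arguments are tuples of 1-based positions, e.g. (1, 2, 3) for "/1/2/3".
    Only the equally nested, same-parent case is handled (assumption).
    """
    if len(path) != len(inserted) or path[:-1] != inserted[:-1]:
        raise ValueError("only equally nested siblings are handled in this sketch")
    if inserted[-1] <= path[-1]:
        # The new sibling lands at or before our node, so our node moves down by one.
        return path[:-1] + (path[-1] + 1,)
    # The new sibling lands after our node, so the address is unchanged.
    return path


print(shift_for_sibling_insert((1, 2, 3), (1, 2, 1)))  # (1, 2, 4)
print(shift_for_sibling_insert((1, 2, 3), (1, 2, 5)))  # (1, 2, 3)
```

Note that only the last number changes and no document content is consulted.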

When using xml:id, the rules for transforming change operations are less 
obvious to me. Let's take the example similar to the one in the MCT-Merge-
enabled-Change-Tracking-wd05 presentation:

 <do>
  <add type="paragraph" s="/1">Aruba</add>
  <add type="paragraph" s="/2">Curaçao</add>
  <add type="paragraph" s="/2">Bonaire</add>
 </do>

This would result in:
 <p>Aruba</p>
 <p>Bonaire</p>
 <p>Curaçao</p>
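
To make the positional reading concrete, here is a small Python sketch. It is 
an illustration only, not part of MCT: it treats the document as a flat list 
of paragraphs and each s="/n" as a 1-based insert position. The function name 
is hypothetical.

```python
def apply_adds(paragraphs, ops):
    """Apply (position, text) adds in order; positions are 1-based like s="/1"."""
    result = list(paragraphs)
    for position, text in ops:
        # Inserting at an occupied position pushes the existing paragraph down.
        result.insert(position - 1, text)
    return result


ops = [(1, "Aruba"), (2, "Curaçao"), (2, "Bonaire")]
print(apply_adds([], ops))  # ['Aruba', 'Bonaire', 'Curaçao']
```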

The position in which the paragraphs are added is clear. How would these rules 
look when xml:id is used instead of positions? I could imagine something like 
this:
 <do>
  <add type="paragraph" identifier="a">Aruba</add>
  <add type="paragraph" identifier="c" after="#a">Curaçao</add>
  <add type="paragraph" identifier="b" after="#a">Bonaire</add>
 </do>
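
A sketch of how such identifier-based adds could be applied, again in Python 
and again only as an illustration. It assumes a flat list of paragraphs, that 
a missing "after" means "insert at the start", and that identifiers are 
unique. The attribute names follow the imagined markup above and are not from 
any specification.

```python
def apply_id_adds(ops):
    """Apply (identifier, text, after) adds in order and return the paragraph texts.

    `after` is the identifier of the preceding paragraph, or None for the
    start of the document (assumption).
    """
    order = []   # identifiers in document order
    texts = {}   # identifier -> paragraph text
    for identifier, text, after in ops:
        index = 0 if after is None else order.index(after) + 1
        order.insert(index, identifier)
        texts[identifier] = text
    return [texts[identifier] for identifier in order]


ops = [("a", "Aruba", None), ("c", "Curaçao", "a"), ("b", "Bonaire", "a")]
print(apply_id_adds(ops))  # ['Aruba', 'Bonaire', 'Curaçao']
```

The result matches the positional version, but each add has to look up its 
anchor in the document state instead of using a number directly.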

An advantage of addressing by position is that it would allow change 
operations to be transformed without any document information. Addressing with 
xml:id's does not have that advantage. Here is an example that starts from a 
document (no namespaces for brevity):  
 <p xml:id="a">
  <frame xml:id="b">
   <text-box xml:id="c">
    <p xml:id="d">hello</p>
   </text-box>
  </frame>
 </p>

The changes are: add a paragraph after the paragraph with 'hello' and delete 
the paragraph with the frame.
 <do>
  <add type="paragraph" s="/1/1/1/2">world</add>
  <del type="paragraph" s="/1"/>
 </do>

Swapping the two change operations would make one redundant, leaving just
 <do>
  <del type="paragraph" s="/1"/>
 </do>
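
The check behind this swap can be sketched in a few lines of Python. This 
covers only the nested case from the example (the add lies inside the deleted 
node). Sibling and unrelated addresses would need index shifting, which is 
left out. The function names are hypothetical.

```python
def parse_path(path):
    """Turn "/1/1/1/2" into (1, 1, 1, 2)."""
    return tuple(int(step) for step in path.strip("/").split("/"))


def is_inside(path, ancestor):
    """True if `path` addresses a node strictly below `ancestor`."""
    p, a = parse_path(path), parse_path(ancestor)
    return len(p) > len(a) and p[:len(a)] == a


def swap_add_then_delete(add_path, del_path):
    """Reorder [add, del] so that the del comes first.

    Only the nested case is handled: the add is dropped because its target
    disappears together with the deleted node.
    """
    if is_inside(add_path, del_path):
        return [("del", del_path)]
    raise NotImplementedError("sibling and unrelated cases need index shifting")


print(swap_add_then_delete("/1/1/1/2", "/1"))  # [('del', '/1')]
```

The paths are compared step by step rather than as strings, so "/10/1" is not 
mistaken for something inside "/1". No document content is consulted.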

When using xml:id, it is not clear from the addresses alone how the change 
operations should be transformed. The original set of changes would be 
described in this way:

 <do>
  <add type="paragraph" identifier="e" after="#d">world</add>
  <del s="#a"/>
 </do>
(Presumably the @type attribute would be redundant in a <del/> operation.)

It is not clear from the element identifiers that the <add/> operation would 
be nullified by the <del/> operation. Information about the document hierarchy 
would be needed to see this.
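
To show what "information about the document hierarchy" means in practice, 
here is a Python sketch with a hand-written parent map for the example 
document. The map and the function name are assumptions for illustration. A 
real implementation would derive the map from the document tree.

```python
# Parent of each labelled node in the example document (None marks the root level).
parent = {"a": None, "b": "a", "c": "b", "d": "c"}


def is_within(node_id, ancestor_id, parent):
    """True if `node_id` is `ancestor_id` itself or one of its descendants."""
    current = node_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = parent[current]
    return False


# The add is anchored after "d" and the delete removes "a".
print(is_within("d", "a", parent))  # True, so the add is nullified
print(is_within("a", "d", parent))  # False
```

Without the parent map, the identifiers "d" and "a" say nothing about whether 
one contains the other.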

The position-based addresses are only valid at the point they occupy in the 
list of changes. If change operations are moved to a different position in 
that list, their addresses may change. The only purpose of the addresses is to 
describe the location of the change.

One could argue that an advantage of the xml:id would be that you can quickly 
find all changes that are applied to a certain paragraph by looking at the 
identifier mentioned in the change. But that is actually not true. If a range 
of paragraphs is deleted, the identifier will not be mentioned. If a span in 
the paragraph is modified, it will not be seen.

On the other hand, with positional addressing one can find the list of 
operations on one paragraph quickly. To search for it, one goes through the 
list of changes and updates the address being looked for while moving through 
the list.
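
A sketch of that walk in Python, under simplifying assumptions: only 
top-level paragraphs, 1-based positions, and three kinds of operation. The 
"edit" kind is made up to stand for any change inside a paragraph. The email 
examples only show add and del.

```python
def operations_on_paragraph(start, ops):
    """Return the indexes of the ops that touch one paragraph.

    `start` is the paragraph's 1-based position before the first op.
    `ops` is a list of (kind, position) with kind in {"add", "del", "edit"}.
    """
    position = start
    hits = []
    for index, (kind, target) in enumerate(ops):
        if kind == "add":
            if target <= position:
                position += 1      # something was inserted before our paragraph
        elif kind == "del":
            if target == position:
                hits.append(index)
                break              # our paragraph is gone, nothing more to find
            if target < position:
                position -= 1      # something before our paragraph was removed
        elif kind == "edit" and target == position:
            hits.append(index)
    return hits


ops = [("add", 1), ("edit", 3), ("del", 1), ("edit", 2), ("del", 2), ("edit", 2)]
print(operations_on_paragraph(2, ops))  # [1, 3, 4]
```

The address being tracked changes from 2 to 3 and back to 2 while walking the 
list, which is why the same number in two operations need not mean the same 
paragraph.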

> 	From: office-collab@lists.oasis-open.org
> [mailto:office-collab@lists.oasis-open.org] On Behalf Of Robin LaFontaine
> Sent: Thursday, October 18, 2012 2:05 AM
> 	To: office-collab@lists.oasis-open.org
> 	Subject: Re: [office-collab] Re: Counting in Access Paths
> 
> 	I agree with your concerns, Dennis, and am surprised implementors do not
> seem to worry about this issue. OT has, we are told, been proved to work
> in a forwards direction when by definition all the 'edits' will be tracked
> and executed. Working backwards (from existing to previous version, i.e.
> tracked change) when not all the changes will be tracked, and not all
> accepted, is a different problem, IMHO. However, it is simple enough
> (though it will need a good bit of effort) to demonstrate whether or not
> this is a real concern - we need a specification of how it works and some
> implementations!

With 'working backwards', I assume you mean 'finding a changeset that 
transforms one document into another document'. I think it would be possible 
to find such a set of changes for MCT too. MCT normally works by recording the 
actual edit operations, so when reconstructing changes from two documents the 
changes are not actual edits but inferred edits. Nevertheless, inferring a 
list of changes would be possible with MCT too.

> 	On 17/10/2012 21:38, Dennis E. Hamilton wrote:
> 
> 		I'm having a terminology problem here.
> 
> 		As far as I'm concerned, s="/2/10" and e="/3/18" are absolute addresses.
> Hierarchical, but absolute, not unlike in absolute URLs.  It's more
> brittle than in a URL because it is based on counted position, not on
> labeled hierarchy nodes.  That means insertion and deletion of siblings at
> every level requires these references to be repaired.  That's scary.

The references are used only in the current change operation. They should not 
occur in other places.

> 		Does not the MCT use of these rigid absolute paths require that the
> document be serialized before the tracking information can be serialized?

This depends on the meaning of the numbers in the absolute paths. If the 
numbers refer to the XML nodes, then yes. If they refer to ODF concepts like 
paragraphs and tables, then this would not be needed, but a thorough 
documentation of the addressing method would be needed. This is still some 
work and I'd like to read up on the latest document that does this. What is 
the latest version and where can I find it?

> And for a consumer presenting the content, is it necessary to find the
> relevant change-tracking information by some synchronization method?  My
> concern is that it is impossible to detect when synchronization has been
> broken.  Everything has to be absolutely perfect and there is no way to
> touch a change-tracked document without having to adjust all of the
> tracking locations.  That's a considerable burden.

The change operations are only valid for one version of the document. If the 
document is modified, the changes do not apply any more at all. To make them 
apply, the changes between the document for which they apply and the document 
that is the result of more user editing would need to be inferred. So if a 
user sends a revision of a document without recording the changes and asks 
for this document to be merged with the last version for which the changes 
were recorded, then one could create a new version and attribute the changes.

> 		I am only addressing the cross-identification approach here.  It appears
> that ODF CT does this in a more robust way; I don't see why MCT can't be
> made at least as resilient.
> 
> 		I think there does need to be a stretchy way to connect between tracked
> details and the point of change.  An xml:id ID value and a corresponding
> IDREF attribute value do this perfectly for XML-modeled persistent
> document formats.  And this kind of support already has to exist in
> ODF-based processors simply because of the many ways that
> cross-referencing is handled by identifiers of some type, including IRIs
> that refer into package parts by fragment identifiers.

Could you explain what the cross-identification approach is?

Well, that was a long mail. I thought the meeting was today and had time 
planned which allowed me to write so much.

Cheers,
Jos