office-collab message
Subject: RE: [office-collab] Re: Counting in Access Paths


I'm having a terminology problem here.

As far as I'm concerned, s="/2/10" and e="/3/18" are absolute addresses.  Hierarchical, but absolute, not unlike in absolute URLs.  It's more brittle than in a URL because it is based on counted position, not on labeled hierarchy nodes.  That means insertion and deletion of siblings at every level requires these references to be repaired.  That's scary.
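To make the repair burden concrete, here is a minimal sketch of what a processor might have to do to every stored reference when one sibling is inserted. The function names and the insertion rule (the new component takes the named position and pushes later siblings one step to the right) are assumptions for illustration, not anything defined by MCT.

```python
def parse_path(path):
    """Split a counted path such as "/2/10" into a list of 1-based integers."""
    return [int(step) for step in path.strip("/").split("/")]


def format_path(steps):
    """Turn a list of 1-based integers back into a counted path string."""
    return "/" + "/".join(str(step) for step in steps)


def repair_after_insert(ref, inserted):
    """Return `ref` adjusted for a new component inserted at path `inserted`.

    Assumption: the new component occupies the position named by `inserted`
    and shifts that sibling and all later siblings by one.
    """
    ref_steps = parse_path(ref)
    ins_steps = parse_path(inserted)
    depth = len(ins_steps) - 1  # level at which the sibling was inserted
    # References that do not run through the same parent are unaffected.
    if len(ref_steps) <= depth or ref_steps[:depth] != ins_steps[:depth]:
        return ref
    if ref_steps[depth] >= ins_steps[depth]:
        ref_steps[depth] += 1
    return format_path(ref_steps)


# A paragraph inserted before the heading shifts both ends of the <del>.
new_start = repair_after_insert("/2/10", "/1")
new_end = repair_after_insert("/3/18", "/1")
```

A tool that performs the insertion without running a repair of this kind over every reference leaves s and e pointing at different content, and nothing in the paths themselves records that this happened.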

On the other hand, to use an existing example, xml:id="ct270478544" establishes a *fixed* identifier, but it does not rely on fixed locations.  So wherever a text:change-id="ct270478544" happens to refer to it, the connection is maintained no matter what kind of movement happens on the paths between the two.

It is true that in the specific case of the ODF 1.2 use of this device, the <text:changed-region> element is referenced from the content where it is applicable, not the reverse.  That seems like a good choice to me, considering that it is the reference from the content that makes examination of the <text:changed-region> of any concern.  This also allows the <text:changed-region> material to be output in front of the content, so it can all be indexed before the in-text references occur in the datastream.  

Does not the MCT use of these rigid absolute paths require that the document be serialized before the tracking information can be serialized?  And for a consumer presenting the content, is it necessary to find the relevant change-tracking information by some synchronization method?  My concern is that it is impossible to detect when synchronization has been broken.  Everything has to be absolutely perfect and there is no way to touch a change-tracked document without having to adjust all of the tracking locations.  That's a considerable burden.

I am only addressing the cross-identification approach here.  It appears that ODF CT does this in a more robust way; I don't see why MCT can't be made at least as resilient.  

I think there does need to be a stretchy way to connect between tracked details and the point of change.  An xml:id ID value and a corresponding IDREF attribute value do this perfectly for XML-modeled persistent document formats.  And this kind of support already has to exist in ODF-based processors simply because of the many ways that cross-referencing is handled by identifiers of some type, including IRIs that refer into package parts by fragment identifiers.  
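As a rough illustration of the ID/IDREF connection, the sketch below resolves a text:change-id reference against an xml:id after an unrelated paragraph has been inserted. The <doc> wrapper and the stripped-down fragment are hypothetical and much simpler than a real ODF content.xml; only the xml:id and text:change-id attribute names come from the discussion above.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CHANGE_ID = "{%s}change-id" % TEXT_NS

# Hypothetical, heavily simplified fragment (not a complete ODF document).
DOC = """<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:tracked-changes>
    <text:changed-region xml:id="ct270478544"/>
  </text:tracked-changes>
  <text:p>Heading</text:p>
  <text:p>Some text<text:change text:change-id="ct270478544"/> and more.</text:p>
</doc>"""


def index_regions(root):
    """Map every xml:id value to the element that carries it."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def resolve_changes(root):
    """Pair each in-text reference with its region; None marks a dangling one."""
    regions = index_regions(root)
    return [(el.get(CHANGE_ID), regions.get(el.get(CHANGE_ID)))
            for el in root.iter() if el.get(CHANGE_ID)]


root = ET.fromstring(DOC)
# An unrelated edit: a new paragraph is inserted ahead of the tracked one.
root.insert(1, ET.Element("{%s}p" % TEXT_NS))
resolved = resolve_changes(root)
```

The lookup does not depend on where either element sits, and a reference whose region has gone missing shows up as None instead of silently pointing at other content.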

 - Dennis

-----Original Message-----
From: office-collab@lists.oasis-open.org [mailto:office-collab@lists.oasis-open.org] On Behalf Of Svante Schubert
Sent: Wednesday, October 17, 2012 09:21
To: office-collab@lists.oasis-open.org
Subject: [office-collab] Re: Counting in Access Paths

On 16.10.2012 22:10, Dennis E. Hamilton wrote:
> This is a separate topic, although it certainly figures in how various interop challenges will be handled.
>
> It occurs to me that counting has some disconcerting consequences.
>
> I mean this kind of thing:
>
>      <del s="/2/10" e="/3/18" />
>      <merge s="/2" e="/3" />
>
> There are some interesting consequences:
>
>  1. It is nearly impossible to do these manually (as when fabricating tests, examples, etc.).  That goes to creating them and also checking them manually in forensic work.
Basically the path language is a simplification of a W3C XPath path, and
I have never heard anyone complain about this being forensic work. XPath
was quite a success, if I remember correctly.
I even made an instant empirical sanity check and randomly asked some
people here at the ODF Plugfest in Berlin: they liked it very much, and
there was no negative comment or complaint about it at all.
>  2. Is there a presumed canonicalization when it comes down to counting in text content? And how are component elements
>     counted?  I assume they count as 1.
Yes, every component counts as one; as every text character is a
component, characters are counted accordingly. Original Operational
Transformation counts the gaps between text, starting with zero. In
addition, computer science usually starts counting with 0, but XML counts
the components and starts with 1, and human languages start with 1, not
with 0. As the serialization is in XML and meant to be readable by
humans, it starts with 1. Read the operation <add type="paragraph" s="/3"
/> as: add a paragraph at the 3rd position.
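The counting rule just described can be sketched as follows. This is a toy model under stated assumptions: the document is a flat list of paragraph strings, every character is one component, and nested elements inside a paragraph are ignored. The function name resolve and the sample paragraphs are hypothetical.

```python
def resolve(paragraphs, path):
    """Resolve a 1-based counted path such as "/2" or "/2/10".

    Assumption: the first step selects a top-level component (a paragraph)
    and an optional second step selects a character inside it.
    """
    steps = [int(step) for step in path.strip("/").split("/")]
    if len(steps) > 2 or any(step < 1 for step in steps):
        raise ValueError("unsupported path: %r" % path)
    paragraph = paragraphs[steps[0] - 1]  # counting starts at 1, not 0
    if len(steps) == 1:
        return paragraph
    return paragraph[steps[1] - 1]


# Hypothetical sample content: a heading followed by two paragraphs.
paragraphs = ["Heading", "First paragraph.", "Second paragraph."]
second_component = resolve(paragraphs, "/2")
tenth_character = resolve(paragraphs, "/2/10")
```

Rejecting steps below 1 matters in Python in particular, because a step of 0 would otherwise turn into index -1 and quietly select the last component.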
>  3. This seems to be very brittle.  That is, anything that is done by some tracking-negligent tool that changes the offset of material that is touched by tracking will completely break the tracked change, with no detection that it happened.  In the ODF scheme, as one example, there is much more resilience in the making of alterations that are unrelated to the change.  Even when there is some sort of collision, there is more information that may help resolve it, or at least determine that some of the tracking cannot be relied upon any longer.  That it may not be possible for a consumer to even detect the disconnect is worrisome.
Quite the opposite: it is far from optimal to use (stable) absolute
positioning, either via ID or (even more stable) by nesting changes
directly into the content. Doing so would make it necessary to read the
complete content before identifying the changes.
The efficiency/time of merges would depend on the document size, and it
would not be possible to send only the changes of a document to someone
else who is also working on the same document.
Not to mention that merges are based on OT, which requires relative
referencing. With relative references someone might even be able to
edit/document (proposed changes) on a read-only ODF document, be it
a signed ODF document or a document on a web server somewhere.
Change tracking, like ODF signature & encryption, can only be used
by applications supporting it. There is no need to avoid an advanced
technology just because it can be destroyed by a text editor. A user who
edits ODF with a text editor should know what is going on.
Still, similar to revision systems (e.g. git), we might want to use
signatures (e.g. SHA-1) to verify whether a stored (XML) file was changed.
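A minimal sketch of such a check, using a plain SHA-1 digest from Python's standard hashlib module. Where the digest would be stored, and which bytes it would cover, are open questions; the function names and sample content here are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-1 digest of the stored bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def tracking_still_valid(data: bytes, recorded_digest: str) -> bool:
    """True only if the content is byte-for-byte what the operations were recorded against."""
    return content_digest(data) == recorded_digest


# Hypothetical content; in practice this would be the stored XML file.
original = b"<p>Hello</p>"
recorded = content_digest(original)
touched = b"<p>Hello!</p>"
still_valid = tracking_still_valid(original, recorded)
detected_edit = not tracking_still_valid(touched, recorded)
```

A byte-level digest detects that something changed but not what or where, and it also changes when a tool reserializes the XML without altering the content.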
>
> This is not an objection to MCT in principle, it is simply an objection to the difficulty and the apparent lack of resilience in the scheme by which the tracking is connected to the text that it applies to.  One can also argue that this is not in the spirit of XML-based models at all.
Well, you might be right that operations do not fully represent the
spirit of the XML-based model, but on the other hand operations
represent the spirit of distributed work. Sooner or later ODF
applications need to solve real-time collaboration and merges with
advanced techniques such as Operational Transformation. Unfortunately,
XML alone is not the hammer for this nail. Nevertheless, we are still
serializing the operations into XML. I like XML very much, but we need
to use a technology where it is suited.
And difficult? The LibreOffice developers listening to my MCT
presentation were quite excited. Even Michael Stahl, who implemented
RDF Metadata in OpenOffice and is always very skeptical, told me
afterwards that this might work!
As long as the implementers like it, I am happy.

Interesting view you are representing, Dennis.
Svante
>
>  - Dennis
>
> -----Original Message-----
> From: office-collab@lists.oasis-open.org [mailto:office-collab@lists.oasis-open.org] On Behalf Of Svante Schubert
> Sent: Tuesday, October 16, 2012 03:04
> To: dennis.hamilton@acm.org
> Cc: office-collab@lists.oasis-open.org
> Subject: Re: [office-collab] Paragraph merge in ODF (earlier - Re: [office-collab] FW: [office] Groups - MCT Challenge #1 Documents (Zip) uploaded)
>
> [ ... ]
>
> The heading is the first component; you start deleting text from the second to the third component. The serialized MCT operations for the change in your challenge might be:
> <del s="/2/10" e="/3/18" />
> <merge s="/2" e="/3" /> 
> The above are NOT the undo operations, but the operations that describe your change. The undo will follow as soon as we agree on what is being changed in the XML, and we (or I) need to think over how to handle styles in general (I will be at the ODF plugfest tomorrow and the LibreOffice conference after, so I might have to pause this thread till next week).
>
> I would even omit the second parameter for the merge, as only sibling paragraphs can be merged.
>
> [ ... ]
>
>

