OASIS Mailing List Archives

office-collab message



Subject: Compromise on change-tracking proposals


Dear SC,

After a thorough analysis of the ongoing discussion regarding change-tracking (CT), I would like to suggest a compromise between the two existing proposals.

When comparing the two existing proposals, I liked aspects of both: the generic approach of GCT, and from ECT its emphasis on simplicity and its focus on extending existing functionality.
My goal was to combine these advantages and, in addition, to allow the merging of changes coming from different documents.
This will improve usability of ODF and allow the reuse of CT work for the upcoming real-time collaboration.

The solution is quite simple: both proposals were trying to optimize the serialization of changes (change-tracking) without first specifying what a change in ODF might be.
By first specifying the possible ODF changes in the ODF spec and later only referring to those changes from the ODF document, there is suddenly much room for improvement.

In this mail I will condense the work of months, starting with a refactoring of the GCT approach:
As we know, GCT allows annotations to be added to GCT changes (e.g. move column). They help ODF users and applications to understand the semantics of the change. We might say that one type of annotation identifies a recurring change of ODF. We can reduce the complexity of the GCT proposal by splitting a GCT change into two parts: instead of defining one XML change by GCT, we split the definition into an ODF change event and its related ODF XML change.
This allows the following design steps:
  1. In our ODF specification we will explain the ODF XML change of an ODF change event (with GCT).
  2. In the ODF document only a reference to the ODF change event of the spec remains (e.g. add-table, add-row), together with the necessary variable parameters (e.g. the start and end position of the column to be moved).
  3. The changes are moved out of the "content.xml" and "styles.xml" into an "undo.xml" file. This is to allow applications to load either the ODF content or the changes.
By taking steps 1 and 2 we are able to move the quite verbose XML change from the document to the specification, explaining the ODF change event only once, in the specification.
By taking step 3 we keep only the final state of the document in the "content.xml" and "styles.xml" files and add a new file for changes. This splits responsibilities across files as has been done before ("content.xml", "styles.xml", "meta.xml", "settings.xml").

But a second important concept, the concept of components, has to be introduced.
It is the definition of implicitly known groupings of ODF XML (e.g. table, paragraph, cell, section, etc.).
A component is defined to be a disjunctive module of an ODF document. Disjunctive means that changing the state of a component changes only the component itself and, indirectly, the state of its ancestor components. For example, deleting an image at root level will change the state of the overall document, but not of any other component, for instance a table at the end. A document can be seen as a tree of components.
(Note: There are not only components in a document, but also views. A view just mirrors the state of one or more components. One example is the table of contents, which is a view of headings. Another example is a row or a column, which is a view of table cells. Rows and columns are views, not components, because changing a row changes the state of a column as well.)
Having disjunctive components helps us to reduce the complexity of collaboration. By dividing the document, we divide the problem as well and thereby diminish it (see http://plugfest.opendocsociety.org/lib/exe/fetch.php?media=plugfests:201007_berlin:odfplugfest2011-sschubert.pdf).
Components are necessary to identify the location of a change in the document. Only via components is it possible to reference a change in a way that does not depend on XML details (e.g. XPath). Otherwise, knowledge of the XML would be required at run-time. The relative reference is necessary for Operational Transformation (OT). OT will adapt the relative index of a referenced component when someone else has added or removed a preceding sibling.
For example, if I want to delete the third row, but someone added a new row as second row, I still want to delete the same row, which is now the fourth.
For this reason components are vital for merging changes and real-time collaboration, which both depend on the usage of OT.
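
The row example above can be sketched in a few lines of Python. This is only an illustration of the index adjustment, not a complete OT algorithm; the function name transform_index and its parameters are assumptions made for this sketch and are not part of any proposal.

```python
def transform_index(target, op, op_index):
    """Adjust a 1-based sibling index after a concurrent change by someone else.

    op is "insert" or "delete"; op_index is the 1-based position at which the
    other participant's change took place. The case where the target itself was
    deleted (op == "delete" and op_index == target) is not handled in this sketch.
    """
    if op == "insert" and op_index <= target:
        return target + 1  # a new preceding sibling pushes the target back
    if op == "delete" and op_index < target:
        return target - 1  # a removed preceding sibling pulls the target forward
    return target


rows = ["row A", "row B", "row C", "row D"]
target = 3                      # I want to delete the third row ("row C")

rows.insert(2 - 1, "new row")   # someone else adds a new row as second row
target = transform_index(target, "insert", 2)

deleted = rows.pop(target - 1)  # still deletes "row C", now at index 4
```

The reference "third row" never mentions XML details; only the sibling index is transformed.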

To summarize the design so far:
The ODF XML change is no longer part of the "content.xml" and "styles.xml" files. A change has been abstracted to change-events, hiding details of the ODF XML change.
The ODF change-events will be defined in the ODF specification, especially including the mapping to the ODF XML change (and vice versa) via GCT.
The idea is to serialize the change-events into a single file at document root level called "undo.xml" (adjacent to "content.xml" and "styles.xml").

Why save the undo events and not the changes made?
To answer this, let me go back to the main difference between change-tracking and real-time collaboration events:
An ODF document can be seen as a frozen entity of a living document.  It reflects one possible state of all possible ODF states that a living document might have.
When a document is being loaded into an ODF application, it can be represented by any arbitrary internal model (e.g. web office, mobiles, etc.). This works as long as the document can be changed (as the user would usually expect) and as long as it is representing a valid ODF state.
Let us assume that an ODF application is only able to change the state of its ODF model - representing a valid document - by a change event.
Those change events can be seen as transitions from one valid state to another, for example adding or deleting a character, a table, etc.
Real-time collaboration and change-tracking are both based on change events, but there is a difference.
The relation between real-time collaboration and change-tracking events is similar to the relation between a function and its inverse-function.
For example, when changing a background-color of a paragraph from green to red, the collaboration-event would be
"change background-color on paragraph XY to red".
The change-tracking event would be the opposite; similar to an "undo event", it would be
"change background-color on paragraph XY to green".
We can state that for change-tracking, the "undo event" of a change is mandatory (directed into the past), while collaboration requires the "do event" of a change (directed into the future).
By saving the "undo event", the original values before a change are kept.
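
The function/inverse-function relation can be sketched as follows. The class SetProperty and the helpers apply_event and undo_of are hypothetical names chosen for this illustration; the actual change-events would be defined in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetProperty:
    component: str  # address or name of the component, here simply "XY"
    name: str       # property to change
    value: str      # value the property is set to


def apply_event(event, state):
    """Return a new component state with the property set by the event."""
    new_state = dict(state)
    new_state[event.name] = event.value
    return new_state


def undo_of(do_event, state_before):
    """Build the undo event from the do event and the state before it is applied."""
    return SetProperty(do_event.component, do_event.name, state_before[do_event.name])


paragraph_xy = {"background-color": "green"}

do_event = SetProperty("XY", "background-color", "red")  # directed into the future
undo_event = undo_of(do_event, paragraph_xy)             # directed into the past

changed = apply_event(do_event, paragraph_xy)
restored = apply_event(undo_event, changed)
```

Applying the do event and then its undo event returns the paragraph to its original state, which is why the undo event has to carry the original value.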
There are now two ways to avoid redundancy:
  1. Saving the start document and the do changes or
  2. Saving the end document and the undo changes
With our existing "content.xml" and "styles.xml" the complete final state of the document is saved. Therefore it seems natural to save the undo changes, as all previous document states can be recreated from "content.xml", "styles.xml" and "undo.xml". Even all "do events" can be created from this information, by using the state of a component, the "undo event" and the previous state of the component.
Saving the "undo events" most often results in a smaller file than saving the "do events". This is because the most frequent user scenario is adding new content ("do event insert"), which includes a large amount of text and structure. In the "undo.xml", only the very small inverse "undo event delete" will be saved. The inserted text and structure are already in "content.xml" and "styles.xml", so saving them again would be redundant.
An advantage is that change-tracking only uses "undo events".
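
A minimal sketch of option 2, restricted to the text of a single paragraph: the final text plus a small undo event is enough to recreate both the previous state and the do event. The classes Insert and Delete and the functions apply_event and invert are assumptions made for this illustration, not a proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # 1-based position; the text is placed in front of it
    text: str


@dataclass(frozen=True)
class Delete:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def apply_event(event, text):
    """Apply an insert or delete event to a paragraph's text."""
    if isinstance(event, Insert):
        i = event.pos - 1
        return text[:i] + event.text + text[i:]
    return text[:event.start - 1] + text[event.end:]


def invert(event, text_before):
    """Return the event that reverses `event`, given the text it is applied to."""
    if isinstance(event, Insert):
        return Delete(event.pos, event.pos + len(event.text) - 1)
    return Insert(event.start, text_before[event.start - 1:event.end])


final_text = "Hello brave world"  # what the final state of the document holds
undo_event = Delete(7, 12)        # the only thing the undo file would need to hold

previous_text = apply_event(undo_event, final_text)
do_event = invert(undo_event, final_text)  # the do event is recreated, not stored
```

The undo event carries two numbers, while the corresponding do event carries the inserted text, which is already present in the final state.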

Let's give an example of the "undo.xml":

Starting markup:

<text:p>123456789ABCDEFGH</text:p>

<text:p>123456789ABCDEFGH</text:p>


Ending markup:

<text:p>123456EFGH</text:p>

<text:p>123*some text*456789ABCDEFGH</text:p>


<changes>
    <delete s="1/7" e="1/13" />
    <insert s="2/4">*some text*</insert>
</changes>

Description:
The RelaxNG schema and explanation of all XML nodes of the "undo.xml" file (the resulting XML change) will be in a new part of our ODF specification.
The attributes s (start) and e (end) identify components. In case there is only the attribute "s", it is a single component. In case both attributes "s" & "e" are used, it is a selection of multiple components within the component tree (ordered as in XML).
The numbers represent the location of a component in the document's component tree. Even a single character is treated as a component.
As in XML, counting starts with 1. For example, the 12th character within the 3rd paragraph of a text document is addressed as "3/12". An insertion at this position would place the new component in front of the selected component (e.g. the 12th character).
(Note: I am not yet certain whether a preceding label specifying the type of component is helpful, e.g. "p3/c12" for 3rd paragraph/12th character. The prefix would be redundant, but it is more human-readable and might simplify the mapping into internal models that differ from ODF. The label would provide the information immediately, without the need to look it up in "content.xml".)
The sequence of the change-events is chronological. But change-events might be grouped by further <changes> elements, even creating a tree hierarchy.
For instance the changes of a document (e.g. our specification) can be sorted by an ODF errata version, which is separated by ISO national body issues, separated by issue numbers.
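
The addressing scheme can be checked against the example with a short Python sketch. It assumes a strongly simplified component tree (a list of paragraphs whose children are characters) and reads the two listed events as edits applied to the starting markup; the function names parse_address and apply_changes are hypothetical.

```python
import xml.etree.ElementTree as ET

CHANGES = """
<changes>
    <delete s="1/7" e="1/13" />
    <insert s="2/4">*some text*</insert>
</changes>
"""


def parse_address(address):
    """Split a component address such as "3/12" into 1-based integer steps."""
    return [int(step) for step in address.split("/")]


def apply_changes(paragraphs, changes_xml):
    """Apply the events in chronological order to a list of paragraph texts."""
    result = list(paragraphs)
    for event in ET.fromstring(changes_xml.strip()):
        para, start = parse_address(event.get("s"))
        text = result[para - 1]
        if event.tag == "delete":
            end_para, end = parse_address(event.get("e"))
            if end_para != para:
                raise ValueError("this sketch only handles one paragraph per event")
            # remove the characters start..end (both inclusive)
            result[para - 1] = text[:start - 1] + text[end:]
        elif event.tag == "insert":
            # place the new text in front of the addressed character
            result[para - 1] = text[:start - 1] + (event.text or "") + text[start - 1:]
        else:
            raise ValueError("unknown event: " + event.tag)
    return result


starting = ["123456789ABCDEFGH", "123456789ABCDEFGH"]
ending = apply_changes(starting, CHANGES)
```

Under these assumptions, the result matches the ending markup of the example: "123456EFGH" and "123*some text*456789ABCDEFGH".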

(Note: Let's say an ODF application supports a history feature that allows going back to previous versions. For example, we have a specification of ODF 1.2, including the "undo.xml" back to the ODF 1.1 and ODF 1.0 specifications. Now a user is able to save the document as the previous version, ODF 1.1, with fewer changes in the "undo.xml" (now only from ODF 1.1 to 1.0), but now also with a "do.xml" file (containing the changes from ODF 1.1 to ODF 1.2) that describes how to get to ODF 1.2. This would give the user the following options to choose from:
  1. Changes made in the current working version (in this example ODF 1.1) will also affect following versions (ODF 1.2). For example, this would be handy for fixing typos, etc.
  2. Changes made in the current working version (in this example ODF 1.1) only affect the current version (ODF 1.1). For example, this would be handy to fix only the ODF 1.1 version, while keeping ODF 1.2 unchanged.
  3. Changes made in the current working version (in this example ODF 1.1) would create a new in-between version (ODF 1.1.1). For example, this would be handy to provide a fixed version, while keeping ODF 1.1 and ODF 1.2 unchanged.)

Finally, I would like to mention the perhaps unexpected positive side effects we might have with the above compromise:
  1. Changes made in real-time collaboration scenarios can use the same change-events as specified for change-tracking. The change-events provide an abstraction from the ODF XML details. Therefore ODF applications with different run-time models may use change-events as a lingua franca. In addition, changes made during off-line real-time collaboration can be saved the same way as change-events in change-tracking, via "content.xml", "styles.xml" and "undo.xml".
  2. By defining components and events, we would be able to create profiles of ODF, as we would then know what to add to or leave out of a profile, and we could define the capabilities of an ODF application more precisely.
  3. Transformations to other formats (e.g. HTML) could rely on the high-level abstraction of change-events. Therefore there is no longer a need to do the mapping to ODF XML. Even more, if someone provided HTML with a similar component tree and change-events, abstracting from the HTML implementation details, the transformation would just be a matter of connecting similar events (e.g. odf.insertTable to html.insertTable). More interesting for office applications, this would apply to OOXML as well!
  4. ODF would be easier to understand. It will be a catalyst for our ODF ecosystem.

Follow-up: I will send two mails with examples of some use cases and of some more advanced scenarios that I would like to share.

If you have any questions or comments on the above, please feel free to contact me.

Regards,
Svante

