office-comment message

Subject: Requirements

From: ronnie thebonnie <ronniethebonnie@gmail.com>
To: office-comment@lists.oasis-open.org
Date: Sat, 18 Jul 2009 15:06:28 +0200

+NAME
Ronnie Thebonnie

+CONTACT
ronniethebonnie@gmail.com

+CATEGORY (select one or more from below)
versioning, change tracking

+SCOPE (select one or more from below)
general/packaging

+USE CASE
all cases where versioning or change tracking is wanted

+DESCRIPTION

Adding infrastructure and a general approach to ODF for the purpose of advanced change tracking in ODF files.

The proposal preserves backwards compatibility
and presents two different mechanisms: one simpler, one more advanced.

This idea came after reading this post about tracked changes:
http://blogs.msdn.com/dmahugh/archive/2009/05/13/tracked-changes.aspx
It seems that ODF could use some better change tracking mechanisms.

General infrastructure:

The infrastructure for change tracking should be separate from the document content.

In the root folder of every file, there needs to be a folder called change tracking.
This folder will serve for everything related to change tracking.
It could hold whole versions of a document and/or repositories for advanced change tracking.
The folder itself MUST NOT be a repository!
It is only meant to contain repositories in sub-folders.
If the user chooses a name for a repository when it's created, that exact same name must be used for the folder or file name of that repository.

The change tracking folder MUST hold a meta data file with information about all change tracking sub-folders and files present. I will call this file the change tracking meta data file from now on and use the abbreviation ctmdf.
It must hold the names, dates, applications (name + version + flavour, and fork if any) and relations.
(The relations between files in the change tracking folder are very, very important.)

This gives the needed flexibility for advanced uses like multiple non-subsequent versions of the same document. It also creates a less conflict-prone and error-prone way to manage the versions and repositories, because there is only one file, in a documented format, that holds the relations.
The contents of the change tracking meta data file must be standardized in the ODF format.
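
A minimal sketch of what a ctmdf could contain, written in Python. The proposal does not define a syntax, so the element and attribute names (`change-tracking`, `entry`, `relation`, `successor-of`) and the sample entries are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class TrackedEntry:
    """One repository or version file inside the change tracking folder."""
    name: str         # also the file/sub-folder name inside the folder
    kind: str         # "repository" or "version" (hypothetical values)
    created: str      # ISO 8601 timestamp
    application: str  # name + version + flavour of the producing application
    # Relations to other entries as (relation type, target entry name) pairs.
    relations: list = field(default_factory=list)


def build_ctmdf(entries):
    """Serialize the entries to an XML string (all names are made up)."""
    root = ET.Element("change-tracking")
    for e in entries:
        node = ET.SubElement(root, "entry", name=e.name, kind=e.kind,
                             created=e.created, application=e.application)
        for rel_type, target in e.relations:
            ET.SubElement(node, "relation", type=rel_type, target=target)
    return ET.tostring(root, encoding="unicode")


entries = [
    TrackedEntry("draft-1", "version", "2009-07-01T10:00:00", "OpenOffice.org 3.1"),
    TrackedEntry("draft-2", "version", "2009-07-02T09:30:00", "OpenOffice.org 3.1",
                 relations=[("successor-of", "draft-1")]),
]
ctmdf_xml = build_ctmdf(entries)
```

Keeping the relations in this single file means an application only has to parse one document to learn how all versions and repositories relate to each other.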

There needs to be meta data for subsequent versions (versions evolving in time)
and for concurrent versions (versions not evolving in time but created in different places at the same time).
The meta data must record the names, version numbers, and starting and finishing times of all the versions and repositories present in that folder. It must also keep track of the authors and applications (name + version + flavour, if any) of the repositories and of the different versions, all in one place.
Multiple applications can put their own change tracking files in the folder without interfering with each other, because the document content is not affected in any way.
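
As an illustration of the difference between subsequent and concurrent versions, the sketch below derives groups of concurrent versions from a "derived from" relation. The relation itself, the function name and the version names are assumptions; the proposal only requires that relations are recorded in the ctmdf.

```python
def concurrent_groups(derived_from):
    """Find versions that were derived from the same parent version.

    derived_from maps a version name to the name of the version it was
    derived from (None for the very first version).
    """
    children = {}
    for version, parent in derived_from.items():
        if parent is not None:
            children.setdefault(parent, []).append(version)
    # More than one child of the same parent means concurrent versions.
    return {parent: sorted(kids) for parent, kids in children.items()
            if len(kids) > 1}


relations = {"v1": None, "v2": "v1", "v3-alice": "v2", "v3-bob": "v2"}
groups = concurrent_groups(relations)
```

Here `v2` follows `v1` in time (subsequent), while `v3-alice` and `v3-bob` both start from `v2` (concurrent).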

E.g. someone could work on a document in Google Docs, download it, and send it to someone else who uploads it and works on it in Google Docs, preserving all the revisions. The document could then be edited by someone else with OpenOffice.org and passed along to someone who uses Google Docs.

When changes are missing from an application's own history, as in this example, the application could use the document itself to add the missing changes to its change tracking mechanism.
This can be done completely transparently to the user.
In this example Google Docs could add a new revision.
It can add the changes to its revision system by comparing the output of its own revisions to the document.

Backwards compatibility is not affected by any of this,
because change tracking is separated from the actual document in the file format.
Applications that don't support the folder at all can just ignore it and still access the whole document content without problems. (This is the worst case scenario.)
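
A rough sketch of that worst case: a reader that knows nothing about change tracking skips everything under the folder and still gets the document. The folder name `change-tracking/` and the sample entries are assumptions, and the package is simplified (a real ODF package also carries a `mimetype` entry and `META-INF/manifest.xml`).

```python
import io
import zipfile

TRACKING_PREFIX = "change-tracking/"  # hypothetical folder name


def make_package():
    """Build a tiny ODF-like ZIP package in memory for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc>current text</doc>")
        zf.writestr(TRACKING_PREFIX + "ctmdf.xml", "<change-tracking/>")
        zf.writestr(TRACKING_PREFIX + "draft-1/content.xml", "<doc>old text</doc>")
    return buf.getvalue()


def read_document_only(package_bytes):
    """Return all entries except those inside the change tracking folder."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()
                if not name.startswith(TRACKING_PREFIX)}


document = read_document_only(make_package())
```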

Case 1: Complete change tracking with complete version control systems like svn or other ways e.g. Google docs with revisions.

There is a complete change tracking mechanism present.
Here the version control system saves everything in a file or sub-folder in the change tracking folder. (The change tracking folder will hold repositories but won't be one itself.)
The application saves the file's contents, saves the change tracking data in the appropriate place and adds the meta data to the ctmdf.

About naming, a few best practices make it possible to handle the repositories as files, independent of the application.
When a user saves a version or starts a repository (revisions), the user usually gives it a name.
If the user doesn't add a name, the application MUST add a unique name for it.
It's this name that must be used for the file/sub-folder of the repository in the change tracking folder.
It must also be the name used in the ctmdf to refer to that repository.
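
The naming rule could be sketched as follows. The `repo-` prefix, the use of a random identifier and the decision to reject a user-chosen name that already exists are assumptions; the proposal only requires that the name is unique and reused unchanged.

```python
import uuid


def repository_name(user_name, existing_names):
    """Return the name to use for the folder/file and for the ctmdf entry."""
    if user_name:
        # The user's name must be used exactly as given, so a clash cannot
        # be resolved by silently renaming (assumed behaviour: report it).
        if user_name in existing_names:
            raise ValueError("a repository with this name already exists")
        return user_name
    # No name given: the application generates a unique one.
    while True:
        candidate = "repo-" + uuid.uuid4().hex[:8]
        if candidate not in existing_names:
            return candidate


named = repository_name("Review round 1", {"draft-1"})
generated = repository_name(None, {"draft-1"})
```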

This approach raises a question:
which version control system should be recommended or added as a standard?
No version control system should be favoured in the ODF specification.
That is out of scope: ODF is not a change tracking specification and it shouldn't try to add or be one.

Version control systems are changing at a rapid pace and it would be bad to add something of this complexity to ODF itself.
The best solution can change over time, and will probably not be the best for everyone.
Recommending or specifying one is dangerous for ODF and MUST be avoided in ODF itself.
A discussion about the best solution would only result in a never-ending flame war.

OpenOffice.org can use a standard version control system, e.g. svn, as a reference which other applications could then support too, because OpenOffice.org is widely treated as the reference implementation for ODF.

In fact there is already an add-on that does this.
It is a project to make OpenOffice.org track changes with svn in ODF (it works with all kinds of ODF files):
http://sourceforge.net/projects/odfsvn/

Case 2: Multiple versions of the same document.

There is another way to have versions: saving entire documents in the change tracking folder and merging change history as efficiently as possible.

It uses less space than Case 1, is much easier to implement and is in scope for ODF.
This means that it can be described completely, which lets different applications read and use each other's versions without the problems of complexity or incompatibility described in Case 1.
Every detail of the implementation can be covered in ODF without adding too much complexity.

The change tracking folder in the main document is used to store a version of the whole document, without that version's own change tracking folder.
Relevant information will be added to the ctmdf.
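
A possible way to store such a version, sketched with Python's `zipfile` module. The folder name `change-tracking/`, the `.odt` suffix and the function names are assumptions, and updating the ctmdf is left out to keep the example short.

```python
import io
import zipfile

TRACKING_PREFIX = "change-tracking/"  # hypothetical folder name


def snapshot_without_tracking(package_bytes):
    """Copy the package, leaving out the change tracking folder."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for name in src.namelist():
            if not name.startswith(TRACKING_PREFIX):
                dst.writestr(name, src.read(name))
    return out.getvalue()


def add_version(package_bytes, version_name):
    """Store a snapshot of the document as a version inside the folder."""
    snapshot = snapshot_without_tracking(package_bytes)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for name in src.namelist():
            dst.writestr(name, src.read(name))
        # The stored version is a complete package of its own.
        dst.writestr(TRACKING_PREFIX + version_name + ".odt", snapshot)
    return out.getvalue()
```

Because the snapshot never contains a change tracking folder, stored versions do not nest inside each other and the package does not grow recursively.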

As an extra measure against errors when processing files and their internal data,
meta data that declares a file as a version of a document must be added to each version file.
The ctmdf MUST hold that information and all other relevant information such as relations.

This is consistent with the idea and use of the change tracking folder and the ctmdf
(centralized, efficient, consistent handling of changes in a file).

However, there can be situations where the user wants to keep everything.
The default behaviour of applications that encounter such folders must be to ignore them unless they are explicitly needed.

This approach allows applications to save a version file that is a standard ODF file on its own!
Applications can use each other's files without any problems.
Applications can flag changes, additions and deletions between two versions by directly comparing the main ODF file with the chosen internal version files.
The author meta data makes change annotation (per user) possible.
(This makes revisioning possible.)
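
To illustrate flagging changes by direct comparison, here is a small sketch using Python's standard `difflib` module. Comparing plain text word by word is an assumption made for brevity; a real application would compare the ODF content structure.

```python
import difflib


def flag_changes(old_text, new_text):
    """Return (kind, old words, new words) for every difference found."""
    old, new = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old, new)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # tag is "replace", "delete" or "insert".
        changes.append((tag, " ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return changes


changes = flag_changes("the quick brown fox", "the slow brown fox jumps")
# changes == [("replace", "quick", "slow"), ("insert", "", "jumps")]
```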

Below, a case with multiple authors and both concurrent and non-concurrent versions is described. (This should cover most, if not almost all, possible cases.)

What if a few writers make a different version file starting from one base file and need to merge those changes together?

The application could check whether the version histories in the files to be merged are the same.
This allows it to merge the history of all the different files more efficiently when possible.
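
One way to perform that check, sketched under the assumption that each file's history is available as a chronological list of version contents: compare digests and count how many leading versions are identical. The function names are hypothetical.

```python
import hashlib


def fingerprints(history):
    """Digest of every version in a history (a list of bytes objects)."""
    return [hashlib.sha256(content).hexdigest() for content in history]


def shared_history_length(history_a, history_b):
    """Number of leading versions that both histories have in common."""
    shared = 0
    for a, b in zip(fingerprints(history_a), fingerprints(history_b)):
        if a != b:
            break
        shared += 1
    return shared


alice = [b"v1", b"v2", b"v3 by alice"]
bob = [b"v1", b"v2", b"v3 by bob"]
common = shared_history_length(alice, bob)  # the first two versions match
```

The shared versions only need to be stored once in the merged file; only the versions after the common part differ.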

It constructs a main file, adds all concurrent and non-concurrent unique versions once and adds the relations between them in the ctmdf.
The application then constructs a main file with all the changes added to it.
The user could then accept or decline changes to construct a next unified version.

Merging itself is completely dependent on the application and unnecessary to describe in the specification.
How changes and inconsistencies are handled when merging the files to one new non-concurrent version file is up to the application.

How the merged file is constructed must be described in the ctmdf.
(How merging is described in the ctmdf needs to be specified in ODF under change tracking.)
Of course, the merging must be done consistently and transparently to the user to be useful.

For developers, one possible way of merging consistently could go like this:
Use one of the files, or, when possible, an earlier non-concurrent version that is as recent as possible, as the base for the main file.
Make a base file with that content.
Copy the contents of the base file to a new file: the main file.

Compare the first concurrent version with the base file and add the changes with annotations to the main file. The annotations are temporary; they are there to avoid wrong behaviour when adding changes from the next concurrent version.
Because they are temporary, they don't need to be described in ODF.

Compare the second concurrent version with the base file.
Add the changes consistently to the main file, positioned relative to the main file as it was before the previous changes were added. The annotations make this possible:
the application knows what to ignore when placing the new changes.
This prevents changes from ending up in the wrong place because of the changes added from the previously processed version. (E.g. if a developer uses the number of characters as a reference for placing changes, the application could end up adding content in the wrong place.)
If two changes are located in the same place relative to the base file,
the new changes can be placed before or after the changes from the previous files.

This process is repeated until all concurrent versions are added.
Then the main file is complete with the temporary annotations.
This is a file in the working memory that is not saved as a version.
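
The steps above can be sketched as follows. This is only one possible reading of the procedure: documents are simplified to lists of words, changes are found with Python's `difflib`, and the annotation format `("insert", author)` / `("delete", authors)` is made up for the example.

```python
import difflib


def changes_against_base(base, version):
    """Differences as (base start, base end, replacement words)."""
    matcher = difflib.SequenceMatcher(None, base, version)
    return [(i1, i2, version[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge_concurrent(base, versions):
    """Build the annotated main file from a base and concurrent versions.

    versions maps an author to that author's list of words. Every change
    is positioned relative to the base, never relative to changes that
    were added earlier, so one version cannot shift another's changes.
    """
    inserts = {}  # base position -> [(author, new words)]
    deletes = {}  # base index -> [authors proposing the deletion]
    for author, words in versions.items():
        for i1, i2, new in changes_against_base(base, words):
            if new:
                inserts.setdefault(i1, []).append((author, new))
            for i in range(i1, i2):
                deletes.setdefault(i, []).append(author)

    main = []  # list of (word, temporary annotation or None)
    for pos in range(len(base) + 1):
        # Changes at the same place follow the order of the versions.
        for author, new in inserts.get(pos, []):
            main.extend((word, ("insert", author)) for word in new)
        if pos < len(base):
            if pos in deletes:
                main.append((base[pos], ("delete", tuple(deletes[pos]))))
            else:
                main.append((base[pos], None))
    return main


base = "the quick fox".split()
main = merge_concurrent(base, {
    "alice": "the quick brown fox".split(),
    "bob": "the fox".split(),
})
```

In this run alice's inserted word and bob's deleted word both end up at the right place, because each is located by its position in the base rather than by its position in the partly merged main file.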

The user approves and denies changes.
Then the user wants to save the result.
The application must save a new version containing the user's changes as the content, because it is the most recent.
The most recent file will be the content document.
When reopening, the application merges the concurrent versions again to regenerate the annotations,
and compares that main file with the content. This reveals which changes the user has accepted or declined.
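
A simplified sketch of that last comparison. It assumes the regenerated main file is a list of `(word, annotation)` pairs with annotations of the form `("insert", author)` or `("delete", authors)`, that the saved content is a list of words, and that the user only accepted or declined proposed changes without typing anything new. All of these are assumptions for illustration.

```python
def review_decisions(main, content):
    """Work out which proposed changes were accepted or declined."""
    decisions = []
    k = 0  # position in the saved content
    for word, annotation in main:
        present = k < len(content) and content[k] == word
        if annotation is None:
            # Unchanged text must appear in the saved content.
            if not present:
                raise ValueError("content does not match the main file")
            k += 1
            continue
        if annotation[0] == "insert":
            verdict = "accepted" if present else "declined"
        else:  # a proposed deletion: the word surviving means "declined"
            verdict = "declined" if present else "accepted"
        decisions.append((word, annotation, verdict))
        if present:
            k += 1
    return decisions


main = [
    ("the", None),
    ("quick", ("delete", ("bob",))),
    ("brown", ("insert", "alice")),
    ("fox", None),
]
decisions = review_decisions(main, "the quick brown fox".split())
```

This greedy walk can misjudge a change when an inserted word equals the word that follows it; a real implementation would need a more careful comparison.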

The only downside is that when opening the document, all changes for revisioning must be recalculated. Recalculation and space shouldn't be much of a problem. (Compression will most likely benefit from the repetitive nature of the versions.) Recalculations of this kind can be done efficiently and fast with a good algorithm.

Even unrelated documents could be added as concurrent versions.

They could be added as concurrent versions in the ctmdf.
The content of those files could be placed at the beginning or end by default.
However, a much better approach is to ask the user where he or she wants to place the content. This approach should be encouraged.
After the decision, the ctmdf must add the relations of this unrelated document,
and the change tracking folder must hold the unrelated document, tagged as a version of the file.

Not allowing the annotations themselves to be saved, while allowing them to be generated, takes the heat off ODF and allows for more robust files.
Presentation, appearance and rendering of changes are already completely up to the application anyway.

This approach allows applications to use their own approach while maintaining standards and compatibility.
How merging is described in the ctmdf can be part of ODF.
An example is described in more detail under "For developers, one possible way of merging consistently could go like this".
