Subject: c/o office-collab: Regarding ODF Collaboration.
Dear Advanced Document Collaboration SC,

I spent some time adding ODF support to my NativeOOXML rendering engine and eventually stumbled upon the ODF Collaboration SC. Since one of the design goals of my implementation is real-time collaboration à la Jupiter (or its web-based successor, Wave) or the newer versions of Google Docs, I took a closer look.

The magic behind state-of-the-art collaboration is a technique called “operational transformation” (OT). I really recommend reading http://www.waveprotocol.org/whitepapers/operational-transform. Bottom line: you need a document model that is changed only by clearly defined operations, and the operations need to be designed so that they can be applied in a different order by “transforming” them.

A very simple example is the following pair of operations: Insert(pos=0, “Hello ”); Insert(pos=6, “World”), which lead to the document “Hello World”. You can change the order of the operations by transforming the insert positions: the sequence Insert(pos=0, “World”); Insert(pos=0, “Hello ”) leads to the same document “Hello World”. One can show that a quite robust (online) collaboration system can be built on operational transformation.

Anyway, what I found struck me as odd. If I understood correctly, the ODF Collaboration SC is heading in the direction of applying an XML-diff algorithm to the ODF/XML serialization to improve ODF collaboration. Why do I find this odd?

First of all: XML-diff? Really? Isn't that the wrong layer? I cannot see why the author of, let's say, an ODF text document would be interested in the fact that some ODF/XML tags changed. I'd rather think that a user is interested in the actual user-imposed changes to the document or, more precisely, in the operations applied to the document by other users.

Second: I find it rather ironic to use an XML-diff algorithm for ODF collaboration.
Especially since the XML-diff algorithm was invented because the “plain text” longest-common-subsequence diff algorithms were too generic for XML. (Remember: XML is also text, or at least has a text representation.) So if we need an XML-diff because XML is more special than plain text, why doesn't the same rule apply to ODF? Don't we need a “special” algorithm for ODF? <irony>Since ODF is XML and therefore also text, why don't we apply good old “diff” to it?</irony> All I'm trying to point out is: beware, if you are a hammer, everything looks like a nail.

Third: how do we do cool online collaboration with XML-diff-based change tracking?

Fourth: where is ODF more special than XML with respect to change tracking? The normal use case for XML documents is that you get two XML documents (let's say from a web service or a database) and want to see the difference between the trees they represent. (Note that you are not interested in changes to the textual representation but in changes to the tree, very much like you are not interested in changes to the ODF/XML but in changes to the text document.) In ODF you usually have an editor that applies operations to the document. These operations represent not only the actual result of the user's changes but also the history of those changes. So one remarkable difference is that ODF documents are changed by editors that are able to track the operations applied by users. This allows very fine-grained and powerful collaboration.

An alternative approach to bringing cool, OT-ready collaboration to ODF is very simple:
(a) Clarify the existing change tracking. Make sure people understand that <p>Hello <changed-start/>World<changed-end/></p> represents the operation Insert(“World” at position 6 of the paragraph). With that information, applications can implement decent OT-based collaboration.
(b) Simply add markup for missing operations such as Insert-Row, Delete-Row, Insert-Cell, Delete-Cell, Move-Text, etc.
The only challenge here is to find a comprehensive list of operations.

In case you “remain surprised that neither Apple nor Google are taking ODF support seriously” (http://webmink.com/2011/01/18/apple-and-google-and-odf/), maybe, just maybe, some support for a state-of-the-art technology can change this.

Best regards,
Florian

P.S. I needed to elaborate a bit on operational transformation (OT) to make my point clear. However, OT is not needed at the ODF layer. It's the responsibility of the application, and applications don't need to implement it if they don't want real-time online collaboration. What is needed in ODF is the recording of the operations applied by the user, not the recording of the ODF/XML changes.

P.P.S. I think the Delta-XML-diff is a very cool algorithm. I just don't think it's the right layer of change tracking for ODF documents.
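The Insert example from the beginning of this mail can be made concrete with a small Python sketch. It is a minimal illustration only, assuming a plain-string document, insert-only operations, and a hypothetical `transform` function that breaks ties at equal positions in favour of one site; it is not the Wave protocol's actual API. Two users edit an empty document concurrently, and both sites end up with “Hello World”, reproducing the two operation orders described in the mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Insert `text` at character offset `pos` (0-based)."""
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert, op_first_on_tie: bool) -> Insert:
    """Hypothetical helper: rewrite `op` so it applies after concurrent `other`.

    If `other` inserted before `op`'s position, `op` is shifted right by the
    length of `other.text`. Equal positions are broken by `op_first_on_tie`.
    """
    if op.pos < other.pos or (op.pos == other.pos and op_first_on_tie):
        return op
    return Insert(op.pos + len(other.text), op.text)


# Two users edit the same empty document concurrently.
a = Insert(0, "Hello ")  # site A (wins ties)
b = Insert(0, "World")   # site B

# Site A applies its own op, then B's op transformed against it.
b_at_a = transform(b, a, op_first_on_tie=False)
doc_a = apply(apply("", a), b_at_a)

# Site B applies its own op, then A's op transformed against it.
a_at_b = transform(a, b, op_first_on_tie=True)
doc_b = apply(apply("", b), a_at_b)

print(b_at_a, a_at_b)  # Insert(pos=6, text='World') Insert(pos=0, text='Hello ')
print(doc_a == doc_b == "Hello World")  # True
```

The tie-breaking flag is the essential design choice here: without an agreed rule for equal positions, the two sites could place “Hello ” and “World” in different orders and diverge.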
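Point (a) above, reading change-tracking markup as an operation, can also be sketched in Python. This is a minimal sketch under stated assumptions: it uses the simplified `<changed-start/>` / `<changed-end/>` tags from this mail (not the literal ODF element names or namespaces), expects exactly one tracked insertion whose markers are direct children of the paragraph, and counts offsets over the paragraph's character content only. The function name `insert_from_markup` is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Insert `text` at character offset `pos` (0-based) within the paragraph."""
    pos: int
    text: str


def insert_from_markup(paragraph_xml: str) -> Insert:
    """Hypothetical helper: derive an Insert from one tracked insertion.

    Assumes the mail's simplified markup: the text between an empty
    <changed-start/> and an empty <changed-end/> element was inserted.
    Offsets count the paragraph's character content only, not its tags.
    """
    p = ET.fromstring(paragraph_xml)
    offset = len(p.text or "")  # text before the first child element
    start = None                # offset of <changed-start/>, once seen
    inserted = []               # text pieces between the two markers

    for child in p:
        if child.tag == "changed-start":
            start = offset
        elif child.tag == "changed-end":
            if start is None:
                raise ValueError("<changed-end/> without <changed-start/>")
            return Insert(start, "".join(inserted))
        else:
            # Other inline elements (e.g. spans) contribute their text content.
            inner = "".join(child.itertext())
            if start is not None:
                inserted.append(inner)
            offset += len(inner)
        tail = child.tail or ""  # text following this child element
        if start is not None:
            inserted.append(tail)
        offset += len(tail)

    raise ValueError("no complete tracked insertion found")


print(insert_from_markup("<p>Hello <changed-start/>World<changed-end/></p>"))
# Insert(pos=6, text='World')
```

Once the markup is read this way, the resulting operation can be fed into an application's OT machinery; the file format only has to record it unambiguously.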