office-collab message

Subject: Re: [office-collab] Defining the Basics: The Search for Components
From: "C. Boemann" <cbo@boemann.dk>
To: office-collab@lists.oasis-open.org
Date: Fri, 29 Mar 2013 10:12:05 +0100
No, that is indeed a weak point. I guess we can store a checksum, and if that 
doesn't match then we should consider the changetracking lost.
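
A minimal sketch of the checksum idea, assuming the components can be serialized to strings in document order. The function names and the list-of-strings representation are hypothetical, not something we have agreed on:

```python
import hashlib

def component_checksum(components):
    """Hash the ordered component sequence (hypothetical: a list of strings)."""
    digest = hashlib.sha256()
    for component in components:
        data = component.encode("utf-8")
        # Length-prefix each component so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()

def changetracking_usable(components, stored_checksum):
    """True only if the content still matches what the CT stack was saved against."""
    return component_checksum(components) == stored_checksum

# Checksum stored at save time by a CT aware app.
saved = ["<text:p>One</text:p>", "<text:p>Two</text:p>"]
stored = component_checksum(saved)

# A CT unaware app inserts a paragraph in front: every number is now off by one.
edited = ["<text:p>Zero</text:p>"] + saved
```

With this, any edit by an unaware app invalidates the whole stack, which is exactly the brittleness in question.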

The problem is that, apart from the last operation, the numbers may refer to 
components that are no longer present, or may be offset because components 
have been inserted or removed (by operations) before that position (without 
the operations being updated, because the app is unaware of them).

An alternative is to store some kind of marker tags in the content and then 
refer relative to those. Not all markers will be present in the content.xml at 
load time. When a changeoperation adds/removes a component or some text it has 
to add/remove a marker and the id of that marker should be specified as part of 
the operation that adds the component, so that other operations can now refer 
to the newly added component via the marker.
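
Roughly what I have in mind, as a sketch only. The classes, the operation name and the marker ids are made up for illustration, and a component here is just a text with an optional marker id:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Component:
    text: str
    marker_id: Optional[str] = None  # None means no marker tag is attached

@dataclass
class Document:
    components: List[Component] = field(default_factory=list)

    def index_of(self, marker_id: str) -> Optional[int]:
        """Find the component carrying the given marker, wherever it is now."""
        for index, component in enumerate(self.components):
            if component.marker_id == marker_id:
                return index
        return None

def insert_after(doc: Document, anchor_id: str, new_id: str, text: str) -> bool:
    """Hypothetical 'add component' operation: it names its anchor by marker
    and declares the marker id of the component it adds."""
    index = doc.index_of(anchor_id)
    if index is None:
        return False  # anchor was removed; only this change is lost
    doc.components.insert(index + 1, Component(text, new_id))
    return True

doc = Document([Component("Intro", "m1"), Component("Body", "m2")])
# A CT unaware app adds an unmarked paragraph in front; markers are unaffected.
doc.components.insert(0, Component("Added by a CT-unaware app"))
first = insert_after(doc, "m2", "m3", "Tracked insertion")
second = insert_after(doc, "m3", "m4", "Follow-up insertion")  # refers to new marker
lost = insert_after(doc, "gone", "m5", "Never applied")
```

The second operation can refer to the component added by the first one only because the first one declared "m3" as part of the operation.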

The drawback of this scheme is that it once again litters our content with 
markup, but at least only markers and no actual data. And changes to changes 
will not have to be marked.

This is where changetracking and realtime collaboration differ. In realtime 
collab the apps have agreed to follow the protocol. For CT we need to be a bit 
more robust. That is why, as a member of the select committee, my support was 
not for a collab protocol, but for something collab'ish adapted for CT. As 
close as possible to gain from synergies, but the stack itself I always 
imagined as something distinctly CT. For collab it is still more efficient to 
communicate via numbers.

That said, even this marker approach can be foiled by a malicious/unknowing 
application. If it removes a component that was marked, then it implicitly 
removes all the changes to that component. However, in contrast to the number 
scheme, changes to other components will survive.

Internally I would still recommend that CT aware apps use numbers (or similar) 
rather than depending on the ids of markers. So on load/save numbers would be 
translated from/to marker tags. That is also why I say that for collab numbers 
should be used, because then apps don't have to deal with and remember markers 
outside of load/save. The ids of the marker tags can be regenerated if the 
corresponding CT stack is updated at the same time.
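
A sketch of the load/save translation, under a strong simplifying assumption: every number is a 1-based position in the document as saved, and an operation is just a (name, target) pair. The function names and the "ct" id format are hypothetical:

```python
def save(component_count, operations):
    """On save: turn positions into marker ids, regenerating all ids.

    Returns (markers, stored_ops), where markers[i] is the marker id to
    write on component i+1, or None if that component needs no marker.
    """
    markers = [None] * component_count
    stored_ops = []
    next_id = 1
    for name, position in operations:
        if markers[position - 1] is None:
            markers[position - 1] = f"ct{next_id}"
            next_id += 1
        stored_ops.append((name, markers[position - 1]))
    return markers, stored_ops

def load(markers, stored_ops):
    """On load: turn marker ids back into the positions they have now."""
    position = {m: i for i, m in enumerate(markers, start=1) if m is not None}
    return [(name, position[marker]) for name, marker in stored_ops]

ops = [("format", 3), ("delete-text", 1), ("format", 3)]
markers, stored_ops = save(4, ops)

# A CT unaware app inserts one unmarked component in front before we reload.
markers_after_edit = [None] + markers
reloaded = load(markers_after_edit, stored_ops)
```

Because save regenerates the ids and rewrites the stack in the same step, the app never has to remember marker ids while the document is open.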

CT unaware apps are required to keep the CT stack, and the marker tags without 
changing any id, but other than that they can edit the content.

So as I see it we can:
 1) keep the numbers and store a checksum (and be brittle)
 2) switch to some kind of marker tag. 

best regards
Camilla

On Friday 29 March 2013 01:21:10 John Haug wrote:
> Re: Referencing Components
> I would like to raise again a concern I have about the numbering approach. 
> (See the “Counting issues, etc” mail thread from early October 2012.)  It
> seems brittle in the case of a document with tracked changes that is
> edited in an application that does not support CT.  A non-CT app could add
> or remove components that change the indexing of the component that is a
> tracked change.  If the app saving the doc doesn’t review all the CT
> indexes and update them as needed, the indexes will point to the wrong
> components.  Or am I missing something that mitigates that?
> 
> From: office-collab@lists.oasis-open.org
> [mailto:office-collab@lists.oasis-open.org] On Behalf Of Svante Schubert
> Sent: Wednesday, March 27, 2013 6:55 AM
> To: office-collab@lists.oasis-open.org
> Subject: Re: [office-collab] Defining the Basics: The Search for Components
> 
> Hi Oliver,
> 
> I agree, we might want to add an "in general", because (see below)..
> 
> On 27.03.2013 14:22, Oliver-Rainer Wittmann wrote:
> Hi,
> 
> discussing "Component Search Criteria":
> In general I agree.
> The criterion "change to a component does not change other XML" is a little
> bit tricky. Some examples why I think it is tricky:
> 
> (A) A deletion of a paragraph element (<text:p> or <text:h> element) will
> have influence on the content of a <text:paragraph-count> element
> elsewhere in the document.
> 
> (B) A deletion of a paragraph element of type heading (<text:h> element)
> might have influence on the content of a <text:table-of-content> element
> elsewhere in the document.
> 
> XML does not only exist for being part of a component. There are two other
> reasons for XML. One is being an "(aggregated) view" on the status of other
> components. This happens for a content table or the paragraph count element.
> 
> (C) A change of the content of a paragraph which is part of a list might
> have influence on the content of a <text:bookmark-ref> element which is
> cross-referencing this paragraph.
> 
> The other reason, aside from being an "aggregated view", is to group
> components loosely together, like for style formatting (e.g. <text:span>
> or the mentioned bookmark). These markers are not components themselves.
> Someone might argue that by removing a paragraph the status of the
> document is being changed as well, which is true, similar to the deletion
> of all its children.
> 
> 
> (D) A change to table cell content in a spreadsheet document might have
> influence on the content of other table cells which reference this
> table cell.
> 
> The connection of cells via formula is indeed an explicit exception. Still,
> the modularity of a cell - being a component - was explicitly broken by
> referencing it from an external formula, an explicit ODF mechanism.
> 
> 
> (E) ...
> I do not think that these examples will prevent us from defining a <text:p>
> element as the root of a component. But I think we need to 'tune' the
> "Component Search Criteria" to reflect such 'XML changes that depend on the
> component's state'.
> 
> Thanks for your feed-back!
> Svante
> 
> 
> Mit freundlichen Grüßen / Best regards
> Oliver-Rainer Wittmann
> 
> --
> Advisory Software Engineer
> ---------------------------------------------------------------------------
> IBM Deutschland
> Beim Strohhause 17
> 20097 Hamburg
> Phone: +49-40-6389-1415
> E-Mail: orwitt@de.ibm.com
> ---------------------------------------------------------------------------
> IBM Deutschland Research & Development GmbH
> Chairwoman of the Supervisory Board: Martina Koederitz
> Management: Dirk Wittkopp
> Registered office: Böblingen / Registration court: Amtsgericht Stuttgart,
> HRB 243294
> 
> 
> 
> From: Svante Schubert <svante.schubert@gmail.com>
> To: "office-collab@lists.oasis-open.org"
> <office-collab@lists.oasis-open.org>
> Date: 27.03.2013 11:33
> Subject: [office-collab] Defining the Basics: The Search for Components
> Sent by: <office-collab@lists.oasis-open.org>
> ________________________________
> 
> 
> 
> The Search for Components
> Search via Document Schemas
> 
> Instead of searching every document at hand for components, the schema
> might be searched instead, as all given possibilities are
> automatically covered. An automated reading of the schema - perhaps with a
> visualization in a front-end to analyze the XML - might be very helpful, as
> formats nowadays turn out to be quite complex [1]. Ideal would be a
> web-based application, to be able to decentralize the work of sorting the
> XML elements into components [2].
> 
> Component Search Criteria
> 
> A component is similar to a puzzle piece of a document, some logical unit,
> which consists of one or more XML elements, which are usually connected,
> but do not have to be (this depends on the decision of the XML file format
> designer). The only rule is that the component has to be disjoint from
> other components. This means that if the data or the state of the component
> is changed, no other component’s data has to be changed (aside from,
> implicitly, the parent). In other words, by changing the component's
> existing XML (element, attribute or text) or XML that is related to it, no
> component other than the containing component will change its state. The
> containing component changes its state because if, for instance, an image
> is deleted from the document, the document changes as well, but no other
> component, such as other images or tables at a different place, will change.
> Therefore, if a component is deleted, all its XML (joint or spread over
> the XML file(s)) has to be deleted as a whole. Components usually have a
> specific XML element they start with, the “component root element”, like
> <text:p> for a paragraph in ODF. If the component may consist of multiple
> XML elements, there are also “component leaf elements”. For instance, in
> ODF an image consists of the <draw:frame>, which provides the visual view
> size, and the <draw:image> element containing the loadable graphic, while
> in HTML there is only a single <img/> element.
> 
> Often there are a lot of boilerplate XML elements in a format which are not
> mapped to a component. For instance, the components of an ODF text
> document start below <office:document>/<office:body>/<office:text>.
> 
> All child elements of <office:text> are root components of the text
> document.
> 
> Similar to solving a Sudoku puzzle, it is best to solve the easy parts first
> and name the obvious components first. Aside from those root components, the
> components that are usually added by users via their applications are good
> starting points for an empirical approach.
> 
> When a component has been found, the “component root elements” are best
> marked directly in the XML Schema (in the case of multi-element components,
> also either the ending “component leaf elements” or, if those are not easy
> to determine, the elements within the component, named “component trunk
> elements”). For instance, in a RelaxNG schema using annotations [3].
> 
> Referencing Components
> 
> A component within the component tree is referenced by its position.
> Similar to a URL, position and identification should be the same.
> Components of all types (table, paragraph or character) should be handled
> equally when referenced by their position, to allow easy generic access.
> The root of the document would be “/” in the serialized string
> representing the position. All its children are counted in document
> order and represented by their child position as an integer. For
> instance, the first component, being a paragraph, would be accessed via
> “/1”. The third character within this child paragraph would be accessed
> via “/1/3”. If there is a table after the paragraph, the fifth paragraph
> within the 4th cell of the 3rd row would be accessed via “/2/3/4/5”.
> 
> Every component position can be mapped to its XML position.
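
The addressing above can be sketched as follows. This is only an illustration: the nested-dict tree, the helper names and the sample document are assumptions, not part of the proposal.

```python
def parse_position(position):
    """Turn a position string such as "/2/3/4/5" into a list of 1-based indexes."""
    if not position.startswith("/"):
        raise ValueError("a position must start with '/'")
    if position == "/":
        return []  # the document root itself
    return [int(part) for part in position[1:].split("/")]

def resolve(root, position):
    """Walk the component tree, treating every component type the same way."""
    node = root
    for index in parse_position(position):
        node = node["children"][index - 1]  # positions count from 1
    return node

def paragraph(text):
    """Hypothetical helper: a paragraph whose children are its characters."""
    chars = [{"type": "character", "value": ch, "children": []} for ch in text]
    return {"type": "paragraph", "children": chars}

def text_of(component):
    return "".join(child["value"] for child in component["children"])

# A paragraph followed by a table with 3 rows, 4 cells per row and
# 5 paragraphs per cell, matching the example positions in the text.
table = {"type": "table", "children": [
    {"type": "row", "children": [
        {"type": "cell", "children": [
            paragraph(f"r{r}c{c}p{p}") for p in range(1, 6)]}
        for c in range(1, 5)]}
    for r in range(1, 4)]}
root = {"type": "document", "children": [paragraph("Hello"), table]}
```

Here `resolve(root, "/1/3")` yields the third character of the first paragraph and `resolve(root, "/2/3/4/5")` the fifth paragraph in the 4th cell of the 3rd row.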
> 
> Programming Guidance:
> The creation of a specific component tree can be easily accomplished during
> the load of an XML document by implementing/overriding the SAX
> ContentHandler interface
> <http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html>.
> By overriding the startElement, endElement and characters methods, all
> XML elements that are component root elements, component delimiters and text
> can be gathered and mapped to operation calls (only sequential adding,
> e.g. no deletion, merge or split, while loading a document).
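
A minimal sketch of such a handler, using Python's `xml.sax` binding of the same SAX interface rather than the Java one linked above. The set of component root elements, the operation tuples and the placeholder namespace URIs are assumptions for illustration. Text is emitted as one operation per chunk instead of one component per character, to keep the sketch short.

```python
import xml.sax

# Assumption: only paragraphs and headings are treated as component roots here.
COMPONENT_ROOTS = {"text:p", "text:h"}

class ComponentHandler(xml.sax.ContentHandler):
    """Maps SAX events to sequential 'add' operations with position strings."""

    def __init__(self):
        super().__init__()
        self.operations = []     # collected (kind, position, payload) tuples
        self.path = []           # position of the component currently open
        self.child_counts = [0]  # components seen so far at each open level
        self.open_is_root = []   # for each open element: is it a component root?

    def _position(self):
        return "/" + "/".join(str(index) for index in self.path)

    def startElement(self, name, attrs):
        is_root = name in COMPONENT_ROOTS
        self.open_is_root.append(is_root)
        if is_root:
            self.child_counts[-1] += 1
            self.path.append(self.child_counts[-1])
            self.child_counts.append(0)
            self.operations.append(("add", self._position(), name))

    def endElement(self, name):
        if self.open_is_root.pop():
            self.path.pop()
            self.child_counts.pop()

    def characters(self, content):
        # The parser may deliver one text run in several chunks.
        if self.path and content.strip():
            self.operations.append(("text", self._position(), content))

# The namespace URIs below are placeholders, not the real ODF namespaces.
SAMPLE = (b'<office:text xmlns:office="urn:example:office" '
          b'xmlns:text="urn:example:text">'
          b'<text:h>Title</text:h><text:p>Hello</text:p></office:text>')

handler = ComponentHandler()
xml.sax.parseString(SAMPLE, handler)
```

Boilerplate elements such as <office:text> produce no operation; only component roots advance the position counter.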
> 
> [1] The document formats are very complex. The ODF 1.2 part 1 for instance
> counts about 600 XML elements and about 1300 XML attributes, not to
> mention the different attribute values possible, e.g. to express styles.
> See
> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html
> 
> [2] The document formats are very complex. The ODF 1.2 part 1 for instance
> counts about 600 XML elements and about 1300 XML attributes, not to
> mention the different attribute values possible, e.g. to express styles.
> See
> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html
> 
> [3] http://relaxng.org/tutorial-20011203.html#IDA1OZR