Re: [office-collab] The Art of Mapping

Hello Patrick,

You gave a very good example of the problem I tried to describe earlier:

How do we find out the semantic difference between a <text:p> at the root document level (beneath <office:body>) and a <text:p> within a <draw:image>?
How shall we handle it?

Let us consider how an ODF application developer would handle such a case.

Imagine you and I started working on a brand-new ODF office product based on a browser and the exchange of changes.

Our architecture works as follows: we load the document into our environment on our server and transform it into a sequence of changes, which are sent to the browser. Every user change in the browser becomes a change of the same kind and is sent back to the server. If the user wants the ODT document back, the new changes are merged into the document.
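To make the idea of "document as a sequence of changes" concrete, here is a minimal sketch in Python. The change format (dictionaries with "op", "at" and "text") and the function names are purely hypothetical and invented for illustration; nothing in ODF or in our SC work defines them yet. It only handles top-level paragraphs with plain text.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def document_to_changes(xml_source):
    """Turn the top-level paragraphs of <office:text> into insert operations.

    The operation names are hypothetical, chosen only for this sketch.
    """
    root = ET.fromstring(xml_source)
    body_text = root.find(f".//{{{OFFICE_NS}}}text")
    changes = []
    for index, para in enumerate(body_text.findall(f"{{{TEXT_NS}}}p")):
        changes.append({"op": "insertParagraph", "at": index})
        text = "".join(para.itertext())
        if text:
            changes.append({"op": "insertText", "at": [index, 0], "text": text})
    return changes


def apply_changes(paragraphs, changes):
    """Replay changes onto a plain list of strings (the 'browser' model)."""
    for change in changes:
        if change["op"] == "insertParagraph":
            paragraphs.insert(change["at"], "")
        elif change["op"] == "insertText":
            p, offset = change["at"]
            old = paragraphs[p]
            paragraphs[p] = old[:offset] + change["text"] + old[offset:]
    return paragraphs


SAMPLE = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello</text:p><text:p>World</text:p>"
    "</office:text></office:body></office:document-content>"
)

# Server side: document -> changes. Browser side: changes -> model.
changes = document_to_changes(SAMPLE)
model = apply_changes([], changes)
# A user edit in the browser is just one more change of the same kind.
model = apply_changes(model, [{"op": "insertText", "at": [1, 5], "text": "!"}])
```

The same apply_changes function serves both the initial load and the later user edit, which is the property the architecture relies on.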

Now comes the challenge: our investor offers us a 1 million bonus if we are able to implement the above use case within a week "correctly" in our browser-based office, which renders the pages using HTML5 and JavaScript. He defines "correct" by selling the product to 1 million customers who were previously users of LibreOffice and MSO: if the majority protests, we did something wrong; otherwise we did it correctly.

Note: This use case gives us the usual background of time pressure and the en-vogue 'customer-first view'.. ;)

As we are pros, have little time and a lot to win/lose, we start to

1) Create a test document corpus (usually one document) ;)

2) Open it in MSO & LibreOffice to identify user expectation

3) Read the spec, in case we did something wrong.. - we should have done this first, but we are pragmatists as well.. ;)

So I created a test document with LibreOffice, adding a title and an alternative text in the GUI (basically every option I could find to add text) to see if we can trigger the above use case.

It does not trigger the above case. The ODF XML looks like this:

<text:p text:style-name="Standard">

<draw:frame draw:style-name="fr1" draw:name="Atlas Santiago" text:anchor-type="paragraph" svg:x="0.771cm" svg:y="0.319cm" svg:width="2.566cm" svg:height="4.034cm" draw:z-index="0">

<draw:image xlink:href="" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>

<svg:title>Sculpture of Atlas, Praza do Toural, Santiago de Compostela.</svg:title>

</draw:frame>

</text:p>

(see ParagraphVariations1.odt attached)

I then double-checked the specification on <draw:image>, <draw:frame> and <text:p>:

http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#element-draw_image

http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#element-draw_frame

http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#element-text_p

There is no explanation of the semantics of draw:image/text:p.

As I understand it, the "title" from the GUI becomes the @draw:name, and the "alternative text" becomes <svg:title>, a sibling of <draw:image>. This follows the <draw:frame> pattern, in which the alternative text is shown only if the preceding sibling <draw:image> cannot (or should not) be shown by the application.
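As an illustration of the pattern just described, here is a small sketch that reads the frame from the first test document and decides what to present. The function describe_frame and its can_show_image flag are hypothetical names for this example, and the fallback logic is a deliberately simplified reading of the <draw:frame> pattern, not a complete implementation of it.

```python
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}

xmlns = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
# Shortened copy of the paragraph from ParagraphVariations1.odt.
FRAGMENT = f"""
<text:p {xmlns} text:style-name="Standard">
  <draw:frame draw:name="Atlas Santiago" text:anchor-type="paragraph">
    <draw:image xlink:href="" xlink:type="simple"/>
    <svg:title>Sculpture of Atlas, Praza do Toural, Santiago de Compostela.</svg:title>
  </draw:frame>
</text:p>
"""


def describe_frame(frame, can_show_image):
    """Return what a consumer would present for one <draw:frame>.

    Simplified assumption: show the image if possible, else the <svg:title>.
    """
    name = frame.get(f"{{{NS['draw']}}}name")
    image = frame.find("draw:image", NS)
    title = frame.findtext("svg:title", default="", namespaces=NS)
    if image is not None and can_show_image:
        return {"name": name, "show": "image"}
    return {"name": name, "show": "text", "text": title}


frame = ET.fromstring(FRAGMENT.strip()).find("draw:frame", NS)
with_image = describe_frame(frame, can_show_image=True)
without_image = describe_frame(frame, can_show_image=False)
```

With can_show_image=False the result carries the alternative text instead of the image, which mirrors what the GUI labels suggest.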

Based on the above, I have added the paragraph of the scenario manually to the XML and loaded the document into LibreOffice and MSO:

<text:p text:style-name="Standard">

<draw:frame draw:style-name="fr1" draw:name="Atlas Santiago" text:anchor-type="paragraph" svg:x="0.771cm" svg:y="0.319cm" svg:width="2.566cm" svg:height="4.034cm" draw:z-index="0">

<draw:image xlink:href="" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad">

<text:p text:style-name="Standard">Something within Image</text:p>

</draw:image>

</draw:frame>

</text:p>

(see ParagraphVariations2.odt attached)

But the new paragraph is rendered in neither LibreOffice 5.2.1.2 nor MSO Word Professional Plus 2016, both on Windows 10 64-bit.

Usually I would stop here, validate the test file to be certain I have a valid test, and consider this task finished, but I tried two further things.

I opened the document with MSO 2016 and tried every available option for adding text. Indeed, there is the option of adding a caption, which I did not find right away in LibreOffice. The XML looks like this:

<text:p text:style-name="Standard">

<draw:frame draw:z-index="251660288" draw:id="id0" draw:style-name="a0" draw:name="Text Box 2" text:anchor-type="paragraph" svg:x="0.30347in" svg:y="1.775in" svg:width="1.00972in" svg:height="0.00069in" style:rel-width="scale" style:rel-height="scale">

<draw:text-box>

<text:p text:style-name="Caption">MyNewLabel-Caption<text:s/>A</text:p>

</draw:text-box>

<svg:title/>

<svg:desc/>

</draw:frame>

<text:span text:style-name="T2">

<draw:frame draw:z-index="251658240" draw:style-name="a1" draw:name="Atlas Santiago" text:anchor-type="paragraph" svg:x="0.30354in" svg:y="0.12559in" svg:width="1.01024in" svg:height="1.58819in" style:rel-width="scale" style:rel-height="scale">

<draw:image xlink:href="" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>

<svg:desc/>

</draw:frame>

</text:span>

</text:p>

(see ParagraphVariations3.odt attached)

Still there is no paragraph element within the image element.

But it gives us some new challenges: if the user only wants to add a "Caption" property to the image, a LOT of XML changes are going on.. :/

There is even a little trap for a naive implementer: the caption comes before the image in the XML, but the svg:y values show that they are rendered the other way around.
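The trap can be shown in a few lines. The sketch below sorts the frames of one paragraph by their svg:y value instead of trusting document order. It assumes that all frames share the same anchor (both are anchored to the paragraph in the sample), handles only a few length units, and uses a shortened copy of the MSO output; the helper names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}
FRAME = f"{{{NS['draw']}}}frame"
NAME = f"{{{NS['draw']}}}name"
Y = f"{{{NS['svg']}}}y"

# Only the units needed here plus two common ones; not the full ODF length type.
UNIT_TO_CM = {"cm": 1.0, "mm": 0.1, "in": 2.54, "pt": 2.54 / 72}


def length_to_cm(value):
    """Convert a length such as '1.775in' to centimetres."""
    match = re.fullmatch(r"(-?\d*\.?\d+)(cm|mm|in|pt)", value)
    if match is None:
        raise ValueError(f"unsupported length: {value!r}")
    number, unit = match.groups()
    return float(number) * UNIT_TO_CM[unit]


def frames_in_visual_order(paragraph):
    """Sort frames top to bottom, assuming a common anchor for all of them."""
    return sorted(paragraph.iter(FRAME), key=lambda f: length_to_cm(f.get(Y)))


xmlns = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
# Shortened copy of the paragraph from ParagraphVariations3.odt.
FRAGMENT = f"""
<text:p {xmlns}>
  <draw:frame draw:name="Text Box 2" svg:y="1.775in">
    <draw:text-box><text:p>MyNewLabel-Caption A</text:p></draw:text-box>
  </draw:frame>
  <text:span>
    <draw:frame draw:name="Atlas Santiago" svg:y="0.12559in"/>
  </text:span>
</text:p>
"""

paragraph = ET.fromstring(FRAGMENT.strip())
document_order = [f.get(NAME) for f in paragraph.iter(FRAME)]
visual_order = [f.get(NAME) for f in frames_in_visual_order(paragraph)]
```

Here document_order starts with the caption's text box, while visual_order starts with the image.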

If our investor had wanted the caption feature as well, we would need to check the following:

1) if this feature breaks during a round-trip through LibreOffice
2) if this feature renders correctly in LibreOffice
3) if the feature is available in LibreOffice and if the created XML is equal; if not, start again at 1) with MSO (round-trip & render testing)
4) if there are multiple ways to serialize the caption feature, choose the one that works best for the ODF applications of the customers and is hopefully valid :)

Finally, coming back to the nested paragraph within the image, I looked into the ODF 1.1 specification, where the last sentence of the Image section reads:

"Like most other drawing shapes, image drawing shapes may have text content. It is displayed in addition to the image data."

http://docs.oasis-open.org/office/v1.1/errata01/os/OpenDocument-v1.1-errata01-os-complete.html#__RefHeading__87663_321613613

It seems the text SHOULD be rendered, but HOW?

Well, as the majority of customers would not expect a text to be rendered, I simply avoid any complication and drop this feature, knowing that the majority of customers will not notice the missing feature and might even be disturbed if their documents rendered differently. In addition, I realized that the manually added paragraph within the image is lost after saving the document back with LO or MSO.

But if we would like to render the nested paragraph to HTML5, we have to be certain that we do not use the HTML5 paragraph tag, as nested paragraphs are not allowed in HTML.

The HTML5 <p> element is flow content and may contain only phrasing content.
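One way to respect that constraint is sketched below: only the outermost <text:p> becomes an HTML <p>, and everything inside it, including the nested paragraph, is mapped to phrasing content (<span>, <img>). The class names and the decision to place the text right after the image are assumptions for this sketch; the ODF 1.1 sentence quoted above does not say how the text should be displayed.

```python
import xml.etree.ElementTree as ET
from html import escape

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
TEXT_P = f"{{{TEXT_NS}}}p"
DRAW_FRAME = f"{{{DRAW_NS}}}frame"
DRAW_IMAGE = f"{{{DRAW_NS}}}image"
XLINK_HREF = f"{{{XLINK_NS}}}href"


def render_content(element, inside_p):
    """Render the text and children of an element, escaping all text."""
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(to_html(child, inside_p))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def to_html(element, inside_p=False):
    """Render a tiny ODF subset to HTML without ever nesting <p> in <p>."""
    if element.tag == TEXT_P:
        # A <text:p> below another <text:p> must not become an HTML <p>.
        tag, attrs = ("span", ' class="odf-nested-p"') if inside_p else ("p", "")
        return f"<{tag}{attrs}>{render_content(element, True)}</{tag}>"
    if element.tag == DRAW_FRAME:
        return f'<span class="odf-frame">{render_content(element, inside_p)}</span>'
    if element.tag == DRAW_IMAGE:
        href = escape(element.get(XLINK_HREF, ""), quote=True)
        # Assumption: the image's text content simply follows the image.
        return f'<img src="{href}" alt="">{render_content(element, inside_p)}'
    # Unknown elements: keep their text, drop the markup.
    return render_content(element, inside_p)


# Shortened copy of the paragraph from ParagraphVariations2.odt.
FRAGMENT = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:draw="{DRAW_NS}" xmlns:xlink="{XLINK_NS}">'
    '<draw:frame draw:name="Atlas Santiago">'
    '<draw:image xlink:href="">'
    "<text:p>Something within Image</text:p>"
    "</draw:image></draw:frame></text:p>"
)

html = to_html(ET.fromstring(FRAGMENT))
```

The result contains exactly one <p>; the nested paragraph survives as a <span class="odf-nested-p"> that CSS could style as a block if desired.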

I guess I will stop here. My point is that there is a lot of work involved for an ODF expert to understand this feature and to map it from ODF to changes and back from changes to their ODF application (here HTML5).

Our responsibility in the SC is to create the mapping from ODF XML to changes in a way that eases the ODF developers' work of mapping the ODF changes to their application model.

Conclusion

A good example. At first I was not certain how to handle this case "correctly" for ODF developers and for us as the ODF standardization body. But as the majority of ODF users will not see this feature, because the market leaders do not implement it, this feature needs to be distinct from the 'common' paragraph feature, which they do implement. Otherwise there will be a future problem when two applications with different feature sets are collaborating. Remember Vi vs. LibreOffice, where features have to be removed and added back into the communication stream to allow seamless integration of changes.

Finally, I am still uncertain how to indicate the difference between the paragraph types in the representation of changes.

Looking forward to our call later today..

Regards,

Svante

2016-11-29 23:11 GMT+01:00 Patrick Durusau <patrick@durusau.net>:

Svante,

Curious, what difference do you see between:

<office:text><text:p>blah, blah</text:p></office:text> and

<draw:image><text:p>blah, blah</text:p></draw:image>?

Both of those <text:p> elements are containers of text. Yes?

As a writer of documents I may put more emphasis on <office:text><text:p> as "normal" paragraphs but that's a happenstance of my work flow.

Graphic artists are likely to think <draw:image><text:p> is the more important case.
If all I want to transmit is office:text/text:p/(some-change) or draw:image/text:p/(some-change), I fail to see the increased complexity of the second one? One is not simpler than the other. Yes?

Not to mention it avoids prejudice in favor of office:text, which isn't likely to find favor with people who use charts, draw, table, etc.

Yes?
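The two addresses above, office:text/text:p and draw:image/text:p, can be derived mechanically from the XML. The following is only a sketch of that derivation; the wrapper element <sample> and the function names are made up for the example, and a real change format would need more than the immediate parent.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
}
PREFIX = {uri: prefix for prefix, uri in NS.items()}
TEXT_P = f"{{{NS['text']}}}p"


def qname(element):
    """Turn '{namespace-uri}local' back into 'prefix:local'."""
    uri, local = element.tag[1:].split("}")
    return f"{PREFIX[uri]}:{local}"


def paragraph_addresses(element, parent=None, found=None):
    """Collect ('parent/text:p', text) for every <text:p> below element."""
    found = [] if found is None else found
    if element.tag == TEXT_P and parent is not None:
        address = f"{qname(parent)}/{qname(element)}"
        found.append((address, "".join(element.itertext())))
    for child in element:
        paragraph_addresses(child, element, found)
    return found


xmlns = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
# <sample> is a hypothetical wrapper so both cases fit into one XML document.
SAMPLE = (
    f"<sample {xmlns}>"
    "<office:text><text:p>blah, blah</text:p></office:text>"
    "<draw:image><text:p>blah, blah</text:p></draw:image>"
    "</sample>"
)

addresses = paragraph_addresses(ET.fromstring(SAMPLE))
```

Both paragraphs carry the same text and differ only in their address, so the change itself has the same shape in either case.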

I will try to make the call tomorrow.

Hope you are having a great week!

Patrick

On 11/29/2016 12:01 PM, Svante Schubert wrote:

I guess we all agree by now that an office document, as the shock-frozen final state of the user's work, is equivalent to a sequence of user changes creating this document.

There is an advantage in having a different view of the document, in splitting this monolithic complex block into changes. Changes are the 'currency' of collaboration. They are exchanged and used for merges to synchronize the parallel work. In general, the usage of changes allows exchanging and merging minimal portions of the document, reducing complexity and in addition even enabling new features such as multiple format changes on text, as discussed on the last TC call.

My question to you - which I have trouble with - is: how, in general, do we slice the features we base the changes upon?

Let me give you an example. In ODF XML every text visible to a user is within a <text:p> paragraph element. This is basically a valid design decision, but to the user (esp. to me) the paragraph within the text flow (not even header / footer) is the most important paragraph, or at least a logical class of its own. Other <text:p> elements rather belong to another logical unit instead of being one themselves. Sometimes a <text:p> is part of the title of a frame/picture or contains the text of an annotation (i.e. being just metadata on some document portion).

The problem might become more apparent when you think of an ODF application being implemented from scratch. The first step is to show text and paragraphs only (paragraphs as I define them). But in general, any application might decide which part of ODF it implements and should be able to bootstrap from arbitrary features it picks from the ODF XML to create its application, e.g. tables, images, etc.

In my "text & paragraph-only" example we might think of the VI or Emacs (or any other text) editor that reads ODF XML and shows every paragraph as a single line containing the text of the paragraph. Likely only the user changes that create paragraphs and text are sent to the editor.
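A rough sketch of what such a text-only consumer could do is given below. It shows each paragraph in the text flow as one line, skips drawing content inside paragraphs, and marks any other top-level object with a placeholder line. The placeholder text and the function names are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_P = f"{{{TEXT}}}p"


def flow_text(element):
    """Collect text, skipping anything in the drawing namespace (frames, images)."""
    parts = [element.text or ""]
    for child in element:
        if not child.tag.startswith(f"{{{DRAW}}}"):
            parts.append(flow_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def editor_lines(xml_source):
    """One line per flow paragraph; a placeholder for every unknown object."""
    root = ET.fromstring(xml_source)
    body_text = root.find(f".//{{{OFFICE}}}text")
    lines = []
    for child in body_text:
        if child.tag == TEXT_P:
            lines.append(flow_text(child))
        else:
            lines.append("[unsupported object]")
    return lines


SAMPLE = f"""
<office:document-content xmlns:office="{OFFICE}" xmlns:text="{TEXT}"
    xmlns:draw="{DRAW}" xmlns:svg="{SVG}" xmlns:table="{TABLE}">
  <office:body><office:text>
    <text:p>First line</text:p>
    <table:table table:name="T1"/>
    <text:p>Before<draw:frame><svg:title>Atlas</svg:title></draw:frame> after</text:p>
  </office:text></office:body>
</office:document-content>
"""

lines = editor_lines(SAMPLE.strip())
```

The placeholder line gives the editor's user at least a position to place text before or after an object the editor does not understand.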

This makes it possible for two ODF applications with very different feature sets to collaborate. For instance, a VI/Emacs might collaborate with MSOffice / LibreOffice in theory.

The only problem would be for the user of the VI/Emacs to decide whether to place a paragraph/text before or after an unknown logical object. But changing, adding and deleting text/paragraphs in a homogeneous part works fine.

My problem is to define a rule for recognizing that the <text:p> elements have different meanings.

I fear there is no other solution than trial and error.

Of course, it would be a good decision to slice the ODF grammar (RelaxNG) into slices defining the logical blocks (using GraphDB), but it seems that in the end we need ODF documents using these blocks to be loaded into ODF applications, to check how applications currently interpret and render those variations of elements.

Any thoughts and/or comments?

Looking forward to exchanging some ideas on this in tomorrow's call.

Regards,

Svante


--
Patrick Durusau
patrick@durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
