Different types of paragraphs

Dear SC,

When we define changes, we are defining the pattern in which a typical user change alters the ODF XML, as constrained by the RelaxNG schema.

One of the most basic changes is the insertion or deletion of a paragraph, represented by the <text:p> element.

A quick look at the chapter on paragraphs in the ODF 1.2 specification reveals that a paragraph may have 45 different parent elements.

The important question: Does every insertion or deletion of a <text:p> element, at any allowed place in the document, represent the same user feature of adding a paragraph?

Let me rephrase the problem:

New ODF applications will start from scratch and add feature by feature, according to each feature's importance. Will they be able to support all kinds of paragraphs at the same time?

The answer is: No, they will not. As we all know, some paragraphs are "just" labels of a chart or graphical object, part of an annotation "flowing" beside the text, or part of a header or footer.

Only a subset is likely to be the first feature: paragraphs that are top-level elements of the document or children of a top-level element.

The rule of thumb is that no nested paragraph is within the text flow.

But how do we find this out in general and separate the elements according to their semantic use?

Of great assistance is our RelaxNG schema.

In our ODF 1.2 RelaxNG schema, the definition of the paragraph is a named pattern that ends as follows:

      </zeroOrMore>
    </element>
  </define>

This named pattern is referenced 19 times.

One of these references is in the pattern "text-content", which itself is referenced 7 times.

This pattern represents the paragraphs in the text flow of the document.

But are there more? To be certain, every reference has to be followed and tested.
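The counting described above could be sketched roughly as follows in Python. This is only a minimal sketch: SAMPLE_SCHEMA is a hypothetical miniature grammar written for illustration, not an excerpt of the ODF 1.2 schema, and the helper names count_references and referencing_patterns are my own. Run against the real schema file, the numbers should match the 19 and 7 mentioned above, but I have not verified that here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical miniature grammar for illustration only,
# NOT an excerpt of the ODF 1.2 RelaxNG schema.
SAMPLE_SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="text-p">
    <element name="text:p"><text/></element>
  </define>
  <define name="text-content">
    <choice><ref name="text-p"/><ref name="text-list"/></choice>
  </define>
  <define name="text-list">
    <element name="text:list">
      <zeroOrMore><ref name="text-content"/></zeroOrMore>
    </element>
  </define>
  <define name="office-annotation">
    <element name="office:annotation">
      <zeroOrMore><ref name="text-p"/></zeroOrMore>
    </element>
  </define>
  <define name="office-text">
    <element name="office:text">
      <zeroOrMore><ref name="text-content"/></zeroOrMore>
    </element>
  </define>
</grammar>
"""


def count_references(schema_xml):
    """Return a Counter mapping each pattern name to its number of <ref> uses."""
    root = ET.fromstring(schema_xml)
    return Counter(ref.get("name") for ref in root.iter(RNG + "ref"))


def referencing_patterns(schema_xml, target):
    """Return the sorted names of the <define> patterns that reference `target`."""
    root = ET.fromstring(schema_xml)
    names = set()
    for define in root.iter(RNG + "define"):
        if any(ref.get("name") == target for ref in define.iter(RNG + "ref")):
            names.add(define.get("name"))
    return sorted(names)


counts = count_references(SAMPLE_SCHEMA)
print(counts["text-p"], counts["text-content"])
print(referencing_patterns(SAMPLE_SCHEMA, "text-p"))
```

The second function gives the list of patterns that still have to be followed one by one, which is the tedious part when done by hand.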

I am now certain that the semantic analysis of the schema should not be done by hand (over and over again), but in a reproducible, semi-automatic way, using a graph database with queries.
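To illustrate what such a query would do, here is a minimal sketch in plain Python, with a dictionary standing in for the graph database. It follows the references to a pattern upward until an enclosing element is reached, which yields the possible parent elements. Everything here is an assumption for illustration: SAMPLE_SCHEMA is a hypothetical miniature grammar (not the ODF 1.2 schema), the function names collect_edges and parent_elements are my own, and the sketch only handles elements that carry their name in a "name" attribute.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical miniature grammar for illustration only,
# NOT an excerpt of the ODF 1.2 RelaxNG schema.
SAMPLE_SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="text-p">
    <element name="text:p"><text/></element>
  </define>
  <define name="text-content">
    <choice><ref name="text-p"/><ref name="text-list"/></choice>
  </define>
  <define name="text-list">
    <element name="text:list">
      <zeroOrMore><ref name="text-content"/></zeroOrMore>
    </element>
  </define>
  <define name="office-annotation">
    <element name="office:annotation">
      <zeroOrMore><ref name="text-p"/></zeroOrMore>
    </element>
  </define>
  <define name="office-text">
    <element name="office:text">
      <zeroOrMore><ref name="text-content"/></zeroOrMore>
    </element>
  </define>
</grammar>
"""


def collect_edges(schema_xml):
    """Map each referenced pattern to a set of
    (referencing define, nearest enclosing element name or None)."""
    root = ET.fromstring(schema_xml)
    edges = {}

    def walk(node, define_name, element_name):
        for child in node:
            if child.tag == RNG + "ref":
                edges.setdefault(child.get("name"), set()).add(
                    (define_name, element_name))
            elif child.tag == RNG + "element":
                # References below this point have this element as parent.
                walk(child, define_name, child.get("name"))
            else:
                walk(child, define_name, element_name)

    for define in root.iter(RNG + "define"):
        walk(define, define.get("name"), None)
    return edges


def parent_elements(edges, pattern):
    """Follow references upward until an enclosing element is reached."""
    parents, seen, todo = set(), set(), [pattern]
    while todo:
        current = todo.pop()
        if current in seen:
            continue  # guards against recursive patterns
        seen.add(current)
        for define_name, element_name in edges.get(current, ()):
            if element_name is not None:
                parents.add(element_name)
            else:
                # The reference is not wrapped in an element here,
                # so look at whoever references this define.
                todo.append(define_name)
    return sorted(parents)


edges = collect_edges(SAMPLE_SCHEMA)
print(parent_elements(edges, "text-p"))
```

In a real graph database, the defines and elements would be nodes, the references would be edges, and parent_elements would become a single traversal query that can be rerun whenever the schema changes.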

In this case, we might want to stand on the shoulders of giants and follow the approach used for source code analysis; see the YouTube video "mining for bugs with graph database queries".

They originally used Neo4j with the scripting language Gremlin, but recently switched to TitanDB.

At the moment, only adding support for changes to the ODF Toolkit project as a demo case is more important to me, so if anyone would like to start some testing, please get in contact with me.

Any comments?

Have a nice day,

Svante

PS: Some questions from my side:

How did we originally create the list of 45 elements in the ODF 1.2 specification?

Wasn't there some tooling? If so, where can we find it?

office-collab message