IRC log - ODF Collab SC call

Call Summary:

If we want to create a simplified ODF XML for 2.0 or divide each document type into logical blocks, which are added, deleted and modified across all ODF applications, we need to better understand and analyse the existing ODF XML RelaxNG schema.

In this talk, we focused on tooling to assist us in this task.

Our next meeting will be on the 12th of July:

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2017&month=07&day=12&hour=14&min=30&sec=0&p1=179&p2=37&p3=136&p4=234&iv=1800

The teleconference login data for the next call will be found in the OASIS calendar event:

https://www.oasis-open.org/committees/event.php?event_id=39525

[16:30] Svante Schubert: Dialing in...

[16:32] Svante Schubert: When I mentioned I would like a better analysis of the ODF schema, Patrick dropped a link to XML Schema visualisation.

[16:32] Svante Schubert: https://stackoverflow.com/questions/2486758/how-to-visualize-an-xml-schema

[16:33] Svante Schubert: I have filed a task on this topic in the ODF Toolkit: https://issues.apache.org/jira/browse/ODFTOOLKIT-458

[16:34] Svante Schubert: with some initial steps

[16:35] Patrick: I was looking for a non-diagram visualization of the ODF schema. Liquid Technologies, for example, does the traditional root element to your left with expanding content models to your right

[16:35] Patrick: which means that if <text:h> appears in different parent elements, you see the <text:h> element in each of its parent elements with its content model

[16:36] Patrick: What I want is a representation of the <text:h> element as a node in a graph that occurs only once and has parent edges to all the elements where it may occur. Such that I can visualize the schema as a directed graph

[16:37] Patrick: For instance, I can see all the elements that share the same parent elements visually by their parent edges

[16:39] Svante Schubert: Regarding <text:h> you may have a visualization that is focused on the <text:h>, right?

[16:42] Svante Schubert: Patrick mentioned the query: do text:p and text:h have the same parents, positions, etc.?

[16:43] Patrick: For example, <text:p> and <text:h> have different parent elements (some the same, but not all)
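
A minimal Python sketch of the representation Patrick describes: each element is a single node, and its parent edges are collected in one place, so two elements can be compared by their parent sets. The (parent, child) pairs below are a small hand-written illustration, not extracted from the ODF schema, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative (parent, child) pairs only; NOT extracted from the ODF schema.
CONTAINMENT = [
    ("office:text", "text:p"),
    ("office:text", "text:h"),
    ("table:table-cell", "text:p"),
    ("table:table-cell", "text:h"),
    ("office:annotation", "text:p"),
]


def parents_by_element(pairs):
    """Map each element name to the set of elements that may contain it."""
    parents = defaultdict(set)
    for parent, child in pairs:
        parents[child].add(parent)
    return parents


def compare_parents(parents, a, b):
    """Return (shared, only_a, only_b) parent sets for two elements."""
    pa, pb = parents[a], parents[b]
    return pa & pb, pa - pb, pb - pa


parents = parents_by_element(CONTAINMENT)
shared, only_p, only_h = compare_parents(parents, "text:p", "text:h")
print("shared parents:", sorted(shared))
print("only text:p:", sorted(only_p))
print("only text:h:", sorted(only_h))
```

With real data, the pairs would come from a traversal of the parsed schema rather than from a hand-written list.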

[16:43] Svante Schubert: text:p and text:h do not have the same parent elements, as text:p does not ALWAYS represent a paragraph. In ODF XML, text:p is the fallback text container.

[16:45] Svante Schubert: I would have to look it up, but if there is a title of an image, table, frame, annotation, etc. all text is wrapped up in a text:p

[16:45] Svante Schubert: ^^look it up to give you a precise example in the schema, I meant

[16:46] Svante Schubert: When I wrote an ODF to XHTML transformation this was one of the surprising errors, as in XHTML paragraphs cannot be nested, but text:p might be nested in ODF.

[16:46] Svante Schubert: On the other hand, in ODF these nested text:p elements are not necessarily paragraphs in the user's sense ;)

[16:48] Svante Schubert: The reason: one of the inventors of ODF once had the idea to wrap all text that is visible to the user in the document in a text:p element.

[16:48] Patrick: Do you mean <text:p> contains some other element, which itself contains a <text:p>?

[16:49] Svante Schubert: yes, exactly!

[16:49] Patrick: Oh, you mean the default text that can be printed to the user in a flat XML document. OK I remember that!

[16:53] Svante Schubert: Let me give an example of nested text:p when using an annotation:

[16:53] Svante Schubert: <text:p text:style-name="Standard">Hello 
<office:annotation>
<text:p text:style-name="P1">
<text:span text:style-name="T1">My favorite W!</text:span>
</text:p>
</office:annotation>W<office:annotation-end/>orld</text:p>
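
The nesting in the annotation example can be checked mechanically on a document. The following Python sketch parses that fragment with the standard library and reports every text:p that has another text:p among its ancestors. The namespace declarations are added here so the fragment parses on its own, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# The annotation example from the log, with namespace declarations added.
FRAGMENT = f"""
<text:p xmlns:text="{TEXT_NS}" xmlns:office="{OFFICE_NS}"
        text:style-name="Standard">Hello
<office:annotation>
<text:p text:style-name="P1">
<text:span text:style-name="T1">My favorite W!</text:span>
</text:p>
</office:annotation>W<office:annotation-end/>orld</text:p>
"""


def nested_paragraphs(root):
    """Return every text:p that has another text:p among its ancestors."""
    p_tag = f"{{{TEXT_NS}}}p"
    found = []

    def walk(node, inside_p):
        if node.tag == p_tag and inside_p:
            found.append(node)
        for child in node:
            walk(child, inside_p or node.tag == p_tag)

    walk(root, False)
    return found


root = ET.fromstring(FRAGMENT)
nested = nested_paragraphs(root)
print(len(nested), "nested text:p element(s)")
```

This only inspects one document instance; whether the schema allows such nesting in general is a separate question about the schema graph.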

[16:54] Patrick: representing the ODF schema as a graph is doable, but requires decisions on what to do with attributes, elements, etc., but doable with Neo4j/Cypher or Titan/Gremlin

[16:55] Svante Schubert: The interesting thing, I realized I can take over the internal model of the MultiSchemaValidator (our RelaxNG parser) as a graph!

[16:55] Svante Schubert: http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co

[16:55] Svante Schubert: ^^above you may find the dumped model, and you see that the number at the start of each line is the graph level

[16:55] Svante Schubert: Like a stack :)

[16:56] Svante Schubert: It is easier to understand, when you read the MSV model and the ODF RelaxNG side by side

[16:56] Svante Schubert: http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-schema.rng

[16:57] Svante Schubert: The MSV model dump lists the graph in depth-first order

[16:59] Svante Schubert: Patrick mentioned that the MSV model contains neither a grammar element nor namespaces. I should test whether multiple namespaces for the same prefix are a potential problem

[17:00] Svante Schubert: The MSV model is the parsed ODF RelaxNG schema

[17:01] Svante Schubert: Search for office-document-content

[17:01] Patrick: starts with <choice> then takes first element <office-document> then jumps to def. of <office-document> then

[17:02] Svante Schubert: exactly

[17:02] Svante Schubert: 1: REF 'office-document-content',

[17:02] Svante Schubert: appears much later in the document

[17:02] Svante Schubert: and similar to a stack it is the first time the prefix 1: appears

[17:02] Svante Schubert: level 1 of the graph
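
The stack analogy can be sketched in Python: the level prefix on each line is enough to rebuild the tree from the dump. This assumes every line has the shape `level: KIND 'name',` as in the lines quoted in this log; the nesting in the sample is invented for illustration, and the real dump may contain forms this regular expression does not cover.

```python
import re

# Assumed line shape: "<level>: <KIND> ['name' | "name"][,]"
LINE = re.compile(r"""^\s*(\d+):\s*([A-Z]+)(?:\s+['"]([^'"]+)['"])?\s*,?\s*$""")


class Node:
    def __init__(self, level, kind, name=None):
        self.level, self.kind, self.name = level, kind, name
        self.children = []


def parse_dump(lines):
    """Rebuild the tree from level-prefixed lines using a stack."""
    root = Node(0, "ROOT")
    stack = [root]
    for raw in lines:
        match = LINE.match(raw)
        if not match:
            continue  # skip blank or unrecognised lines
        node = Node(int(match.group(1)), match.group(2), match.group(3))
        # Pop until the top of the stack is the parent (a lower level).
        while len(stack) > 1 and stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


# Invented sample; only the line shapes follow the dump quoted in the log.
SAMPLE = """\
1: REF 'office-document-content',
2: CHOICE
3: ATTRIBUTE "office:version",
3: ONEOREMORE
4: REF 'text-p',
"""

tree = parse_dump(SAMPLE.splitlines())


def show(node, indent=0):
    print(" " * indent + node.kind, node.name or "")
    for child in node.children:
        show(child, indent + 2)


show(tree)
```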

[17:03] Svante Schubert: So we have the graph

[17:04] Patrick: Yes, you have the graph but not in a standard graph language. ;-)

[17:04] Svante Schubert: We might alter it in order to use all features of the graph database

[17:04] Svante Schubert: We might transform it into graph creation procedures.. ;)
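
One possible shape for such graph creation procedures is a list of Cypher statements generated from containment pairs. In this Python sketch the node label `Element`, the relationship type `MAY_CONTAIN`, and the input pairs are all assumptions made for illustration; nothing here is an agreed design.

```python
def cypher_literal(value):
    """Quote a string as a Cypher single-quoted literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def containment_to_cypher(pairs):
    """Emit one idempotent MERGE statement per (parent, child) pair."""
    statements = []
    for parent, child in pairs:
        statements.append(
            f"MERGE (p:Element {{name: {cypher_literal(parent)}}}) "
            f"MERGE (c:Element {{name: {cypher_literal(child)}}}) "
            "MERGE (p)-[:MAY_CONTAIN]->(c);"
        )
    return statements


# Illustrative pairs only; NOT extracted from the ODF schema.
PAIRS = [("office:text", "text:h"), ("table:table-cell", "text:h")]

# Query for "show text:h once, with all of its possible parents".
PARENTS_OF_H = (
    "MATCH (p:Element)-[:MAY_CONTAIN]->(c:Element {name: 'text:h'}) "
    "RETURN p.name;"
)

statements = containment_to_cypher(PAIRS)
for statement in statements:
    print(statement)
print(PARENTS_OF_H)
```

Because MERGE only creates what is missing, each element ends up as exactly one node regardless of how many parents refer to it.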

[17:04] Svante Schubert: BUT before I do that, I will need to check how the analysis is being done in theory

[17:04] Svante Schubert: For example, I write myself a fairly sophisticated (recursive) graph traversal

[17:05] Svante Schubert: That takes CHOICE into account..

[17:05] Svante Schubert: ^^for instance

[17:06] Svante Schubert: There are of course attributes in the MSV model tree, for instance

[17:06] Svante Schubert: 6: ATTRIBUTE "office:version",

[17:06] Svante Schubert: MSV (Multi Schema Validator) uses it for validation purposes and needs all the information set of the ODF XML schema

[17:06] Svante Schubert: Yes, a CHOICE is a node

[17:07] Svante Schubert: I guess ;)

[17:10] Svante Schubert: On the other hand, current nodes such as 
9: ONEOREMORE
are usually on an edge in graphs

[17:11] Svante Schubert: It is an interesting idea of Patrick's to analyse existing graph visualisation techniques to see how to improve the graph design

[17:12] Patrick: do we need a choice node? in diagram representation it is a visualization technique - but we do need to experiment

[17:15] Patrick: show all <text:h> with their parents, in my view would have only 1 <text:h> displayed along with its possible parents.

[17:16] Svante Schubert: The text:h question asks for a subset of the schema tree as output.

[17:16] Svante Schubert: The follow-up question is: please create a test document for text:h.

[17:18] Svante Schubert: I need the graph database to find patterns and compare them. For instance, what is the difference between text:p and text:h - aside from the local XML name! ;)

[17:25] Svante Schubert: If you are interested in this field, install the mainstream graph database Neo4j: https://neo4j.com/

[17:25] Svante Schubert: They have a wonderful tutorial in the browser

[17:25] Svante Schubert: The question is how we write an analysis such as "tell me if a text:p is nested" ;)
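
At the schema level, "can text:p be nested" amounts to asking whether text:p can reach itself by following containment edges. A minimal Python sketch of that reachability check follows; the containment map is a deliberately incomplete toy based on the annotation example, not the real schema, and the function name is hypothetical.

```python
def nesting_path(contains, name):
    """Return one containment path from `name` back to itself, or None."""
    stack = [(child, [name, child]) for child in sorted(contains.get(name, ()))]
    seen = set()
    while stack:
        current, path = stack.pop()
        if current == name:
            return path
        if current in seen:
            continue
        seen.add(current)
        for child in sorted(contains.get(current, ())):
            stack.append((child, path + [child]))
    return None


# Toy containment map, deliberately incomplete; NOT the real ODF schema.
CONTAINS = {
    "text:p": {"office:annotation", "text:span"},
    "office:annotation": {"text:p"},
    "text:h": {"text:span"},
    "text:span": set(),
}

p_path = nesting_path(CONTAINS, "text:p")
h_path = nesting_path(CONTAINS, "text:h")
print("text:p:", " > ".join(p_path) if p_path else "cannot be nested")
print("text:h:", " > ".join(h_path) if h_path else "cannot be nested")
```

In a graph database the same question becomes a variable-length path query from a node back to itself.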

[17:27] Svante Schubert: Maybe that is a little too much, but I was trying to write a parser for the MSV memory dump (link above) in ANTLR 4 - http://www.antlr.org/

[17:27] Svante Schubert: I am currently reading through the PDF book its author wrote

[17:29] Svante Schubert: I only want to use it, because I want to know more about parser generators in general. For instance, parsing ODF XML and filtering the feature puzzles as events..

[17:30] Svante Schubert: But the next question is, how can Neo4j support the queries that we talked about earlier?

[17:31] Svante Schubert: How do we query the graph?

[17:31] Svante Schubert: Using the existing power of the graph database

[17:31] Svante Schubert: taking RelaxNG choices and references, etc. into account..

[17:33] Patrick: choice is just a node as child that offers options

[17:33] Svante Schubert: We need to add RelaxNG logic to the graph query!

[17:34] Patrick: sequence, is more difficult, will have to think about it. - depends on how to represent it - could be a graph attribute as distinguished from a document attribute

[17:35] Svante Schubert: Let's assume that all RelaxNG logic is for now only nodes, as if we take over the MSV graph completely without enhancing it

[17:36] Svante Schubert: The problem, to me, is how we can traverse this graph while taking RelaxNG logic into account to answer our questions about ODF

[17:38] Patrick: minimal document implies we note whether nodes are required or not
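
A recursive traversal that takes CHOICE into account could decide which child elements are required, which is the information a minimal document needs. In this Python sketch patterns are nested tuples; the kind names mirror RelaxNG patterns (choice, optional, zeroOrMore, oneOrMore, group) written in the upper-case style of the dump, and the `ex:` element names are invented. A child counts as required by a CHOICE only if every alternative requires it.

```python
def required_elements(pattern):
    """Names of elements that every match of `pattern` must contain directly."""
    kind = pattern[0]
    if kind == "ELEMENT":
        return {pattern[1]}
    if kind in ("OPTIONAL", "ZEROORMORE"):
        return set()  # may be absent, so nothing inside is required
    children = [required_elements(child) for child in pattern[1]]
    if kind == "CHOICE":
        # Required only if all alternatives agree on it.
        return set.intersection(*children) if children else set()
    # GROUP and ONEOREMORE: whatever any child requires is required.
    return set().union(*children)


# Invented content model; NOT taken from the ODF schema.
CONTENT = ("GROUP", [
    ("OPTIONAL", [("ELEMENT", "ex:styles")]),
    ("CHOICE", [
        ("GROUP", [("ELEMENT", "ex:title"), ("ELEMENT", "ex:body")]),
        ("ELEMENT", "ex:body"),
    ]),
    ("ONEOREMORE", [("ELEMENT", "ex:page")]),
])

required = required_elements(CONTENT)
print(sorted(required))
```

The sketch stops at element boundaries, so it does not recurse into an element's own content and cannot loop on recursive definitions.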
