Subject: IRC log - ODF Collab SC call - 2017-06-28
[16:30] Svante Schubert: Dialing in...
[16:32] Svante Schubert: When I mentioned that I would like a better analysis of the ODF schema, Patrick dropped a link on XML Schema visualisation.
[16:32] Svante Schubert: https://stackoverflow.com/questions/2486758/how-to-visualize-an-xml-schema
[16:33] Svante Schubert: I have filed a task on this topic for myself in the ODF Toolkit: https://issues.apache.org/jira/browse/ODFTOOLKIT-458
[16:34] Svante Schubert: with some initial steps
[16:35] Patrick: I was looking for a non-diagram visualization of the ODF schema. Liquid Technologies, for example, does the traditional root element to your left with expanding content models to your right
[16:35] Patrick: which means that if <text:h> appears in different parent elements, you see the <text:h> element in each of its parent elements with its content model
[16:36] Patrick: What I want is a representation of the <text:h> element as a node in a graph that occurs only once and has parent edges to all the elements where it may occur, so that I can visualize the schema as a directed graph
[16:37] Patrick: For instance, I can see all the elements that share the same parent elements visually by their parent edges
[16:39] Svante Schubert: Regarding <text:h> you may have a visualization that is focused on the <text:h>, right?
[16:42] Svante Schubert: Patrick mentioned the query: do text:p and text:h have the same parents, positions, etc.?
[16:43] Patrick: For example, <text:p> and <text:h> have different parent elements (some the same, but not all)
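The comparison of parent sets discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the containment pairs below are hand-picked examples, not extracted from the ODF schema, and the names `CONTAINMENT` and `parents_by_element` are hypothetical.

```python
from collections import defaultdict

# Illustrative (parent, child) containment pairs.
# ASSUMPTION: hand-written for this sketch, not extracted from the ODF schema.
CONTAINMENT = [
    ("office:text", "text:p"),
    ("office:text", "text:h"),
    ("table:table-cell", "text:p"),
    ("table:table-cell", "text:h"),
    ("office:annotation", "text:p"),
]


def parents_by_element(pairs):
    """Map each element name to the set of elements that may contain it."""
    parents = defaultdict(set)
    for parent, child in pairs:
        parents[child].add(parent)
    return parents


parents = parents_by_element(CONTAINMENT)

# Each element is a single key; its "parent edges" are the members of its set.
shared = parents["text:p"] & parents["text:h"]
only_p = parents["text:p"] - parents["text:h"]

print("shared parents:", sorted(shared))
print("parents of text:p only:", sorted(only_p))
```

With every element stored once and its parents kept as a set, "some the same, but not all" becomes a set intersection plus a set difference.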
[16:43] Svante Schubert: text:p and text:h do not have the same parent elements, as text:p does not ALWAYS represent a paragraph. In ODF XML, text:p is the fallback text container.
[16:45] Svante Schubert: I would have to look it up, but if there is a title of an image, table, frame, annotation, etc. all text is wrapped up in a text:p
[16:45] Svante Schubert: ^^look it up to give you a precise example in the schema, I meant
[16:46] Svante Schubert: When I wrote an ODF to XHTML transformation this was one of the surprising errors, as XHTML paragraphs cannot be nested, but text:p might be nested in ODF.
[16:46] Svante Schubert: On the other hand, in ODF these nested text:p elements are not necessarily paragraphs in the user's sense ;)
[16:48] Svante Schubert: The reason: one of the inventors of ODF once had the idea that all text visible to the user in the document should be wrapped in a text:p element.
[16:48] Patrick: Do you mean <text:p> contains some other element, which itself contains a <text:p>?
[16:49] Svante Schubert: yes, exactly!
[16:49] Patrick: Oh, you mean the default text that can be printed to the user in a flat XML document. OK I remember that!
[16:53] Svante Schubert: Let me give an example of nested text:p when using an annotation:
[16:53] Svante Schubert: <text:p text:style-name="Standard">Hello <office:annotation> <text:p text:style-name="P1"> <text:span text:style-name="T1">My favorite W!</text:span> </text:p> </office:annotation>W<office:annotation-end/>orld</text:p> <text:p>
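A minimal Python sketch of how such nesting could be detected in a document instance, using only the standard library. The fragment above is wrapped in an `office:text` root that declares the two ODF namespaces, which is an addition made here so that the snippet parses; the helper name `nested_paragraphs` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
P = f"{{{TEXT_NS}}}p"  # qualified name of text:p in ElementTree notation

# The annotation example from the log, wrapped in a root element that
# declares the namespaces (ASSUMPTION: the wrapper is added for parsing only).
SAMPLE = f"""<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
<text:p text:style-name="Standard">Hello <office:annotation>
<text:p text:style-name="P1"><text:span text:style-name="T1">My favorite W!</text:span></text:p>
</office:annotation>W<office:annotation-end/>orld</text:p>
</office:text>"""


def nested_paragraphs(root):
    """Return every text:p that has another text:p among its ancestors."""
    found = []

    def walk(elem, inside_p):
        if elem.tag == P and inside_p:
            found.append(elem)
        for child in elem:
            walk(child, inside_p or elem.tag == P)

    walk(root, False)
    return found


root = ET.fromstring(SAMPLE)
nested = nested_paragraphs(root)
print(len(nested), nested[0].get(f"{{{TEXT_NS}}}style-name"))
```

A converter to XHTML would have to treat the paragraphs found this way differently, since XHTML does not allow a paragraph inside a paragraph.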
[16:54] Patrick: representing the ODF schema as a graph requires decisions about what to do with attributes, elements, etc., but it is doable with Neo4j/cypher or Titan/Gremlin
[16:55] Svante Schubert: The interesting thing: I realized I can reuse the internal model of the MultiSchemaValidator (our RelaxNG parser) as a graph!
[16:55] Svante Schubert: http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co
[16:55] Svante Schubert: ^^above you can find the dumped model; the number at the start of each line is the graph level
[16:55] Svante Schubert: Like a stack :)
[16:56] Svante Schubert: It is easier to understand when you read the MSV model and the ODF RelaxNG side by side
[16:56] Svante Schubert: http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-schema.rng
[16:57] Svante Schubert: The MSV model dump lists the graph in depth-first order
[16:59] Svante Schubert: Patrick mentioned that the MSV model contains neither a grammar element nor namespaces. I should test whether multiple namespaces for the same prefix are a potential problem
[17:00] Svante Schubert: The MSV model is the parsed ODF RelaxNG schema
[17:01] Svante Schubert: Search for office-document-content
[17:01] Patrick: starts with <choice> then takes first element <office-document> then jumps to def. of <office-document> then
[17:02] Svante Schubert: exactly
[17:02] Svante Schubert: 1: REF 'office-document-content',
[17:02] Svante Schubert: appears much later in the document
[17:02] Svante Schubert: and, similar to a stack, it is the first time the prefix 1: appears again
[17:02] Svante Schubert: level 1 of the graph
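The stack-like reading of the level prefix can be sketched in Python. This is a rough illustration, not the actual dump parser: the line shape `<level>: <KIND> '<name>',` is assumed from the single line quoted above, the sample lines are invented for the sketch, and `parse_dump` is a hypothetical helper.

```python
import re

# ASSUMPTION: every dump line looks like  <level>: <KIND> '<name>',
# with the quoted name and the trailing comma being optional.
LINE = re.compile(r"""^\s*(\d+):\s*(\w+)(?:\s+['"]([^'"]*)['"])?,?\s*$""")

# Invented sample lines in the assumed shape, NOT copied from the real dump.
SAMPLE_DUMP = """\
0: CHOICE,
1: REF 'office-document',
2: ELEMENT "office:document",
3: ATTRIBUTE "office:version",
1: REF 'office-document-content',
2: ELEMENT "office:document-content",
"""


def parse_dump(text):
    """Build a nested tree from level-prefixed lines using a stack of open nodes."""
    root = {"kind": "ROOT", "name": None, "children": []}
    stack = [(-1, root)]  # (level, node) pairs along the current path
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue  # skip lines that do not fit the assumed shape
        level = int(match.group(1))
        node = {"kind": match.group(2), "name": match.group(3), "children": []}
        # Pop until the top of the stack has a strictly smaller level: the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


tree = parse_dump(SAMPLE_DUMP)
choice = tree["children"][0]
print([child["name"] for child in choice["children"]])
```

When the prefix drops back to 1:, the stack is unwound to the level-0 node, so the second REF becomes a sibling of the first one.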
[17:03] Svante Schubert: So we have the graph
[17:04] Patrick: Yes, you have the graph but not in a standard graph language. ;-)
[17:04] Svante Schubert: We might alter it so that it can use all the features of the graph database
[17:04] Svante Schubert: We might transform it into graph creation procedures.. ;)
[17:04] Svante Schubert: BUT before I do that, I will need to check how the analysis is being done in theory
[17:04] Svante Schubert: For example, I would write a fairly sophisticated (recursive) graph traversal
[17:05] Svante Schubert: That takes CHOICE into account..
[17:05] Svante Schubert: ^^for instance
[17:06] Svante Schubert: There are of course attributes in the MSV model tree, for instance
[17:06] Svante Schubert: 6: ATTRIBUTE "office:version",
[17:06] Svante Schubert: MSV (Multi Schema Validator) uses it for validation purposes and needs the full information set of the ODF XML schema
[17:06] Svante Schubert: Yes, a CHOICE is a node
[17:07] Svante Schubert: I guess ;)
[17:10] Svante Schubert: On the other hand, current nodes such as 9: ONEOREMORE would usually be represented on an edge in graphs
[17:11] Svante Schubert: It is an interesting idea of Patrick's to analyse existing graph visualisation techniques to see how to improve the graph design
[17:12] Patrick: do we need a choice node? in diagram representation it is a visualization technique - but we do need to experiment
[17:15] Patrick: show all <text:h> with their parents, in my view would have only 1 <text:h> displayed along with its possible parents.
[17:16] Svante Schubert: The text:h question asks for a subset of the schema tree as output.
[17:16] Svante Schubert: The follow-up question is: please create a test document for text:h.
[17:18] Svante Schubert: I need the graph database to find patterns and compare them. For instance, what is the difference between text:p and text:h - aside from the local XML name! ;)
[17:25] Svante Schubert: If you are interested in this field, install the mainstream graph database Neo4J: https://neo4j.com/
[17:25] Svante Schubert: They have a wonderful tutorial in the browser
[17:25] Svante Schubert: The question is how we write an analysis such as: tell me whether a text:p is nested ;)
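On a schema graph, "can text:p be nested" amounts to asking whether text:p is reachable from itself along containment edges. A minimal Python sketch of that reachability check follows. It is not a Neo4j query, and the `CHILDREN` edges are hand-written for illustration (loosely following the annotation example earlier in the log), not derived from the schema.

```python
# Illustrative containment edges: parent -> elements it may contain.
# ASSUMPTION: hand-written for this sketch, not derived from the ODF schema.
CHILDREN = {
    "office:text": ["text:p", "text:h"],
    "text:p": ["text:span", "office:annotation"],
    "text:h": ["text:span", "office:annotation"],
    "office:annotation": ["text:p"],
}


def can_contain(children, ancestor, target):
    """True if `target` is reachable from `ancestor` over one or more edges."""
    seen = set()
    todo = list(children.get(ancestor, []))
    while todo:
        current = todo.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        todo.extend(children.get(current, []))
    return False


print(can_contain(CHILDREN, "text:p", "text:p"))    # text:p inside text:p?
print(can_contain(CHILDREN, "text:span", "text:h"))  # text:h inside text:span?
```

This ignores RelaxNG logic such as choices and references. It only shows the shape of the question once the schema has been reduced to plain containment edges.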
[17:27] Svante Schubert: Maybe that is a little too much, but I was trying to write a parser for the MSV memory dump (link above) in ANTLR 4 - http://www.antlr.org/
[17:27] Svante Schubert: I am currently reading through the book (PDF) that its author wrote
[17:29] Svante Schubert: I only want to use it because I want to know more about parser generators in general. For instance, parsing ODF XML and filtering the feature puzzles as events..
[17:30] Svante Schubert: But the next question is, how can Neo4J support the queries that we talked about earlier?
[17:31] Svante Schubert: How do we query the graph?
[17:31] Svante Schubert: Using the existing power of the graph database
[17:31] Svante Schubert: taking RelaxNG choices and references, etc. into account..
[17:33] Patrick: choice is just a node as child that offers options
[17:33] Svante Schubert: We need to add RelaxNG logic to the graph query!
[17:34] Patrick: sequence, is more difficult, will have to think about it. - depends on how to represent it - could be a graph attribute as distinguished from a document attribute
[17:35] Svante Schubert: Let's assume that all RelaxNG logic is for now only nodes, as if we take over the MSV graph completely without enhancing it
[17:36] Svante Schubert: The problem, to me, is how we can traverse this graph while taking RelaxNG logic into account to answer our questions on ODF
[17:38] Patrick: minimal document implies we note whether nodes are required or not
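One way a recursive traversal could take CHOICE into account when deciding what is required is sketched below in Python. It is a sketch under assumptions, not the MSV model: CHOICE and ONEOREMORE appear in the log, while the node kinds SEQUENCE, OPTIONAL and ZEROORMORE, the `a:` element names and all helper names are hypothetical.

```python
def el(name):
    """Leaf pattern for a child element."""
    return {"kind": "ELEMENT", "name": name}


def node(kind, *children):
    """Inner pattern node such as CHOICE or ONEOREMORE."""
    return {"kind": kind, "children": list(children)}


def required_elements(pattern):
    """Names of child elements that every valid instance must contain."""
    kind = pattern["kind"]
    parts = pattern.get("children", [])
    if kind == "ELEMENT":
        return {pattern["name"]}
    if kind in ("OPTIONAL", "ZEROORMORE"):
        return set()  # may be absent, so nothing below is required
    if kind == "CHOICE":
        # Required only if it is required in every branch of the choice.
        branches = [required_elements(p) for p in parts]
        return set.intersection(*branches) if branches else set()
    # SEQUENCE, ONEOREMORE and any other kind: every part must be present.
    required = set()
    for part in parts:
        required |= required_elements(part)
    return required


# Hypothetical content model, not taken from the ODF schema.
CONTENT = node(
    "SEQUENCE",
    el("a:title"),
    node("OPTIONAL", el("a:note")),
    node("CHOICE", node("SEQUENCE", el("a:body"), el("a:footer")), el("a:body")),
    node("ONEOREMORE", el("a:item")),
)

print(sorted(required_elements(CONTENT)))
```

A minimal test document would then contain only the elements reported as required, plus one branch for each choice that has no common required element. References (REF) are not resolved in this sketch.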