OASIS Mailing List Archives

lexidma-comment message


Subject: Review of DMLex 1.0 CSD02


Dear all,

In the following I would like to provide a review of DMLex 1.0 CSD02. My perspective is that of a digital humanities (DH) scholar: originally trained in the humanities but with a shadow career in web development, both of which turned out to be fruitful for my current position as a research software engineer at the Academy of Sciences and Literature Mainz, Germany.

I am immensely grateful to the authors of DMLex for putting together this coherent and relatively easy to implement standard that works across the current trifecta of DH data structures: XML, RDF/SPARQL, and relational databases. XML has the benefit of many existing creation and transformation pipelines, RDF and SPARQL allow for quick information retrieval in Linked Open Data scenarios, and the management and web publication of complex datasets is most easily accomplished using relational databases. DMLex is not just compatible with all three of these common requirements, but DMLex data can also be serialised as TEI-Lex0 and/or OntoLex Lemon if required.

I originally stumbled upon an earlier draft of DMLex approximately one and a half years ago. I was looking for best practices to model lexicographic data in a relational database, having been tasked with modernising the infrastructure of the Digital Dictionary of Surnames in Germany. This task ultimately led to an in-progress implementation of DMLex for TYPO3, as part of a larger set of modules that can be used to build management-, editing-, and/or presentation-focused web apps for multimodal cultural-heritage data, such as an image archive with object- and time-based data or a dictionary with geodata and records of historical persons.

Two design patterns of DMLex, i.e., its tag/label and relations systems, were modelled so convincingly that I turned them into elements of the core module of the software mentioned above.[1] The tag system structures the data and provides a simple metaphor for structured content based on authority files, which have become common in DH. The relations system, on the other hand, provides a way to model complex interrelations between classes. Since I have seen a fair bit of criticism of relations in DMLex on this list, I would like to use the opportunity to provide a very clear reason for why I think it should stay the way it is: it is modelled for query speed. It allows for the simple retrieval of networks between entries without the need for a triple store or graph database. While I do recognise that editors need to adapt their workflow to use such a system, this is a simple UI/UX challenge rather than a point where a data model should solely follow an established input tradition.
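To illustrate the point about query speed, here is a minimal sketch of how related entries can be retrieved with plain SQL joins. The table and column names ("entries", "relations", "members", "role") and the sample surnames are simplified assumptions made up for this example; they are neither the normative DMLex relational schema nor the CHF implementation.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entries (id INTEGER PRIMARY KEY, headword TEXT NOT NULL);
        CREATE TABLE relations (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
        -- One row per participant of a relation (assumed layout).
        CREATE TABLE members (
            relation_id INTEGER NOT NULL REFERENCES relations(id),
            entry_id    INTEGER NOT NULL REFERENCES entries(id),
            role        TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [(1, "Meyer"), (2, "Meier"), (3, "Maier"), (4, "Schmidt")],
    )
    conn.executemany(
        "INSERT INTO relations VALUES (?, ?)", [(1, "variant"), (2, "variant")]
    )
    conn.executemany(
        "INSERT INTO members VALUES (?, ?, ?)",
        [(1, 1, "main"), (1, 2, "variant"), (2, 1, "main"), (2, 3, "variant")],
    )
    return conn


def related_entries(conn, entry_id):
    """Return (relation type, role, headword) for every entry that shares
    a relation with the given entry, using joins only."""
    return conn.execute(
        """
        SELECT r.type, other.role, e.headword
        FROM members AS own
        JOIN relations AS r ON r.id = own.relation_id
        JOIN members AS other
          ON other.relation_id = own.relation_id
         AND other.entry_id != own.entry_id
        JOIN entries AS e ON e.id = other.entry_id
        WHERE own.entry_id = ?
        ORDER BY e.headword
        """,
        (entry_id,),
    ).fetchall()


conn = build_demo_db()
print(related_entries(conn, 1))
```

The whole network around an entry comes back from a single query over three ordinary tables, which is the property argued for above; no triple store or graph database is involved.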

A minor challenge during the implementation was the direction in which parent/child relations are designed in the part of the spec dealing with relational databases. The spec has children point to their parents, whereas the forms that TYPO3 generates automatically are easier to use (and more user-friendly) when parents identify their children. The spec already treats the respective section as a suggestion rather than normative content, but a simple note that parent/child relations can be modelled either top-down or bottom-up could help others implement DMLex with less friction.
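The two directions carry the same information, which a short sketch can show. The function names and the identifiers ("entry-1", "sense-1") below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def to_top_down(child_rows):
    """Convert bottom-up rows (child_id, parent_id) into a top-down
    mapping {parent_id: [child_id, ...]}, keeping the input order."""
    children_of = defaultdict(list)
    for child_id, parent_id in child_rows:
        children_of[parent_id].append(child_id)
    return dict(children_of)


def to_bottom_up(children_of):
    """Convert a top-down mapping back into (child_id, parent_id) rows."""
    return [
        (child_id, parent_id)
        for parent_id, child_ids in children_of.items()
        for child_id in child_ids
    ]


# Bottom-up: each sense names the entry it belongs to.
senses = [("sense-1", "entry-1"), ("sense-2", "entry-1"), ("sense-3", "entry-2")]

# Top-down: each entry lists its senses.
top_down = to_top_down(senses)
print(top_down)
print(to_bottom_up(top_down) == senses)
```

The round trip reproduces the original rows here because the rows of each parent are adjacent in the input; in general only the order of children within each parent is preserved.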

The other, more severe hurdle I encountered in the implementation was that the names of several properties clashed with each other. This happened because I implemented all tags as well as "RelationType" and "MemberType" in a single "Tag" table of TYPO3's database, with a "type" column designating which kind of tag a record represents. Having crawled the spec repo and the mailing list, I realise that an early draft of the spec contained this logic and abandoned it before I first encountered DMLex, but I would assume that other implementers may take a similar shortcut because it significantly reduces the number of database tables required to run DMLex. I provide a list of the classes and properties that I needed to rename below. The aim is not to convince the LEXIDMA group to follow suit, but to document potential hassle around a spec feature where I found the names of classes and properties to be more confusing than elsewhere and possibly clashing in lazy implementations like mine:

- InflectedFormTag: renamed to "InflectionTypeTag" to align it with "DefinitionTypeTag"
- RelationType and MemberType: renamed to "RelationTypeTag" and "MemberRoleTag", respectively
- LabelTag, LabelTypeTag, PartOfSpeechTag, InflectionTypeTag, DefinitionTypeTag, TranscriptionSchemeTag: property "tag" renamed to "code" to avoid confusing the "tag" property with the "Tag" classes containing them
- LabelTag: property "typeTag" renamed to "labelType" to align it with other type indicators
- MemberRoleTag: property "role" renamed to "text" in alignment with other tags
- MemberRoleTag: property "type" renamed to "memberType" because it clashed with the type indicator needed for the unified table of all tags
- RelationTypeTag: "type" renamed to "text" because it clashed with the type indicator needed for the unified table of all tags
- RelationTypeTag: "memberType" renamed to "memberRole" to avoid conflict with the new "memberType" property now used for the MemberRoleTag class

I hope this makes some sense. I similarly aligned another property in the class "InflectedForm" where I renamed "tag" to "labelType" similar to how "Definition" has a "definitionType". The two illustrations I attached depict all classes and properties I needed to rename in a fuchsia colour. They were all necessitated either by changing from bottom-up to top-down relations or by simplifying all tags and tag-like classes into a single database table.[2]
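The reason for the clash can be sketched with a single-table layout. The schema below is a hypothetical simplification for illustration only; it is not the actual TYPO3 table of the CHF modules, and the column names, tag type strings, and sample values are assumptions.

```python
import sqlite3


def build_tag_table():
    """Create one table holding every kind of tag (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tag (
            id          INTEGER PRIMARY KEY,
            type        TEXT NOT NULL, -- discriminator: which kind of tag this row is
            code        TEXT,          -- formerly "tag" on most tag classes
            text        TEXT,          -- formerly "role" / "type" on the two relation classes
            member_type TEXT           -- formerly "type" on the member role class
        )
        """
    )
    conn.executemany(
        "INSERT INTO tag (type, code, text, member_type) VALUES (?, ?, ?, ?)",
        [
            ("partOfSpeechTag", "n", None, None),
            ("relationTypeTag", None, "meronymy", None),
            ("memberRoleTag", None, "whole", "sense"),
            ("memberRoleTag", None, "part", "sense"),
        ],
    )
    return conn


def tags_of_type(conn, tag_type):
    """Return (code, text, member_type) for all tags of one kind."""
    return conn.execute(
        "SELECT code, text, member_type FROM tag WHERE type = ? ORDER BY id",
        (tag_type,),
    ).fetchall()


conn = build_tag_table()
print(tags_of_type(conn, "memberRoleTag"))
```

Because the "type" column is already taken by the discriminator, any class property that is itself called "type" has to be stored under another name, which is what several of the renames listed above address.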

Last but not least, there is one aspect that I think should not go unnoticed as it illustrates the neat design of DMLex. The lexicographic resources we produce at the Academy of Sciences and Literature Mainz often contain historical examples, as in the case of field names or names of historical persons. To accommodate this, we simply added properties like "period", "locationRelation" and "agentRelation" to the existing "Examples" class and allowed for an "example" property in the "Entry" class in addition to the "Sense" class. Furthermore, we needed to attach frequency data for multiple countries to both "Entry" and "Sense" and were able to just add a respective class and property as a sort-of custom module. I mention this to highlight that I have become very fond of the modular design of the spec because implementers like me may need this sort of flexibility.

I apologise for my slightly verbose review, but hope to see DMLex widely adopted in its final 1.0 form. Let me know if any of the points above need clarification.

Best regards and thank you for this important spec work,
Jonatan Steller


[1] The software is called Cultural Heritage Framework 2 (CHF). Its central module, CHF Base, contains the main relations and tag systems. The additional model for lexicographic data, CHF Lex, implements all other classes in the main DMLex spec and the Controlled Vocabulary and Linking modules. The software lives at https://github.com/digicademy-chf and the two entry-way illustrations of which classes and properties I implemented live at https://digicademy-chf.github.io/chf_base/en-us/Base/DataModel/Index.html for the Base module and https://digicademy-chf.github.io/chf_lex/en-us/DataModel/Index.html for the Lex module.

[2] If the two images do not make it to the official list, they can also be found at the documentation links provided at the end of footnote 1.


--
Dr. Jonatan Jalle Steller
Wissenschaftlicher Mitarbeiter/Research Software Engineer
Digitale Akademie

Akademie der Wissenschaften und der Literatur Mainz
Geschwister-Scholl-Straße 2
55131 Mainz

Attachment: DataModel.png
Description: PNG image

Attachment: DataModel.png
Description: PNG image


