RE: [lexidma] Re: DMLex spec

Hi John, hi all,

A few comments are in-line below.

Simon

From: lexidma@lists.oasis-open.org [mailto:lexidma@lists.oasis-open.org] On Behalf Of John McCrae
Sent: Monday, May 23, 2022 5:28 PM
To: lexidma@lists.oasis-open.org
Subject: Re: [lexidma] Re: DMLex spec

Hi Michal,

I was working on the RDF serialization. Here are some comments and queries that came up.

ID and URI are the same thing in the RDF serialization. Would this be problematic?

I am not sure why `transcriptionScheme` is typed as a `langCode`.

[Simon] Indeed.

I assume that `homographNumber` > 0 (in value, not cardinality).
The 0..1 restriction on `definition` seems a bit limiting. I know that in Open English WordNet we have multiple definitions for the same sense.

[Simon] Yes, this was also Carole's comment. More than one should be allowed.
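To make the two constraints above concrete, here is a minimal Python sketch. It assumes (hypothetically) that `homographNumber` is optional (cardinality 0..1) but must be greater than 0 when present, and that a sense may carry any number of definitions (0..n), as John and Simon suggest. The class and function names are illustrative only and are not part of the draft spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    # A list rather than Optional[str]: 0..n definitions instead of 0..1
    definitions: List[str] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    # Cardinality 0..1; when present, the *value* must be greater than 0
    homograph_number: Optional[int] = None
    senses: List[Sense] = field(default_factory=list)


def validate_entry(entry: Entry) -> List[str]:
    """Return a list of constraint violations; an empty list means the entry is valid."""
    errors = []
    if entry.homograph_number is not None and entry.homograph_number <= 0:
        errors.append("homographNumber must be greater than 0")
    for i, sense in enumerate(entry.senses, start=1):
        for definition in sense.definitions:
            if not definition.strip():
                errors.append(f"sense {i} has an empty definition")
    return errors


ok = Entry("bank", homograph_number=1,
           senses=[Sense(["a financial institution",
                          "the building where such an institution is located"])])
bad = Entry("bank", homograph_number=0)
print(validate_entry(ok))   # []
print(validate_entry(bad))  # ['homographNumber must be greater than 0']
```

Here the cardinality (0..1) is expressed by `Optional`, while the value constraint (> 0) has to be checked separately, which is exactly the distinction John draws.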

`label` is used on multiple classes. I thought we weren't allowing this? Shouldn't it be `senseLabel`, `inflectedLabel`, etc.?

[Simon] From a lexicographic point of view, it makes more sense to have a generic "label" (type) that you can attach to different "objects" or parts of an entry.

`source` is also used on multiple classes (`LexicographicResource` and `Example`), and its meaning is quite different in the two cases. In fact, the spec does not say what the value of `source` on a `LexicographicResource` should be.

[Simon] I'm not so sure about this one.

For `sameAs` I assume we will just use OWL's built-in property.
`translationLanguage` is marked as 1..1. This would suggest that every lexicographic resource has a translation language, even if it is monolingual. Similarly, `translationLanguage` has 1..n. I guess you mean that this property is required if users use this module, but we have no mechanism to say which modules are being used, so I think this will lead to issues with the serializations.

[Simon] I thought that `language` is used in monolinguals, and `translationLanguage` in bi- and multilinguals. This probably means, if I understand John correctly, that we should "force" users to choose between three possible types of lexicographic resource in the first place: monolingual, bilingual and multilingual.
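Simon's suggestion could be sketched as follows: the resource declares its type up front, and the cardinality of `translationLanguage` is then checked against that declaration. This is a hypothetical Python illustration, not anything in the draft. It assumes that monolingual resources have no translation language, bilingual ones exactly one (1..1), and multilingual ones one or more (1..n, mirroring the cardinality John quotes); the enum and function names are invented for the example.

```python
from enum import Enum
from typing import List


class ResourceType(Enum):
    # Hypothetical: the resource type is declared explicitly by the user
    MONOLINGUAL = "monolingual"
    BILINGUAL = "bilingual"
    MULTILINGUAL = "multilingual"


def check_translation_languages(kind: ResourceType, translation_languages: List[str]) -> bool:
    """Check translationLanguage cardinality against the declared resource type."""
    n = len(translation_languages)
    if kind is ResourceType.MONOLINGUAL:
        return n == 0   # only `language`, no translation languages
    if kind is ResourceType.BILINGUAL:
        return n == 1   # exactly one translation language (1..1)
    return n >= 1       # multilingual: one or more (1..n, assumed)


print(check_translation_languages(ResourceType.MONOLINGUAL, []))          # True
print(check_translation_languages(ResourceType.BILINGUAL, ["de", "fr"]))  # False
```

Declaring the type explicitly would give serializations a way to know which constraints apply, which is the gap John points out.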

Sec 3.3 (extensions to `example`) still shows `sense`.
Should we use `partOfSpeech` on `headwordTranslation`? Will the POS values for the translations be the same as for entries, or should we have `translationPartOfSpeech` for the foreign-language POS tag set?

[Simon] I'm not sure about this. What happens on the translation side can be quite wild: one word can become two or many, or zero, with different POS, quasi-explanations, etc.

Sections 3 and 4 are very confusing, and at first I thought it was just a copy/paste mistake. I guess you want to say that `language` is an additional required property in the multilingual module, but why not just say that the Multilingual module is an extension of the Bilingual module? I am not sure how we intend users to declare which modules they are using anyway.

[Simon] I see some justification for bi- and multilingual modules. In general, (traditional) lexicography is rather averse to multilingual dictionaries, as it's difficult enough to present a consistent contrastive analysis of two languages, and with multilinguals it's either annoying simplification or exponential growth of CA problems.

The inline module is quite difficult to model in RDF. There is a bit of a clash, as I want `headword` to be a string value, not an object, and I don't have the ability for it to be both, as in your serialization. The same issue must exist in the JSON serialization as well. I suggest that, instead of adding children to `headword`, we introduce a property `headwordPlaceholderMarker`.
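A minimal sketch of what the proposal could look like in an object model, assuming (hypothetically) that each `headwordPlaceholderMarker` records start/end character offsets into the plain-string headword. John's proposal does not say how a marker would identify its span, so the offset scheme and the class names below are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlaceholderMarker:
    # Hypothetical character offsets into the headword (start inclusive, end exclusive)
    start: int
    end: int


@dataclass
class Entry:
    headword: str  # stays a plain string, as an RDF literal would require
    headword_placeholder_markers: List[PlaceholderMarker] = field(default_factory=list)

    def placeholders(self) -> List[str]:
        """Return the substrings of the headword that are marked as placeholders."""
        out = []
        for m in self.headword_placeholder_markers:
            if not (0 <= m.start < m.end <= len(self.headword)):
                raise ValueError(f"marker {m} is out of range for {self.headword!r}")
            out.append(self.headword[m.start:m.end])
        return out


entry = Entry("abandon sb", [PlaceholderMarker(8, 10)])
print(entry.placeholders())  # ['sb']
```

Because the markers sit beside the headword rather than inside it, the headword never has to be both a string and an object.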

So, translating your example into RDF, we would have something like this:

@prefix dmlex: <http://www.oasis-open.org/to-be-confirmed/dmlex> .

<abandon-verb> dmlex:headword "abandon" ;
    dmlex:partOfSpeech "verb" ;
    dmlex:sense <abandon-verb-1>, <abandon-verb-2> .

<abandon-verb-1> dmlex:definition "to suddenly leave a place or a person" ;
    dmlex:example [ dmlex:value "I'm sorry I abandoned you like that" ] ,
        [ dmlex:value "Abandon ship" ; dmlex:label "idiom" ] .

<abandon-verb-2> dmlex:label "mostly-passive" ;
    dmlex:definition "to stop supporting an idea" ;
    dmlex:example [ dmlex:value "The theory has been abandoned" ] .
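As a rough illustration of how the nested object model maps onto these triples, here is a minimal Python sketch (standard library only) that emits Turtle of the same shape from a nested dictionary. The dictionary layout and the functions `lit`, `example_node` and `entry_to_turtle` are hypothetical. Sense IRIs are minted as `<entry-id-N>` to mirror the example, which is an assumption rather than a DMLex rule, and the prefix is copied unchanged from the example.

```python
from typing import Dict, List


def lit(s: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def example_node(ex: Dict[str, str]) -> str:
    """Render an example as a blank node with dmlex:value and an optional dmlex:label."""
    parts = [f"dmlex:value {lit(ex['value'])}"]
    if "label" in ex:
        parts.append(f"dmlex:label {lit(ex['label'])}")
    return "[ " + " ; ".join(parts) + " ]"


def entry_to_turtle(entry_id: str, entry: Dict) -> str:
    """Serialize one entry and its senses as Turtle, mirroring the example above."""
    out: List[str] = ["@prefix dmlex: <http://www.oasis-open.org/to-be-confirmed/dmlex> ."]
    senses = entry.get("senses", [])
    sense_ids = [f"<{entry_id}-{i}>" for i in range(1, len(senses) + 1)]
    preds = [f"dmlex:headword {lit(entry['headword'])}"]
    if "partOfSpeech" in entry:
        preds.append(f"dmlex:partOfSpeech {lit(entry['partOfSpeech'])}")
    if sense_ids:
        preds.append("dmlex:sense " + ", ".join(sense_ids))
    out.append(f"<{entry_id}> " + " ;\n    ".join(preds) + " .")
    for sid, sense in zip(sense_ids, senses):
        sp = []
        if "label" in sense:
            sp.append(f"dmlex:label {lit(sense['label'])}")
        if "definition" in sense:
            sp.append(f"dmlex:definition {lit(sense['definition'])}")
        if sense.get("examples"):
            nodes = ",\n        ".join(example_node(e) for e in sense["examples"])
            sp.append("dmlex:example " + nodes)
        if sp:
            out.append(f"{sid} " + " ;\n    ".join(sp) + " .")
    return "\n\n".join(out)


abandon = {
    "headword": "abandon",
    "partOfSpeech": "verb",
    "senses": [
        {"definition": "to suddenly leave a place or a person",
         "examples": [{"value": "I'm sorry I abandoned you like that"},
                      {"value": "Abandon ship", "label": "idiom"}]},
        {"label": "mostly-passive",
         "definition": "to stop supporting an idea",
         "examples": [{"value": "The theory has been abandoned"}]},
    ],
}
print(entry_to_turtle("abandon-verb", abandon))
```

Note that every nested object (entry, sense, example) becomes a node, while `headword` stays a plain literal, which is why inline children on `headword` are awkward in RDF.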

Some of these properties could be mapped to OntoLex properties, e.g., `dmlex:sense` => `ontolex:denotes`, but not many.

I attach my first stab at making an OWL ontology for DMLex.

Regards,

John

On 18/05/2022 17:40, Michal Měchura wrote:

Hi everyone,

So here is my latest draft. What's new:

- I have made some tweaks to the pseudocode in which we present examples throughout the document. Consequently, I have made some changes to the formalism through which we define the data model. The distinction between "objects" and "properties" has disappeared, and we now explicitly state the arities and types of everything everywhere (Tomaž will like that!). I think the whole document is now easier to understand for readers, even outside our tribe.

- There have been some tiny, almost cosmetic, changes to some names of things.

In other words, this draft doesn't really bring any conceptually new material. Everything in it is based on things we've agreed and consensuses we've built up, so I wouldn't expect any disagreement or surprised reactions at this point. That said, I strongly encourage everyone to read it from start to end now because this is our last chance to catch any problems before we present it publicly at the ELEXIS event in Florence.

With this, we have created a clean, logical, consistent, simple, IT-friendly, well-thought-through data model for dictionaries. No-one's ever done this before in the history of lexicography. It's taken us a lot of talking, proposing, writing and rewriting to distil our ideas into a single, universally usable data model, but we've done it and the result is really likeable!

(Oh yes, the document is not in DocBook yet. I'll try to beat it into the DocBook format and submit it as a proper pull request through GitHub by Monday.)

M.

On Tue, 17 May 2022 at 22:07, Michal Měchura <462258@mail.muni.cz> wrote:
OK, so I'm going to need one more day. Sorry for keeping you waiting but the result will be worth it!
M.

On Mon, 16 May 2022 at 18:24, Michal Měchura <462258@mail.muni.cz> wrote:
Hi everyone,

I am busy editing the DMLex spec, but I think I'll need one more day to get it done properly. So you can expect an e-mail from me towards the end of tomorrow (Tuesday 17 May).

M.
 
---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail. Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 
