Re: [lexidma] Working draft

On Tue, 12 Oct 2021 at 14:48, John McCrae <john.mccrae@insight-centre.org> wrote:

Hi Michal,

Thanks.

On 12/10/2021 10:42, Michal Měchura wrote:

We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example, the annotation on this entry and its inflected form “boise”

For gender (and other properties) of nouns (and other word classes) the idea is that implementers would conflate them with part-of-speech labels: they would create a “masculine noun” part-of-speech label, a “feminine noun” part-of-speech label, and so on.

For inflection categories, for example to say that this noun belongs to such-and-such inflectional paradigm, my suggestion would again be to conflate this with the part-of-speech label, for example “feminine noun of the second declension”.

Note that in DMLex, implementers can use the Controlled Vocabularies module to tell the world what their home-baked part-of-speech labels map to in some external data category ontology, for example to LexInfo.

For inflected forms, that’s what the InflectedForm object type is for (but no tildes please).
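
To make the conflation idea concrete, here is a minimal sketch in Python. It is not the DMLex serialization: the class and field names (PartOfSpeechLabel, same_as, external_categories) are made up for illustration, the "lexinfo:" identifiers are assumed and would need checking against LexInfo, and the headword "bos" and the label "genitive singular" are my assumptions about the entry that "boise" belongs to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartOfSpeechLabel:
    """A home-baked label plus the external categories it maps to (hypothetical)."""
    value: str
    same_as: List[str] = field(default_factory=list)


@dataclass
class InflectedForm:
    """An inflected form listed under an entry (hypothetical shape)."""
    text: str
    label: str


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    inflected_forms: List[InflectedForm] = field(default_factory=list)


# Controlled vocabulary: gender and declension are folded into the label,
# and the mapping records what the label corresponds to externally.
VOCABULARY: Dict[str, PartOfSpeechLabel] = {
    label.value: label
    for label in [
        PartOfSpeechLabel(
            "feminine noun of the second declension",
            ["lexinfo:noun", "lexinfo:feminine"],  # assumed identifiers
        ),
        PartOfSpeechLabel(
            "masculine noun",
            ["lexinfo:noun", "lexinfo:masculine"],  # assumed identifiers
        ),
    ]
}


def external_categories(entry: Entry) -> List[str]:
    """Look up the external categories behind an entry's conflated label."""
    return VOCABULARY[entry.part_of_speech].same_as


entry = Entry(
    headword="bos",  # assumed headword for the inflected form "boise"
    part_of_speech="feminine noun of the second declension",
    inflected_forms=[InflectedForm("boise", "genitive singular")],  # assumed label
)

print(external_categories(entry))
```

The entry carries only one label, and the gender and word class are recovered through the vocabulary lookup, which is the part that makes the conflation machine-readable.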

Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

We cannot give citations for sources of information, as is typical in historical dictionaries (see image)

True. This would be a good candidate for a module.

There is no etymology information (as discussed in the call). See example in Merriam-Webster:

True again, and again it’s a candidate for a module.

Okay, I will try to make some pull requests for candidate modules.

I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla)

My idea was that each collocation would be treated as an entry (with the full-form collocation as its headword, for example “lán boise”), and this entry would be connected to its “mother” entry through SubentryRelation.

To my mind, this is the only reasonable way to handle collocations if we want to avoid embedded subentries and headword overriding (see my report on subentrying earlier in Lexidma and also my presentation on recursion at eLex recently).
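
A rough Python sketch of that flat arrangement, assuming nothing about the actual DMLex serialization: the field names (mother_id, subentry_id), the entry ids and the helper subentries_of are all invented here, and the mother headword "bos" is my assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    id: str
    headword: str


@dataclass
class SubentryRelation:
    """Links a collocation entry to its mother entry (hypothetical fields)."""
    mother_id: str
    subentry_id: str


def subentries_of(mother_id: str,
                  entries: List[Entry],
                  relations: List[SubentryRelation]) -> List[Entry]:
    """Return the entries attached to the given mother entry, in relation order."""
    by_id = {entry.id: entry for entry in entries}
    return [by_id[r.subentry_id] for r in relations if r.mother_id == mother_id]


# Both the mother entry and the collocation are ordinary top-level entries.
entries = [
    Entry(id="bos", headword="bos"),  # assumed mother headword
    Entry(id="lan-boise", headword="lán boise"),
]
relations = [SubentryRelation(mother_id="bos", subentry_id="lan-boise")]

for sub in subentries_of("bos", entries, relations):
    print(sub.headword)
```

Nothing is embedded inside anything else: the collocation keeps its own full-form headword, and the connection to the mother entry lives entirely in the relation.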

Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context. I think the solution above may work, but we should certainly try it on some real dictionary data to see how practical it is.

No modelling for hypernym relations

I was thinking most lexicographers would be happy enough to model hypernymy and hyponymy through SimilarityRelation in the Crossreferencing module. But, by all means, feel free to propose a more detailed inventory of relation types for this module if you want.

I think that the distinction between synonyms and hypernyms is quite important and few dictionaries would like to conflate these. A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?
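
As a sketch of what a single Relation object with vocabulary-defined types might look like, here is some Python. The names (RELATION_TYPES, source, target, targets) and the example words are invented for illustration, and the direction convention (the target is the hypernym of the source) is just one possible choice.

```python
from dataclasses import dataclass
from typing import Dict, List

# Controlled vocabulary of relation types; an implementer could extend this.
RELATION_TYPES: Dict[str, str] = {
    "synonym": "source and target have the same meaning",
    "hypernym": "target is a broader term than source",
}


@dataclass
class Relation:
    """One generic relation whose type comes from the controlled vocabulary."""
    type: str
    source: str
    target: str

    def __post_init__(self) -> None:
        # Reject any type that the vocabulary does not define.
        if self.type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.type!r}")


def targets(relations: List[Relation], source: str, rel_type: str) -> List[str]:
    """Return the targets related to a source by the given relation type."""
    return [r.target for r in relations
            if r.source == source and r.type == rel_type]


relations = [
    Relation("hypernym", "spaniel", "dog"),  # invented example data
    Relation("synonym", "dog", "hound"),     # invented example data
]

print(targets(relations, "spaniel", "hypernym"))
print(targets(relations, "dog", "synonym"))
```

Synonyms and hypernyms stay distinct because they differ in type, while the data model itself needs only one object.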

On a general note, I’m sure people from both inside and outside Lexidma will have a lot of questions like these: how do I handle noun gender in DMLex, how do I handle collocations in DMLex, and so on. The standard itself is probably not the right place to answer them. So, perhaps we should think about producing some sort of a “cookbook” to go with the standard as an additional, less formal guide to implementing DMLex.

I agree. Examples will need to be documented somewhere.

Regards,

John

M.
