OASIS Mailing List Archives

 




Subject: Re: [lexidma] Working draft



On 12/10/2021 16:12, Michal Měchura wrote:

Re: conflating gender etc. into part of speech

Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

Another option would be to have a more generic "label" type and (optionally) use a controlled vocabulary to tell us what kind of label it is: part of speech or gender or whatever. That would still allow conflating but would also allow "not-conflating".

In fact, our sense-level usage labels do this already, so for consistency we should probably do it for our entry-level grammar labels too.

I would recommend a key/value style representation of linguistic properties. This is the approach taken in all other models and would connect with other controlled vocabulary mechanisms (CLARIN, GOLD, LexInfo etc.)
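To make the key/value idea concrete, here is a minimal sketch. It is not taken from the DMLex draft: the `Entry` class, the property keys and the sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical entry; not the DMLex object model."""
    headword: str
    # Each linguistic property is a separate key/value pair. Keys and values
    # would ideally be drawn from a controlled vocabulary (assumed here).
    properties: dict[str, str] = field(default_factory=dict)


def entries_with(entries, key, value):
    """Return the headwords of entries whose property `key` equals `value`."""
    return [e.headword for e in entries if e.properties.get(key) == value]


# Illustrative sample data only.
entries = [
    Entry("bos", {"partOfSpeech": "noun", "gender": "feminine"}),
    Entry("lán", {"partOfSpeech": "noun", "gender": "masculine"}),
]

feminine = entries_with(entries, "gender", "feminine")
nouns = entries_with(entries, "partOfSpeech", "noun")
```

Because gender and part of speech are stored separately, either can be queried without parsing a combined label such as "feminine noun".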

Re: collocations

Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context.

That Irish-English dictionary you're looking at (Ó Dónaill's FGB from 1977) is exceptional in that it makes no clear distinction between collocations and example sentences and other kinds of multiword units that appear inside the articles. The bold/italic type is not a consistent guide either. I know this dictionary well, I had a hand in its retrodigitization a while back. :-) If somebody wanted to convert FGB into DMLex I would advise them to either treat all those things as example sentences, or else to manually decide for each one whether or not it deserves to be its own (sub)entry. And either way, this dictionary is on the periphery of Lexidma's interests because it's an old paper one. I get the impression that modern born-digital dictionaries tend to be clearer on whether something is or isn't a (sub)entry.

I wouldn't say that it is just FGB. I think the practice is very common in bilingual dictionaries. For example, Collins Italian-English dictionary* has many full phrases. I was looking at the print version but the online one is similar:

https://www.collinsdictionary.com/dictionary/italian-english/questo

* The first bilingual dictionary I could find on my shelf :)


A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?

That's doable, yes. But still, I think we need separate Relation subtypes depending on their arity and directionality. Like, a synonymy relation can have two or more participants and is undirected, an antonymy relation must have exactly two participants and is also undirected, a hypernym/hyponym relation has two participants and is directed.
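As a rough sketch of what arity and directionality constraints could look like if relation types were declared as data, consider the following. The `RelationType` class, its field names and the `validate` and `same_relation` helpers are hypothetical and not part of any DMLex draft; only the three constraints themselves come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationType:
    """Hypothetical description of a relation type."""
    name: str
    min_members: int
    max_members: Optional[int]  # None means no upper bound
    directed: bool


# The three relation types described above, encoded as data.
RELATION_TYPES = {
    "synonymy": RelationType("synonymy", 2, None, directed=False),
    "antonymy": RelationType("antonymy", 2, 2, directed=False),
    "hypernymy": RelationType("hypernymy", 2, 2, directed=True),
}


def validate(type_name, members):
    """Return a list of arity problems; an empty list means the relation is valid."""
    rt = RELATION_TYPES[type_name]
    problems = []
    if len(members) < rt.min_members:
        problems.append(f"{rt.name} needs at least {rt.min_members} members")
    if rt.max_members is not None and len(members) > rt.max_members:
        problems.append(f"{rt.name} allows at most {rt.max_members} members")
    return problems


def same_relation(type_name, a, b):
    """Member order matters only for directed relation types."""
    if RELATION_TYPES[type_name].directed:
        return list(a) == list(b)
    return sorted(a) == sorted(b)
```

With this shape, a single generic relation object plus a typed vocabulary entry would carry the same information as separate subtypes; whether validation of this kind belongs in the standard is a separate question.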

I see. What is the reason to care about symmetry and arity? It only seems useful if you are going to add some kind of reasoning or validation methodology, which would add many more complexities to the model.

Regards,

John

M.


On Tue, 12 Oct 2021 at 14:48, John McCrae <john.mccrae@insight-centre.org> wrote:

Hi Michal,

Thanks.

On 12/10/2021 10:42, Michal Měchura wrote:

We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example, the annotation on this entry and its inflected form "boise".

For gender (and other properties) of nouns (and other word classes) the idea is that implementers would conflate them with part-of-speech labels: they would create a "masculine noun" part-of-speech label, a "feminine noun" part-of-speech label, and so on.

For inflection categories, for example to say that this noun belongs in such-and-such inflectional paradigm, my suggestion would again be to conflate this with the part-of-speech label, for example "feminine noun of the second declension".

Note that in DMLex, implementers can use the Controlled Vocabularies module to tell the world what their home-baked part-of-speech labels map to in some external data category ontology, for example to LexInfo.
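A minimal sketch of how such a mapping might be used in practice follows. The `LABEL_MAPPING` table and the `expand` helper are assumptions for illustration, and the `lexinfo:` prefixed names stand in for LexInfo identifiers; check them against the ontology before relying on them. This is not the Controlled Vocabularies module itself.

```python
# Hypothetical mapping from home-baked, conflated labels to external
# data categories. The declension class is deliberately left unmapped,
# since no matching external category is assumed here.
LABEL_MAPPING = {
    "masculine noun": {
        "partOfSpeech": "lexinfo:noun",
        "gender": "lexinfo:masculine",
    },
    "feminine noun": {
        "partOfSpeech": "lexinfo:noun",
        "gender": "lexinfo:feminine",
    },
    "feminine noun of the second declension": {
        "partOfSpeech": "lexinfo:noun",
        "gender": "lexinfo:feminine",
    },
}


def expand(label):
    """Return the external categories for a label, or an empty dict if unmapped."""
    return dict(LABEL_MAPPING.get(label, {}))


expanded = expand("feminine noun of the second declension")
```

One conflated label can map to several external categories, so consumers can still recover gender separately even though the dictionary stores a single label.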

For inflected forms, that's what the InflectedForm object type is for (but no tildes please).

Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

We cannot give citations for sources of information, as is typical in historical dictionaries (see image)

True. This would be a good candidate for a module.

There is no etymology information (as discussed in the call). See example in Merriam-Webster:

True again, and again it's a candidate for a module.

Okay, I will try to make some pull requests for candidate modules.

I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla)

My idea was that each collocation would be treated as an entry (with the full-form collocation as its headword, for example "lán boise"), and this entry would be connected to its "mother" entry through SubentryRelation.

To my mind, this is the only reasonable way to handle collocations if we want to avoid embedded subentries and headword overriding (see my report on subentrying earlier in Lexidma and also my presentation on recursion at eLex recently).
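The collocation-as-entry idea can be sketched as flat entries plus a separate relation list, with no nesting. Only the name SubentryRelation comes from the discussion above; its fields, the `Entry` class, the identifiers and the choice of "bos" as the mother entry are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical flat entry: collocations get their own entry."""
    id: str
    headword: str


@dataclass
class SubentryRelation:
    """Links a collocation entry to its mother entry (field names assumed)."""
    parent_id: str
    child_id: str


# All entries live at the same level; nothing is embedded in anything else.
entries = {
    e.id: e
    for e in [Entry("bos", "bos"), Entry("lan-boise", "lán boise")]
}
relations = [SubentryRelation(parent_id="bos", child_id="lan-boise")]


def subentries_of(parent_id):
    """Return the headwords of entries attached to `parent_id` as subentries."""
    return [
        entries[r.child_id].headword
        for r in relations
        if r.parent_id == parent_id
    ]
```

Because the collocation carries its own full-form headword, no headword overriding is needed, and rendering the mother entry just means looking up its subentries.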

Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context. I think the solution above may work, but we should certainly try it on some real dictionary data to see how practical it is.

No modelling for hypernym relations

I was thinking most lexicographers would be happy enough to model hypernymy and hyponymy through SimilarityRelation in the Crossreferencing module. But, by all means, feel free to propose a more detailed inventory of relation types for this module if you want.

I think that the distinction between synonyms and hypernyms is quite important and few dictionaries would like to conflate these. A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?


On a general note, I'm sure people from both inside and outside Lexidma will have a lot of questions like these: how do I handle noun gender in DMLex, how do I handle collocations in DMLex, and so on. The standard itself is probably not the right place to answer them. So, perhaps we should think about producing some sort of a "cookbook" to go with the standard as an additional, less formal guide to implementing DMLex.

I agree. Examples will need to be documented somewhere.

Regards,

John

M.


