Re: [lexidma] Module-by-module proposal

2/ I am not sure about allowing PartOfSpeech, Pronunciation and InflectedForm as children of Sense. Having senses of an entry with different part-of-speech values is something that some models explicitly avoid; we would also need to figure out how this inherits from the Entry's PartOfSpeech. I don't think we should have Pronunciation and InflectedForm there at all, as senses with different pronunciations or inflections are homographs and we really should insist that homographs are distinct at the Entry level.

I am not sure of it either; it does smell of bad practice: sloppy separation between form and function. We could go the radical route and prohibit it altogether. On the other hand, it is what lexicographers sometimes want to do. This always makes me think of the Czech word "jeřáb" 'crane', which has two different plurals depending on whether it's the animal or the machine: I don't think lexicographers would welcome the idea of having to create two separate entries for these. This is not an exception; I could probably dig out other examples from other languages. So the question is: do we want to force lexicographers to re-analyze all sense-specific morphosyntax as homonymy?

I would say so... but what do others think here?

If we go that way, then we will have a data model which makes dictionaries more easily machine-understandable (because of the clear separation between form and meaning) but less human-friendly.

As an alternative proposal, how about we keep all morphosyntax at the entry level (as you wish) but, in addition, invent some relational mechanism for expressing the fact that some of the morphosyntactic properties (such as this plural or that pronunciation) apply only to some senses and not to others? Or would that make the data model even less machine-understandable instead of more?
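To make the relational idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than part of any agreed model: the class names mirror the object types discussed here, but the `applies_to` field and the `forms_for` helper are hypothetical, and the two plural forms are filled in for the Czech example rather than taken from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    id: str
    gloss: str


@dataclass
class InflectedForm:
    text: str
    tag: str
    # Hypothetical relational field: IDs of the senses this form applies to.
    # An empty tuple means the form applies to every sense of the entry.
    applies_to: tuple = ()


@dataclass
class Entry:
    headword: str
    senses: list = field(default_factory=list)
    inflected_forms: list = field(default_factory=list)

    def forms_for(self, sense_id: str) -> list:
        """Return the entry-level forms that are valid for the given sense."""
        return [
            form for form in self.inflected_forms
            if not form.applies_to or sense_id in form.applies_to
        ]


# One entry, two senses; all morphosyntax stays at the entry level.
entry = Entry(
    headword="jeřáb",
    senses=[Sense("s1", "crane (bird)"), Sense("s2", "crane (machine)")],
    inflected_forms=[
        InflectedForm("jeřábi", "pl", applies_to=("s1",)),  # illustrative form
        InflectedForm("jeřáby", "pl", applies_to=("s2",)),  # illustrative form
    ],
)
```

Under this sketch a consumer that ignores `applies_to` still sees every form on the Entry, while one that honours it can call `entry.forms_for("s1")` to get only the plural valid for the bird sense. Whether that extra indirection counts as more or less machine-understandable is exactly the open question.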

3/ There seems to be no way to record properties of entries such as noun gender in the model.

That's what the Label object type would be for, in my proposal.

4/ Pronunciation probably needs a scheme and a variety property. See this recent paper (Sec 3.3) for a discussion of this: https://www.aclweb.org/anthology/2021.gwc-1.11.pdf

For scheme, probably, yes. For variety, if by that you mean things like 'British' versus 'American', I envisaged that, again, the Label object type would do that.

Okay... so label is very overloaded in terms of what it represents. My feeling is I would prefer some more specific categories for common annotations.

On second thought, I agree with you: we should have specific types for specific kinds of annotations, instead of a catch-all Label type.

Why am I changing my mind? Because it makes sense that all corners of the data model should be on the same level between specific and abstract. If we have Entry, Sense and SenseGroup instead of just one abstract "Segment" type (as I was proposing earlier), then by the same token we should have (for example) Register, Region and Time instead of just one abstract Label. The Segment and Label types are valid abstractions and belong in a meta-model, but they do not belong in DMLex, because DMLex wants to be less "meta" and more immediately implementable.

Also, I wasn't even consistent with myself because PartOfSpeech is really just a Label too.
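As a rough illustration of the two levels, here is a small Python sketch. It is only a sketch under stated assumptions: the `kind` and `value` fields, the `SPECIFIC_TYPES` table and the `specialize` helper are hypothetical names chosen for this example, not anything defined by DMLex.

```python
from dataclasses import dataclass


# Meta-model level: one abstract annotation type covers everything.
@dataclass(frozen=True)
class Label:
    kind: str   # hypothetical discriminator, e.g. "register" or "region"
    value: str


# More specific level: one concrete type per common kind of annotation.
@dataclass(frozen=True)
class Register:
    value: str


@dataclass(frozen=True)
class Region:
    value: str


@dataclass(frozen=True)
class Time:
    value: str


@dataclass(frozen=True)
class PartOfSpeech:
    value: str


# Hypothetical lookup from a generic label kind to its specific type.
SPECIFIC_TYPES = {
    "register": Register,
    "region": Region,
    "time": Time,
    "partOfSpeech": PartOfSpeech,
}


def specialize(label: Label):
    """Convert a generic Label into its specific type, if one is defined."""
    cls = SPECIFIC_TYPES.get(label.kind)
    if cls is None:
        raise ValueError(f"no specific type for label kind {label.kind!r}")
    return cls(label.value)
```

With the generic type, an implementer has to know which `kind` strings are meaningful; with the specific types that knowledge is carried by the model itself, which is the sense in which it is more immediately implementable.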

M.
