OASIS Mailing List Archives

lexidma-comment message



Subject: Feedback from David Lindemann


This is feedback from David Lindemann, david.lindemann@ehu.eus, who works on Basque dictionaries at the University of the Basque Country. David didn't want to go through the motions of signing up for this mailing list etc., but he has given me permission to submit his feedback on his behalf and he has agreed to the OASIS Feedback License.

To start with, a question:

There are dictionaries that don't attach a POS value to the headword but have POS sections inside the entry (the headword is presented as POS neutral). For example, this entry: https://www.euskaltzaindia.eus/index.php?option=com_oehberria&task=bilaketa&Itemid=413&lang=eu-ES&query=aditu

In such a case, would you like to force a re-modeling of the inner entry hierarchy (in this case, make three entries out of one, so that each entry is not POS-ambiguous)? There are reasons for modeling a dictionary the way you see in the example. In Basque, for example, there are many nominals that can be interpreted as nouns or adjectives, and the border is not clear. The above example, aditu, means "expert", and in English, too, it is not that clear where it is an ADJ and where a NOUN ("I am a Basque expert / I am an expert Basque"). Another reason is that if you describe inflected forms (and there are a lot of forms for each lemma in Basque), the data become very redundant if you have to list all possible forms in the entries... Related to that: we have frequency data for Basque word forms, but we are not able to say in each case whether a form is ADJ or NOUN. And also, if you have an inflected past participle, is this a verb form or a nominal (its inflection behaves like that of nouns and adjectives)?

In Ontolex-on-Wikibase, I am modeling that as follows: I introduce a POS-disambiguating property at sense level ("this sense applies to this lemma as noun"), and I do the same for inflected forms, if it is clear what POS a certain form may have (there can be more than one). Example: "aditu" with POS on senses, "aditu" with POS on forms (different sources / tools give different values here, which is what I want to record in that case).

Also in German, there are dictionaries that have such POS-like sections (not across POS, but refining POS). Some dictionaries group verb senses inside an element describing a syntactic entity ("verb transitive" vs. "verb intransitive", "verb reflexive", etc.) - example.

So the question is whether you would like such hierarchies or would like to enforce a re-modeling of the resource for getting it DMLex-compliant.

For information, here is a copy of the response I had given David:

There are (at least) two ways to model such situations in DMLex.

Way number one is to redefine the POS-specific senses (or sense groups) as separate entries, and connect them with a relation, along the same lines as the "walk" example in my unofficial introduction (figures 6b and 6c). This of course doesn't mean that they need to be shown to the end-user as two separate entries. The relation that links them could be understood as an instruction to your software to collate the entries into a single thing at display-time.
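To make way number one a bit more concrete, here is a minimal Python sketch of the collate-at-display-time idea. It is only an illustration under stated assumptions: the record layout, the field names and the relation type "pos-split" are all made up for this example and are not the official DMLex serialization.

```python
# Hypothetical, simplified records. Field names are illustrative only and do
# not reproduce the official DMLex serialization.
entries = [
    {"id": "walk-verb", "headword": "walk", "partOfSpeech": "verb",
     "senses": ["to move on foot"]},
    {"id": "walk-noun", "headword": "walk", "partOfSpeech": "noun",
     "senses": ["an act of walking"]},
    {"id": "expert-noun", "headword": "expert", "partOfSpeech": "noun",
     "senses": ["a person with special knowledge or skill"]},
]

# One relation linking the two POS-specific "walk" entries. The relation type
# name "pos-split" is made up for this sketch.
relations = [
    {"type": "pos-split", "members": ["walk-verb", "walk-noun"]},
]


def collate(entries, relations, relation_type):
    """Merge entries linked by `relation_type` into single display units."""
    by_id = {e["id"]: e for e in entries}
    grouped_ids = set()
    units = []
    for rel in relations:
        if rel["type"] != relation_type:
            continue
        members = [by_id[m] for m in rel["members"] if m in by_id]
        if not members:
            continue
        units.append({
            "headword": members[0]["headword"],
            "sections": [
                {"partOfSpeech": m["partOfSpeech"], "senses": m["senses"]}
                for m in members
            ],
        })
        grouped_ids.update(m["id"] for m in members)
    # Entries that take part in no such relation are displayed on their own.
    for e in entries:
        if e["id"] not in grouped_ids:
            units.append({
                "headword": e["headword"],
                "sections": [
                    {"partOfSpeech": e["partOfSpeech"], "senses": e["senses"]}
                ],
            })
    return units


units = collate(entries, relations, "pos-split")
for unit in units:
    pos_list = ", ".join(s["partOfSpeech"] for s in unit["sections"])
    print(f'{unit["headword"]}: {pos_list}')
```

The stored data keep one part of speech per entry, while the end-user still sees a single "walk" article with a verb section and a noun section.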

Way number two is to invent a new part-of-speech that covers both noun and adjective usages, perhaps giving it a creative name such as "noun-adjective hybrid" or something. Then you can treat both usages in one entry. In addition to that, you could even use DMLex's label objects to label noun-like senses and adjective-like senses if you like.
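Way number two could look roughly like the following Python sketch. Again, this is an assumption-laden illustration rather than real DMLex data: the field names, the label values "noun-like" and "adjective-like", and the placeholder sense texts are all hypothetical.

```python
# Hypothetical single entry with one hybrid part of speech. Field names and
# label values are illustrative, not taken from the DMLex specification.
entry = {
    "headword": "aditu",
    "partOfSpeech": "noun-adjective hybrid",
    "senses": [
        # Sense texts are placeholders; real definitions are omitted here.
        {"text": "(noun-like sense, definition omitted)",
         "labels": ["noun-like"]},
        {"text": "(adjective-like sense, definition omitted)",
         "labels": ["adjective-like"]},
    ],
}


def senses_with_label(entry, label):
    """Return the senses of `entry` that carry the given label."""
    return [s for s in entry["senses"] if label in s["labels"]]


noun_like = senses_with_label(entry, "noun-like")
adjective_like = senses_with_label(entry, "adjective-like")
print(len(noun_like), len(adjective_like))
```

The entry as a whole has exactly one part of speech, and the noun-like versus adjective-like distinction survives as labels that software can still filter on.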

This may seem like a dirty workaround to satisfy the data model, but I don't see it that way. I see it as a realistic re-analysis of which grammatical categories even exist in your language. If this noun/adjective duality is something that happens often in the lexicon of your language, then that's a good argument in favour of postulating a category for these words, isn't it?

I've seen something similar in dictionaries of Welsh which operate with a POS label called "verbnoun". And it's not as makey-uppy as it sounds: Welsh has a long history of analysing its lexicon in these terms.

And English nouns like "expert" are not even that difficult to analyze, in my opinion. Here I'd say you don't even need to invent any new parts of speech or to go for separate entries. Both uses of "expert" are nouns, I would argue. Even in a sentence like "I'm an expert linguist", "expert" is a noun: a noun being used to modify another noun. I'm aware that some people analyze it as an adjective but I think that's misleading. It doesn't have any of the usual properties that adjectives have: it cannot be graded (*"I am an experter linguist than you"), and cannot be used predicatively (*"your advice was very expert"). So I'd say it's perfectly adequate to describe "expert" in a single entry, with POS label "noun". The adjective-like uses could be labelled with a DMLex label like "used to modify another noun" or something.

So, anyway, these are my thoughts. I'm not saying that sense-level parts of speech are invalid in any big theoretical sense. It's just that you can get serious fringe benefits from prohibiting them and from enforcing a strict separation of form and meaning. That's the road DMLex has taken.


