OASIS Mailing List Archives



lexidma message


Subject: Re: [lexidma] follow up on two-level senses

Hi John,

thanks for your reply; please see my comments below:

On Fri, 3 Jul 2020 at 11:25, John P. McCrae <john@mccr.ae> wrote:

(2) there is very limited agreement on this similarity
Actually, I don't think it is merely similarity that is being encoded; in fact, most of these sense groupings are based on ideas of systematic polysemy (as introduced by authors such as Pustejovsky [1] and Buitelaar [2]) and of complementary and contrastive senses (as described by Weinreich [3]). These are real linguistic phenomena, and they still motivate modern electronic lexicographic efforts [4].

Yes, I don't see that as contradicting anything I've said. From a computer-science perspective, a more appropriate term here might be an equivalence relation, but that doesn't matter; to keep things easy to follow I will stick to "similarity", however it is defined and whether it is seen as a binary relation or a real-valued metric.

(3) there are many possible ways in which this similarity can be defined and viewed, and allowing for all of them brings the model closer to how language and word senses actually work

(4) the fact that it was encoded hierarchically, in a way that only allows a one-dimensional structure, merely stems from the limits of a printed dictionary
I am not sure I agree with this, partly for the reasons stated above, but more importantly because users do not want to use an electronic dictionary as some free-form graph structure. This is something I have learnt from WordNet: presenting the data as a flat text structure (e.g., https://en-word.net/) is more effective than presenting it as a graph diagram. As such, I think hierarchical groupings are still very useful, both in the presentation and in the production of dictionary content.

I totally agree with the first part: users prefer flat structures because they are far easier to comprehend. But that is a valid argument against using hierarchies at all, not an argument for using just one hierarchy when many can be equally valid.

(5) this alternative solution therefore enables all this, and much more, if needed, without introducing additional complexity.

I think the labels could generally use a notation similar to the one David mentioned for PoS tagging, with a prefix denoting the type of label, e.g. "sensegroup:1" or "sensegroup:etymology1", but that is still to be discussed.
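As a minimal sketch of how such prefixed labels might behave in a data model (Python 3.9+), each sense below carries a set of labels, and any one grouping can be extracted or rendered as a one-level outline on demand. The `Sense` class, the `group_by_prefix` and `outline` helpers, and the sample senses are hypothetical and invented for illustration; only the label strings follow the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sense:
    """A sense with a free-form set of prefixed labels (hypothetical model)."""
    id: str
    definition: str
    labels: set[str] = field(default_factory=set)


def group_by_prefix(senses: list[Sense], prefix: str) -> dict[str, list[str]]:
    """Map each label value under `prefix` to the ids of the senses carrying it.

    A sense may carry several labels with the same prefix, so the resulting
    groups may overlap instead of nesting inside one another.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for sense in senses:
        for label in sorted(sense.labels):
            kind, _, value = label.partition(":")
            if kind == prefix and value:
                groups[value].append(sense.id)
    return dict(groups)


def outline(senses: list[Sense], prefix: str) -> str:
    """Render one chosen grouping as an indented outline for display."""
    by_id = {s.id: s for s in senses}
    lines = []
    for value, ids in sorted(group_by_prefix(senses, prefix).items()):
        lines.append(f"{prefix} {value}")
        lines.extend(f"  {i}: {by_id[i].definition}" for i in ids)
    return "\n".join(lines)


# Hypothetical sample data: s2 belongs to two groups at once.
senses = [
    Sense("s1", "definition 1", {"sensegroup:1"}),
    Sense("s2", "definition 2", {"sensegroup:1", "sensegroup:etymology1"}),
    Sense("s3", "definition 3", {"sensegroup:etymology1"}),
]

if __name__ == "__main__":
    print(group_by_prefix(senses, "sensegroup"))
    print(outline(senses, "sensegroup"))
```

Because `s2` carries both labels, the two groups overlap without one nesting inside the other, a structure that a single hierarchy cannot hold. At the same time, `outline` still yields the flat, hierarchical-looking view that users find easy to read.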
From a technical point of view, there are also disadvantages to this. You are still encoding hierarchical senses, but now you are doing it in a way that is harder to work with in XPath and many other technologies, which in turn makes it harder for data creators to verify consistency.

I would suggest that this is implemented as an optional sense-grouping tag, e.g.:

 <senseGrp>
   <sense id="..."><defn></defn></sense>
   <sense id="..."><defn></defn></sense>
 </senseGrp>

It would be great if we could avoid any kind of XML-centric thinking in our discussions: we are proposing a data model. XML will be just one of several serializations of it, and once we have established the model we will discuss the XML serialization, or even XML serializations (i.e. more than one).
Having said that, I think it is good to avoid any unnecessary nesting (senseGrp here) whenever possible. This typically makes XPath queries shorter and easier to write and read. But again, I really don't think that the choice of one particular query language for one particular serialization should influence the design of the data model.
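To make the XPath trade-off concrete, here is a sketch using Python's `xml.etree.ElementTree`, whose `findall` supports only a limited subset of XPath. Both serializations are hypothetical. Only `senseGrp`, `sense`, `id` and `defn` come from this thread. The `<label>` child element and the `ids` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialization 1: a group is expressed by a wrapper element.
NESTED = """
<entry>
  <senseGrp>
    <sense id="s1"><defn>definition 1</defn></sense>
    <sense id="s2"><defn>definition 2</defn></sense>
  </senseGrp>
  <sense id="s3"><defn>definition 3</defn></sense>
</entry>
"""

# Hypothetical serialization 2: flat senses, each with zero or more labels.
FLAT = """
<entry>
  <sense id="s1"><label>sensegroup:1</label><defn>definition 1</defn></sense>
  <sense id="s2"><label>sensegroup:1</label><label>sensegroup:etymology1</label><defn>definition 2</defn></sense>
  <sense id="s3"><label>sensegroup:etymology1</label><defn>definition 3</defn></sense>
</entry>
"""


def ids(root: ET.Element, path: str) -> list[str]:
    """Return the id attributes of all elements matched by an ElementPath query."""
    return [el.get("id") for el in root.findall(path)]


nested = ET.fromstring(NESTED.strip())
flat = ET.fromstring(FLAT.strip())

# Nested: the group is a path step; reaching every sense needs a descendant search.
grouped_nested = ids(nested, "senseGrp/sense")
all_nested = ids(nested, ".//sense")

# Flat: every sense sits at the same depth; group membership is a predicate
# on a child element's text.
grouped_flat = ids(flat, "./sense[label='sensegroup:1']")
all_flat = ids(flat, "./sense")

if __name__ == "__main__":
    print(grouped_nested, all_nested)
    print(grouped_flat, all_flat)
```

In the nested form, reaching every sense requires `.//`. In the flat form, membership becomes a predicate, which also lets `s2` belong to a second group without being duplicated: `./sense[label='sensegroup:etymology1']` matches `s2` and `s3`.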

Also, I would note that this discussion is only really about grouping senses. Grouping entries is more questionable but is often motivated by linguistic phenomena like derivation, grouping etymologically distinct forms of the same word (e.g., 'bank' can be first grouped into subentries based on its Germanic/Italian/French etymologies) or morphologically distinct forms (e.g., the unique dative singular found in the seventh sense here). We should at least consider these requirements on the representation and have a plan to represent them in the model.

Yeah -- from this perspective I think the proposed approach can be easily generalized for these purposes too.

Best regards
