lexidma message

Subject: Re: [lexidma] Recent edits

From: Tomaž Erjavec <tomaz.erjavec@ijs.si>
To: lexidma@lists.oasis-open.org
Date: Sat, 7 May 2022 11:42:56 +0200

Hi,

Michal, thanks for the new version of the specs. I have now made a stab at the XML schemas and, while developing them, wrote some comments in the specs, attached - I hope you can read the bubbles. What I am mostly missing in the PDFs of some of the extra modules are examples of use (like what you have in the Core) and esp. the conditions on the properties and children (repeatable or not? optional or obligatory?).

I am also worried that absolutely no metadata can be associated with a lex. resource. I know this was discussed, and you didn't want to open this can of worms, but I am sure users will be upset that they cannot give any metadata to the resource at all. Maybe at least a URI?

I put the current RelaxNG XML schemas on GitHub - I created a pull request, but until that is accepted, you can find them in my fork at https://github.com/TomazErjavec/lexidma or, more precisely, at https://github.com/TomazErjavec/lexidma/tree/master/dmlex-v1.0/specification/serializations/XML.

I have tested them with some example files which mostly contain the examples from the specs, except for the multilingual module, which has no examples. The greatest headache was trying to implement the division into modules; RelaxNG does support modules but not really in the way that DMLex implements them - a few things I had to do differently from the specs, in particular that objects get ids already in the core. At least for the XML serialization, it would be much simpler if core, linking and in-line were just one schema - after all, if people don't need certain elements, well, then they just don't use them.

Anyway, it would of course be nice if you all had a look at it, but esp. Michal: I think the action should now move to this serialization, because you can validate the samples against the schema, something you can't do in MD + PDF. And it shows in the current specs, with various inconsistencies; not saying that I could ever do better by just thinking about the model. We also have Git to communicate, rather than mails, which will be a lot more tractable. I hope that is ok; if not, then for any further modifications of the specs I would also need a "diff", i.e. what exactly has been changed, to appropriately modify the schemas. What I would not want is to always have to reread the complete specifications and try to figure out what has changed and how.


But we can discuss all this on Monday of course.

All the best,

Tomaž


On 2. 05. 2022 00:53, Michal Měchura wrote:

Hi all,
I'm sending you a preview of the edits I have made to our draft, based on the various bits of feedback I have received. Some of the modules have undergone a major rewrite, so please read this carefully. The major changes are:
- I have cleaned up the names of datatypes and properties mentioned throughout the text, removed inconsistencies, and even changed the names of some of them.
- I have simplified how we handle controlled vocabularies/look-up values in the Core. Instead of having many separate datatypes for the different kinds of controlled vocabularies, we now only have one datatype (called Tag). The fact that some look-up values apply to part-of-speech labels, some apply to inflection labels and so on, is represented in DMLex as business rules which implementors can choose to enforce or not to enforce. I have come to the conclusion, after thinking about it long and hard, that it's better to do it like this because we will end up having a simpler, easier-to-understand, easier-to-implement object model with fewer types, which is exactly what software developers like to see!
- The Linking Module has also been simplified in a similar fashion. Instead of having many datatypes for many different kinds of relations based on their arity and directionality (SenseSet, SensePair, SenseTuple...) we now have only one datatype (called Relation). Constraints on the number and types of the objects that are participating in these relations are expressed in DMLex as business rules which implementors can choose to enforce or not. Again, my motivation for this change is simplicity, where by simplicity I mainly mean having as few datatypes as possible, even if it means that some constraints now have to be represented as business rules instead of being baked right into the datatype system.
- I have become convinced that, in addition to monolingual and bilingual dictionaries, we need to support multilingual ones as well, where there is one source language and multiple target languages. So I have split the Bilingual Module into two, a Bilingual Module and a Multilingual Module. These two modules are mutually exclusive: implementors can implement one or the other but not both.
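
[Illustration, not part of the original message.] The idea in the Linking Module item above, one generic Relation datatype with arity constraints kept as optional business rules rather than separate types, might look roughly like the sketch below. The field names (`type`, `members`), the rule table and the `check` helper are all assumptions made for illustration; they are not taken from the DMLex specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Relation:
    """A single generic relation type (hypothetical field names)."""
    type: str                                          # e.g. "antonyms"
    members: List[str] = field(default_factory=list)   # ids of participating objects


# Hypothetical business rules: (minimum, maximum) member count per relation
# type; a maximum of None means "no upper bound".
RULES: Dict[str, Tuple[int, Optional[int]]] = {
    "antonyms": (2, 2),
    "synonyms": (2, None),
}


def check(relation: Relation,
          rules: Dict[str, Tuple[int, Optional[int]]] = RULES) -> List[str]:
    """Return a list of rule violations; an empty list means none were found.

    Relation types without a rule are accepted as-is, mirroring the idea that
    an implementor may choose not to enforce any constraint.
    """
    bounds = rules.get(relation.type)
    if bounds is None:
        return []
    minimum, maximum = bounds
    count = len(relation.members)
    problems = []
    if count < minimum:
        problems.append(f"{relation.type}: expected at least {minimum} members, got {count}")
    if maximum is not None and count > maximum:
        problems.append(f"{relation.type}: expected at most {maximum} members, got {count}")
    return problems
```

The datatype itself never rejects anything. An implementation that wants the old SensePair-style guarantee calls `check`, and one that does not simply skips it.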
I hope you will be broadly in agreement with these changes. I apologize to those who are working on serializations: some of these changes may be game changers for you, depending on how far advanced you are.
The files I am sending you are PDFs, generated from Markdown files I am using internally for writing (because writing straight into DocBook is too difficult for me). I am going to convert it all into DocBook and submit it as a GitHub pull request in time for the next meeting.
M.


Attachment: 03-bilingual.pdf
Description: Adobe PDF document

Attachment: 04-multilingual.pdf
Description: Adobe PDF document

Attachment: 05-linking.pdf
Description: Adobe PDF document

Attachment: 02-core.pdf
Description: Adobe PDF document

References:
- Recent edits
  - From: Michal Měchura <462258@mail.muni.cz>