
Subject: Re: [lexidma] Recent edits


Michal, thanks for the new version of the specs. I have now made a stab at the XML schemas and, while developing them, wrote some comments in the specs, attached - I hope you can read the bubbles. What I am mostly missing in the PDFs of some of the extra modules are examples of use (like what you have in the Core) and especially the conditions on the properties and children (repeatable or not? optional or obligatory?).

I am also worried that absolutely no metadata can be associated with a lexical resource. I know this was discussed, and you didn't want to open this can of worms, but I am sure users will be upset that they cannot give any metadata to the resource at all. Maybe at least a URI?

I put the current RelaxNG XML schemas on GitHub - I created a pull request, but until that is accepted, you can find them in my fork at https://github.com/TomazErjavec/lexidma or, more precisely, at https://github.com/TomazErjavec/lexidma/tree/master/dmlex-v1.0/specification/serializations/XML.

I have tested them with some example files, which mostly contain the examples from the specs, except for the multilingual module, which has no examples. The greatest headache was trying to implement the division into modules; RelaxNG does support modules, but not really in the way that DMLex implements them. I had to do a few things differently from the specs, in particular that objects already get ids in the core. At least for the XML serialization, it would be much simpler if core, linking and in-line were just one schema - after all, if people don't need certain elements, well, then they just don't use them.

Anyway, it would of course be nice if you all had a look at it, but especially Michal: I think the action should now move to this serialization, because you can validate the samples against the schema, something you can't do in MD + PDF. And it shows in the current specs, which have various inconsistencies; I am not saying that I could ever do better by just thinking about the model. We also have Git to communicate through, rather than mails, which will be a lot more tractable. I hope that is ok; if not, then for any further modifications of the specs I would also need a "diff", i.e. what exactly has been changed, so that I can modify the schemas appropriately. What I would not want is to have to reread the complete specifications every time and try to figure out what has changed and how.
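To illustrate the kind of "diff" meant here, a minimal sketch in Python using the standard difflib module. The function name spec_diff and the file label are hypothetical and not part of any DMLex tooling; running `git diff` on the Markdown sources would give the same information.

```python
import difflib


def spec_diff(old_text, new_text, name="spec.md"):
    """Return a unified diff between two versions of a spec source file.

    `name` is only a label used in the diff header (an assumption for
    illustration, not a real file in the repository).
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"old/{name}",
            tofile=f"new/{name}",
        )
    )
```

An empty result means the two versions are identical; otherwise the lines starting with "-" and "+" show exactly what has to be mirrored in the schemas.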

But we can discuss all this on Monday of course.

All the best,


On 2. 05. 2022 00:53, Michal Měchura wrote:
Hi all,

I'm sending you a preview of the edits I have made to our draft, based on the various bits of feedback I have received. Some of the modules have undergone a major rewrite, so please read this carefully. The major changes are:

- I have cleaned up the names of datatypes and properties mentioned throughout the text, removed inconsistencies, and even changed the names of some of them.

- I have simplified how we handle controlled vocabularies/look-up values in the Core. Instead of having many separate datatypes for the different kinds of controlled vocabularies, we now only have one datatype (called Tag). The fact that some look-up values apply to part-of-speech labels, some apply to inflection labels and so on, is represented in DMLex as business rules which implementors can choose to enforce or not to enforce. I have come to the conclusion, after thinking about it long and hard, that it's better to do it like this because we will end up with a simpler, easier-to-understand, easier-to-implement object model with fewer types, which is exactly what software developers like to see!

- The Linking Module has also been simplified in a similar fashion. Instead of having many datatypes for many different kinds of relations based on their arity and directionality (SenseSet, SensePair, SenseTuple...) we now only have one datatype (called Relation). Constraints on the number and types of the objects that are participating in these relations are expressed in DMLex as business rules which implementors can choose to enforce or not. Again, my motivation for this change is simplicity, where by simplicity I mainly mean having as few datatypes as possible - even if it means that some constraints now have to be represented as business rules instead of being baked right into the datatype system.

- I have become convinced that, in addition to monolingual and bilingual dictionaries, we need to support multilingual ones as well, where there is one source language and multiple target languages. So I have split the Bilingual Module into two, a Bilingual Module and a Multilingual Module. These two modules are mutually exclusive: implementors can implement one or the other but not both.
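The "one datatype plus optional business rules" idea from the list above can be sketched as follows. This is only an illustration in Python under stated assumptions: the property names (type, members) and the RelationRule helper are hypothetical and are not taken from the DMLex draft; the same pattern would apply to Tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relation:
    """A single generic relation datatype (property names are assumptions)."""
    type: str                                          # e.g. "antonyms"
    members: List[str] = field(default_factory=list)   # ids of participating objects


@dataclass
class RelationRule:
    """Hypothetical business rule: allowed member count for one relation type."""
    type: str
    min_members: int
    max_members: Optional[int] = None  # None means no upper bound


def violations(relations, rules):
    """Return messages for relations that break a declared rule.

    Relations whose type has no rule are accepted, mirroring the idea that
    implementors may choose not to enforce business rules at all.
    """
    by_type = {rule.type: rule for rule in rules}
    problems = []
    for rel in relations:
        rule = by_type.get(rel.type)
        if rule is None:
            continue
        n = len(rel.members)
        too_few = n < rule.min_members
        too_many = rule.max_members is not None and n > rule.max_members
        if too_few or too_many:
            problems.append(f"{rel.type}: {n} members not allowed")
    return problems
```

The arity that used to distinguish SensePair from SenseSet becomes data (a rule saying "antonyms" takes exactly two members) rather than a separate type, and an implementation that passes no rules simply accepts everything.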

I hope you will be broadly in agreement with these changes. I apologize to those who are working on serializations: some of these changes may be game changers for you, depending on how far advanced you are.

The files I am sending you are PDFs, generated from Markdown files I am using internally for writing (because writing straight into DocBook is too difficult for me). I am going to convert it all into DocBook and submit it as a GitHub pull request in time for the next meeting.



Attachment: 03-bilingual.pdf
Description: Adobe PDF document

Attachment: 04-multilingual.pdf
Description: Adobe PDF document

Attachment: 05-linking.pdf
Description: Adobe PDF document

Attachment: 02-core.pdf
Description: Adobe PDF document
