OASIS Mailing List Archives

lexidma-comment message



Subject: Feedback DMLex


Dear all,


Here's my initial feedback on DMLex. Disclaimer: I read Michal's informative article and briefly clicked through the full spec, but I'm sure I missed plenty. I'm a software developer, but I have worked on a dictionary writing system (DWS) and other linguistic software for many years. These are my personal opinions; hopefully some of them are useful. I tend to be long-winded; sorry.

I think DMLex is very interesting, with many positive characteristics. Clearly a lot of thought has gone into it and a lot of work has been done to get the proposal to this point. In many ways, it looks quite flexible, which is great.


At the same time, I feel some details (mainly disallowing certain common structures) currently make it harder to adopt than it could be. I understand the arguments being made, but I feel these are not settled issues; they are still actively being explored by the community. Reasonable people hold different opinions, and each chooses what they believe best fits their requirements.

I'll give two examples. First, subsenses. The way DMLex proposes to model these feels a bit unnatural to me, as if it's fighting the domain it's modelling. As far as I understand, to a lexicographer a subsense actually *is* a part of the main sense, so it feels natural to model it as contained within the main sense structure. Alternatively, each sense could be modelled as a separate object, with relations defining the tree structure between the entry and its main senses, as well as between the main senses and their subsenses. The hybrid structure proposed for DMLex, where all (sub)senses sit in a flat list as part of the entry and separate relations indicate their "real" structure, feels somewhat awkward and roundabout to me, and is an extra hurdle to deal with when processing entries. That's just my opinion, and I can see the arguments to the contrary as well, but standardizing on a single approach cuts the debate short and risks making the standard more difficult to adopt.
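
To make the extra processing step concrete, here is a minimal Python sketch of rebuilding a nested sense tree from a flat sense list plus relations. The field names ("senses", "subsenses", "parent", "child") and the sample entry are hypothetical simplifications for illustration, not taken from the DMLex spec, and the sketch assumes the relations contain no cycles.

```python
from collections import defaultdict

# Hypothetical, simplified data; NOT the actual DMLex schema.
# Flat model: all (sub)senses are listed directly under the entry ...
flat_entry = {
    "headword": "colour",
    "senses": [
        {"id": "colour-1", "definition": "visual property of light"},
        {"id": "colour-2", "definition": "a particular hue, such as red"},
        {"id": "colour-3", "definition": "a pigment or paint"},
    ],
}
# ... and separate relations state which sense is a subsense of which.
relations = [
    {"type": "subsense", "parent": "colour-1", "child": "colour-2"},
]


def nest_senses(entry, relations):
    """Rebuild the nested sense tree from the flat list plus relations."""
    children_of = defaultdict(list)
    child_ids = set()
    for rel in relations:
        if rel["type"] == "subsense":
            children_of[rel["parent"]].append(rel["child"])
            child_ids.add(rel["child"])

    by_id = {sense["id"]: sense for sense in entry["senses"]}

    def build(sense_id):
        # Copy the sense and attach its subsenses recursively
        # (assumes the relations form a tree, i.e. no cycles).
        sense = dict(by_id[sense_id])
        sense["subsenses"] = [build(child) for child in children_of[sense_id]]
        return sense

    # Top-level senses are those that are nobody's subsense.
    top_level = [s["id"] for s in entry["senses"] if s["id"] not in child_ids]
    return {
        "headword": entry["headword"],
        "senses": [build(sense_id) for sense_id in top_level],
    }


nested_entry = nest_senses(flat_entry, relations)
```

With a nested model, the result of nest_senses would simply be the stored form; with the flat model, every consumer that wants the tree has to perform a step like this first.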

Another example is variant headwords. I can see that sometimes you'd want these to be separate entries, but in other cases I feel putting them in the main entry would be simpler, especially if the variants will never actually be processed or displayed as separate entries and are just shown as a list in the main entry. Again, this is only my opinion, but it would be nice if the standard were flexible enough to support both approaches, allowing each project to choose what works best for it.
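
As a rough illustration of the two shapes, the Python sketch below converts an entry with embedded variants into separate entries linked by relations. All names here ("variants", "members", the "variant" relation type, and the split_variants helper) are hypothetical and only meant to show the trade-off, not how DMLex actually encodes it.

```python
def split_variants(entry):
    """Turn embedded variant headwords into separate stub entries plus relations.

    Hypothetical shapes for illustration only; not the DMLex schema.
    """
    # The main entry keeps everything except the embedded variant list.
    main = {key: value for key, value in entry.items() if key != "variants"}
    entries = [main]
    relations = []
    for variant in entry.get("variants", []):
        # Each variant becomes its own (empty) entry ...
        entries.append({"headword": variant, "senses": []})
        # ... linked back to the main headword by a relation.
        relations.append(
            {"type": "variant", "members": [entry["headword"], variant]}
        )
    return entries, relations


# Embedded model: the variant is just a field on the main entry.
embedded = {
    "headword": "colour",
    "variants": ["color"],
    "senses": [{"definition": "visual property of light"}],
}

entries, relations = split_variants(embedded)
```

If a project only ever lists variants inside the main entry, the embedded shape needs no such conversion, which is the simplicity argument made above.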

I understand that increased flexibility comes at a cost in interoperability and tool development, and the availability of good tools certainly helps adoption of a standard. Finding the right balance is tricky, but for better or worse, I would lean a bit more towards supporting more of the existing practices to try to get everyone on board.


I have a few other, more practical/technical questions, such as how one would effectively query a DMLex implementation across its hierarchical and relational structures, and how to efficiently prepare entries for presentation. I understand that DMLex is designed as an abstract data model, but I feel that seeing how it might be implemented would help to properly evaluate it. It would be great to have a proof of concept to try out at some point.

Probably an obvious statement, but the specification is large and complex, making it difficult to wrap your head around the whole thing. People new to DMLex will likely find it very challenging to understand and use starting from just that document. I think it would be great if there were a separate ebook introducing each part, starting from real-life examples and only referring to the spec for the details. Comprehensive testing/validation tools would be a necessity as well, to aid development and ensure conformance. I'm sure this has already crossed your mind.

All in all, I'm very interested to see where this goes and happy to contribute if I can.
Best,

Jan Niestadt.


Jan Niestadt
senior software developer
+31 (0)71 527 2265 / room 2.09

/instituut voor de Nederlandse taal/
Rapenburg 61 / 2311 GJ / Leiden
Postbus 9515 / 2300 RA / Leiden
ivdnt.org







