This might be at a tangent to the metadata debate, but I hope it will be useful.
A few years ago, I had a contract with Smartlogic, and wrote the User Manual for their then product, Ontology Manager. One issue I noticed is that information scientists appear to be quite bad at defining their own industry terms. One person's taxonomy is another's thesaurus is another's ontology. For this reason, I wrote a brief primer to define these terms, at least for the benefit of the manual. I could let folks have a copy, if that would be useful? Please contact me off list.
My next contract was with Elekta. They use Simplified Technical English (STE) because of the volume of translation. STE uses a dictionary to significantly restrict the words used in the documentation. The aerospace industry uses STE for safety reasons.
Recently, I pondered what happens when a website serves up a corpus written with STE. What happens to search results, for instance? I have not seen any research into this. However, it stands to reason that the results would be either excellent or dismal. If the user happens to search for an STE preferred term, then the results will be excellent. If they happen to search for an STE non-preferred term, then the results will be dismal, because that word has been deliberately kept out of the corpus.
A prime application of ontologies is to create a website index that significantly improves search results. Therefore, a company that uses STE might well find itself with the need to also use an ontology.
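To illustrate the point about preferred and non-preferred terms, here is a minimal Python sketch of how a search front end might swap a non-preferred term for its preferred equivalent before querying the index. The term pairs and the function name `normalize_query` are invented for illustration; they are not taken from the STE dictionary or from any Smartlogic product.

```python
# Hypothetical mapping of non-preferred terms to preferred terms.
# These pairs are illustrative only, not taken from the real STE dictionary.
NON_PREFERRED_TO_PREFERRED = {
    "commence": "start",
    "terminate": "stop",
    "utilize": "use",
}


def normalize_query(query: str) -> str:
    """Replace each non-preferred word in the query with its preferred term."""
    words = query.lower().split()
    return " ".join(NON_PREFERRED_TO_PREFERRED.get(w, w) for w in words)


if __name__ == "__main__":
    # A user searching with a non-preferred term still reaches STE content.
    print(normalize_query("Terminate the pump"))  # stop the pump
```

A real ontology would carry far richer relationships than a flat lookup table, but even this much shows why a search for a non-preferred term need not be dismal.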
Dictionaries define words whilst taxonomies, thesauri and ontologies classify them. So, there are fundamental differences. However, both dictionaries and ontologies use preferred and non-preferred terms. So there is also a strong link between them.
They also share one significant issue: they both require continual maintenance. Unfortunately, some companies adopt STE or an ontology search index, and assume it is a one-off exercise to create the dictionary or ontology. It is definitely not! Language is not static, and nor are company products and services. I was told that an ontology that is three months out of date is virtually useless, and that the company might as well start over.
So, if a company does require both an STE dictionary and an ontology, then it makes sense to combine them in some way and halve the maintenance overhead. Search terms harvested from the website via the search engine into the ontology could also provide useful feedback to the STE dictionary.
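As a rough sketch of that feedback loop, the Python below counts words from a search log that appear in neither the preferred nor the non-preferred list, so that whoever maintains the dictionary can review them. The term lists, the `unknown_terms` function, and the `min_count` threshold are all assumptions made for illustration, not part of STE or any ontology tool.

```python
from collections import Counter

# Hypothetical term lists; a real STE dictionary or ontology would supply these.
PREFERRED = {"start", "stop", "use"}
NON_PREFERRED = {"commence", "terminate", "utilize"}


def unknown_terms(search_log: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return words searched at least min_count times that neither list covers."""
    counts = Counter(
        word
        for query in search_log
        for word in query.lower().split()
        if word not in PREFERRED and word not in NON_PREFERRED
    )
    # most_common() orders the candidates from most to least frequent.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

The output is only a list of candidates. Deciding whether a harvested word becomes a preferred term, a non-preferred term, or neither remains an editorial decision.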
Now, OK, it might be a huge ask for DITA 2.0 to support dictionaries, taxonomies, thesauri AND ontologies. But the point is that any development that supports one of these should not adversely affect any future development of any other.
Many thanks, David
I have gone through a series of emails and TC meeting minutes on the subject of our recent discussions around metadata and DITA. To summarize things:
Key Issues/Observations Identified on Metadata Usage with DITA:
- Perceived limitations to how DITA can work with external taxonomy standards
- A preference in the community for wanting to use attributes rather than elements
- Current inability to use a URI in an attribute
- While subjectScheme is designed for use with taxonomies, it is deficient as currently implemented (comment from Kris that subjectScheme was underspecified in DITA 1.2, and backwards compatibility issues limited what was possible to do in DITA 1.3)
Current Suggestions for DITA 2.0:
- Extend SubjectScheme so that it is possible to state that “this is my enumeration value, different from my key name” (Eliot Kimber). This could be done by adding a new enumeration-value element for use within the subjectdef element to store a unique ID value alongside the key and readable value (Joe Pairman)
- @props whose value allows URIs; maybe a specialization-based @ whose value is a URI (Eliot Kimber); alternately, create a new, global metadata-specific attribute (“@metadata”? “@taxonomy”?) that could take on this role (Joe Pairman)
- Create a semantic mapping mechanism to pair the names of DITA elements (specialized or not) with data in an equivalent, external standard or mechanism (Joe Pairman)
Where is this request coming from? Some DITA practitioners at recent DITA Listening sessions are asking for "better metadata support" within DITA. Reasons are scattered, but include requests for a more "associative metadata model in order to apply it in bulk after the content has been published" (using third-party tools).
A Possible Role of RDFa? At the TC Meeting of November 14, RDFa was suggested. While it was agreed that it could play a role, the general view was that a) it should not be incorporated into core DITA but offered as a specialization instead, and b) RDFa usage is on the decline. If there were sufficient interest, a Working Group could be struck to devise a specialization. (This was not a specific motion, and it has not come to pass.)
Detailed Timeline (courtesy of Joe Pairman):
- 2017-10-23: Email (https://lists.oasis-open.org/archives/dita/201710/msg00038.html) from Scott Hudson suggesting simply adding RDFa (lite?) support to DITA to solve metadata problems at one swoop including possible replacement for subjectScheme.
- 2017-10-24: Email (https://lists.oasis-open.org/archives/dita/201710/msg00044.html) from Joe Pairman urging caution regarding wholesale RDFa incorporation due to adoption issues and declining external interest in RDFa. Ended by saying that it could make sense but needs careful thought.
- 2017-11-08: Email (https://lists.oasis-open.org/archives/dita/201711/msg00014.html) from Eliot stressing that domain- or organization-specific metadata stuff should not burden the core of DITA and that people should instead be encouraged to make use of, for example, the specialization mechanism. Perhaps the attribute model could be improved in DITA 2.0 in response to Parson’s points that attributes are more universal. But Eliot did not support literal inclusion of RDFa in the core, and suggested that any such inclusion should be by means of a plugin.
- 2017-11-14: Email (https://lists.oasis-open.org/archives/dita/201711/msg00013.html) from Kris asking how best to fruitfully discuss the whole area of metadata.
- 2017-11-14: Email (https://lists.oasis-open.org/archives/dita/201711/msg00014.html) from Joe Pairman pointing out some gaps in DITA’s metadata support and ease of adoption, particularly around controlled values, proper URIs, and inline annotation
- 2017-11-14: TC meeting (https://lists.oasis-open.org/archives/dita/201711/msg00017.html) where we discussed the points from the emails up to that point, and Eliot suggested some small tweaks to the spec to address them. (I detailed these points and added followup questions in an email on 2017-11-28; see below)
- 2017-11-15: Email (https://lists.oasis-open.org/archives/dita/201711/msg00022.html) from Jim Tivy stressing that we need good use cases, including specifics on where topicmeta, indexterm, and subjectScheme are failing us
- 2017-11-16: Email (https://lists.oasis-open.org/archives/dita/201711/msg00023.html) from Joe Pairman agreeing with Jim, emphasizing the need for inclusion, and suggesting an expanded version of the notes I was writing to sum up the points so far
- 2017-11-28: Email (https://lists.oasis-open.org/archives/dita/201711/msg00034.html) from Joe Pairman: quite a lengthy one that summarized the Linked Data and SKOS use cases, looked at two particular aspects of metadata use, dove into some of the detailed issues, described Eliot’s suggestions, raised some further questions regarding those suggestions, and introduced an important new use case regarding understanding DITA semantics in the context of external schemas!
- 2017-12-04: Email (https://lists.oasis-open.org/archives/dita/201712/msg00008.html) from Joe Pairman summarizing the previous email, linking to a post he wrote that might help ease people into the discussion, and suggesting a couple of concrete ways forward.
Cheers!
-
Keith Schengili-Roberts Market Researcher and DITA Evangelist IXIASOFT 825 Querbes, Suite 200, Montréal, Québec, Canada, H2V 3X1