xliff-comment message



Subject: Semantic information for XLIFF placeholders via W3C ITS Text Analysis


Hi,

I recently saw Steven Loomis' XLIFF 2.x feature-related post https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/SemanticDomainForPlacehoders: "semantic information about what a placeholder means - beyond human readable".

As I understand it, it could make sense to investigate this with a view to Text Analysis, a W3C ITS data category for capturing semantic/ontological information (see https://www.w3.org/TR/its20/#textanalysis).

I will give it a try - hoping that experts will show mercy in case of inaccuracies or errors.

Let's start with the native format from which XLIFF is derived. Say it is HTML5 that has been annotated via a process for entity recognition/linking. 

In the example below, the HTML5 has been annotated "locally/inline" by the fictitious tool "http://annotator.org" as follows:

a. The string "Boulder" has been annotated as being of type/class/domain "location" according to a certain ontology for Named Entity Recognition and Disambiguation
b. The string "Boulder" has been annotated as being an entity/instance/item recorded in Wikidata

With W3C ITS Text Analysis, this information could be captured as follows:
 
<!DOCTYPE html>
<html lang="en"
      its-annotators-ref="text-analysis|http://annotator.org">
      <head>
             <meta charset="utf-8" />
      </head>
      <body>
             <p>
                   <span its-ta-confidence="0.7"
                          its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
                          its-ta-ident-ref="https://www.wikidata.org/wiki/Q192517">Boulder</span>
                   is located at the foothills of the Rocky Mountains.
             </p>
      </body>
</html>
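
To show how a consumer might pick up these local annotations, here is a minimal Python sketch using only the standard library's html.parser. The names TextAnalysisCollector and extract_text_analysis are hypothetical helpers invented for this illustration, and the sketch only looks at local its-ta-* attributes; it ignores global ITS rules, inheritance, and its-annotators-ref, so it is not a conformant ITS 2.0 processor.

```python
from html.parser import HTMLParser

# Same markup as the example above.
HTML_SAMPLE = """<!DOCTYPE html>
<html lang="en"
      its-annotators-ref="text-analysis|http://annotator.org">
  <head><meta charset="utf-8" /></head>
  <body>
    <p>
      <span its-ta-confidence="0.7"
            its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
            its-ta-ident-ref="https://www.wikidata.org/wiki/Q192517">Boulder</span>
      is located at the foothills of the Rocky Mountains.
    </p>
  </body>
</html>"""

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TextAnalysisCollector(HTMLParser):
    """Hypothetical helper: collects elements with local its-ta-* attributes."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished records, in closing order
        self._open = []        # one entry per open element; None if unannotated

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        ta = {name: value for name, value in attrs if name.startswith("its-ta-")}
        self._open.append({"text": "", **ta} if ta else None)

    def handle_data(self, data):
        # Text belongs to every annotated element that is currently open.
        for record in self._open:
            if record is not None:
                record["text"] += data

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._open:
            return
        record = self._open.pop()
        if record is not None:
            record["text"] = record["text"].strip()
            self.annotations.append(record)


def extract_text_analysis(html_text):
    """Return a list of dicts, one per element carrying its-ta-* attributes."""
    collector = TextAnalysisCollector()
    collector.feed(html_text)
    return collector.annotations


if __name__ == "__main__":
    for annotation in extract_text_analysis(HTML_SAMPLE):
        print(annotation)
```

For the sample above, this should yield a single record for the string "Boulder" carrying the class reference, the Wikidata identifier, and the confidence value as plain strings.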
 
In XLIFF, that may show up as follows (the example was generated with version 38 of Okapi Rainbow, see https://okapiframework.org/):
 
<trans-unit id="2"
      its:annotatorsRef="text-analysis|http://annotator.org">
      <source xml:lang="en-US">
             <g id="1">
                   <mrk its:taClassRef="http://nerd.eurecom.fr/ontology#Location"
                         its:taConfidence="0.7"
                         its:taIdentRef="https://www.wikidata.org/wiki/Q192517" mtype="phrase">Boulder
                   </mrk>
             </g>
	is located at the foothills of the Rocky Mountains.</source>
</trans-unit>
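
On the consuming side, a tool could read the Text Analysis information back from the XLIFF. The following Python sketch uses xml.etree.ElementTree and is only an illustration: read_text_analysis is a hypothetical helper, and the sample assumes that the fragment sits in a document declaring the XLIFF 1.2 namespace (urn:oasis:names:tc:xliff:document:1.2) and the ITS namespace (http://www.w3.org/2005/11/its); the fragment above omits those declarations, so they are added here on the trans-unit element to make the sample parseable on its own.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ITS_NS = "http://www.w3.org/2005/11/its"

# The trans-unit from above, with namespace declarations added (assumption).
XLIFF_SAMPLE = f"""<trans-unit xmlns="{XLIFF_NS}" xmlns:its="{ITS_NS}" id="2"
      its:annotatorsRef="text-analysis|http://annotator.org">
  <source xml:lang="en-US">
    <g id="1">
      <mrk its:taClassRef="http://nerd.eurecom.fr/ontology#Location"
           its:taConfidence="0.7"
           its:taIdentRef="https://www.wikidata.org/wiki/Q192517" mtype="phrase">Boulder
      </mrk>
    </g>
    is located at the foothills of the Rocky Mountains.</source>
</trans-unit>"""


def read_text_analysis(xml_text):
    """Hypothetical helper: list the ITS Text Analysis data found on <mrk> elements."""
    root = ET.fromstring(xml_text)
    results = []
    for mrk in root.iter(f"{{{XLIFF_NS}}}mrk"):
        class_ref = mrk.get(f"{{{ITS_NS}}}taClassRef")
        ident_ref = mrk.get(f"{{{ITS_NS}}}taIdentRef")
        if class_ref is None and ident_ref is None:
            continue  # this <mrk> carries no Text Analysis information
        confidence = mrk.get(f"{{{ITS_NS}}}taConfidence")
        results.append({
            "text": "".join(mrk.itertext()).strip(),
            "class_ref": class_ref,
            "ident_ref": ident_ref,
            "confidence": float(confidence) if confidence is not None else None,
        })
    return results


if __name__ == "__main__":
    for entry in read_text_analysis(XLIFF_SAMPLE):
        print(entry)
```

The sketch reads the attributes by their namespace-qualified names rather than by the "its:" prefix, since the prefix itself is arbitrary and may differ between tools.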

Best regards,
Christian

