Subject: RE: Generic Inline Markup - Input Highly Appreciated (draft)
Hi Richard and Felix,
A sub-committee of the XLIFF Technical Committee at OASIS (XLIFF TC) is currently working on what some have suggested calling “generic inline markup”. The purpose is to define a common semantic representation of native format constructs, such as inline formatting, in order to facilitate the processing and reuse of linguistic content. Naturally, mechanisms for dealing with sub-flows have to be taken into account (e.g. footnotes and other constructs touched on in the W3C Internationalization Tag Set (ITS); see http://www.w3.org/TR/its/#elements-within-text).
The motivation for this work (details are at http://wiki.oasis-open.org/xliff/OneContentModel) is at least twofold:
A. Discrepancies in inline markup definitions between the XML Localization Interchange File Format (XLIFF) and the Translation Memory eXchange (TMX); example:
TMX uses only the encapsulation method for inline codes (the native codes are enclosed within different elements), while XLIFF provides both the encapsulation method (using elements very similar to TMX's) and the placeholder method (where the native codes are moved to the skeleton file and replaced by a short element that refers to them).
B. Shortcomings in the inline markup definitions of XLIFF; example:
The same native format construct can be represented in more than one way (see http://www.oasis-open.org/committees/xliff/faq.php#inlineFormatting).
Both cause problems and considerable effort for implementers, pose challenges to interoperability, and limit the possibilities for leveraging content.
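To make the discrepancy under A concrete, here is a minimal sketch in Python of the same native construct (an HTML bold run) expressed with the encapsulation method and with the placeholder method. The sample sentence, the `id` values, and the helper names `translatable_text` and `native_codes` are assumptions made for illustration only; the element names `bpt`, `ept`, `ph`, `it` and `g` are the XLIFF 1.2 ones, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Encapsulation: the native codes travel inside the segment (escaped).
ENCAPSULATED = (
    '<source>Click <bpt id="1">&lt;b&gt;</bpt>Start'
    '<ept id="1">&lt;/b&gt;</ept> now</source>'
)

# Placeholder: the native codes live in the skeleton; <g> only refers to them.
PLACEHOLDER = '<source>Click <g id="1">Start</g> now</source>'

# Elements whose content is native code rather than translatable text.
CODE_ELEMENTS = {"bpt", "ept", "ph", "it"}


def translatable_text(source_xml):
    """Return the plain text of a segment, skipping native codes.

    Illustrative only: handles a flat segment, not nested inline elements.
    """
    root = ET.fromstring(source_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag not in CODE_ELEMENTS:
            # e.g. <g>: its content is translatable text
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


def native_codes(source_xml):
    """Return the native codes that are carried inside the segment itself."""
    root = ET.fromstring(source_xml)
    return [el.text or "" for el in root.iter() if el.tag in CODE_ELEMENTS]


if __name__ == "__main__":
    for name, xml in (("encapsulated", ENCAPSULATED), ("placeholder", PLACEHOLDER)):
        print(name, repr(translatable_text(xml)), native_codes(xml))
```

Both variants yield the same translatable text, but only the encapsulated one carries the native codes with it. A tool that wants to leverage one against the other therefore has to normalize the two representations first, which is the kind of effort a common model is meant to remove.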
The focus of the sub-committee is to define a common model for inline-level markup for localization, allowing task- and tool-agnostic resource exchange and processing (e.g. to reuse translations across file formats and to facilitate common processing of localizable data across native file formats). The main implementation targets are XLIFF and TMX. One could, however, easily imagine wider use (e.g. in the context of any new format in need of inline markup).
In addition to an abstract model, a markup model (represented in XML and defined by an XML schema) will be developed. There is no requirement for backward compatibility with earlier versions of XLIFF and TMX, but a migration path from the previous versions of these specifications is envisioned. Implementation details such as the use of XML namespaces will be decided during the development of this specification. An extensibility mechanism will also be defined.
A challenging dimension of the endeavor is the question of how to preserve information about native format constructs. This information is needed, among other things, when the native format has to be recreated (e.g. during the “merge” phase of XLIFF-based processes). Several approaches have already been discussed (see http://wiki.oasis-open.org/xliff/OneContentModel/Comparison ). Given your expertise with W3C technologies, I took the action item to ask whether you could provide some thoughts or input on this dimension of the discussion. My own feeling is that “attachment approaches” that can be realized with XPath/XPointer (see the ITS mechanisms), or the discussions you have already had with respect to Ruby, may provide guidance for the discussion on generic inline markup.
Thanks in advance for reading this.