xliff-users message

Subject: XLIFF OM/API

From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff-users@lists.oasis-open.org>
Date: Thu, 26 Jun 2014 13:55:07 -0600

Hi all,

(BTW: I'll stop CCing individuals after this email).

As Dave suggested a couple of weeks ago, I've put together a possible initial example of an object model and API.

Note that it focuses only on a very small aspect: the inline content (the content of <source> and <target>). So we're far from
dealing with the full document, or orchestrated interactions between TMS and clients.

But I believe it's a good starting point because such a model (or related parts of it, like the JSON serialization) can be a small
building block usable in many contexts. It can be used, for example, in the TAUS Translation API, in MT queries, in text analysis,
etc.: in any service where the payload is a segment or some kind of extracted text.

The document is both attached as a zip file (for the archives) and available online at http://opentag.com/data/xliffomapi/
I will keep updating the document.

Roughly described:

An IContent object is a string with each inline tag represented by a pair of special characters: a tag reference. That tag reference
points to an ITag object that can be either an ICode object (for <pc>/<sc>/<ec>/<ph>) or an IAnnotation object (for <mrk>/<sm>/<em>).
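
To make the description above a little more concrete, here is a minimal Java sketch of the coded-string idea. Only the interface names ITag, ICode and IAnnotation come from the text. The method names, the marker characters (U+E001, U+E002), the index base (U+E100) and the class CodedContentSketch are assumptions made for illustration and are not taken from the posted Javadoc. For brevity the sketch keeps the tag list inside the content object, whereas in the proposal the store of tags is held at the unit level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a string in which each inline tag is a pair of PUA characters.
final class CodedContentSketch {
    // Assumed markers: the first char of a tag reference tells the kind of tag.
    static final char CODE_MARKER = '\uE001';
    static final char ANNOTATION_MARKER = '\uE002';
    // Assumed base: the second char encodes an index into the tag list.
    static final int INDEX_BASE = 0xE100;

    private final List<ITag> tags = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Appends plain text.
    CodedContentSketch append(String plain) {
        text.append(plain);
        return this;
    }

    // Stores the tag and appends its two-character reference to the coded text.
    CodedContentSketch append(ITag tag) {
        tags.add(tag);
        char marker = (tag instanceof ICode) ? CODE_MARKER : ANNOTATION_MARKER;
        text.append(marker).append((char) (INDEX_BASE + tags.size() - 1));
        return this;
    }

    String getCodedText() {
        return text.toString();
    }

    // Resolves the tag reference whose first character is at position pos.
    ITag resolve(int pos) {
        char marker = text.charAt(pos);
        if (marker != CODE_MARKER && marker != ANNOTATION_MARKER) {
            throw new IllegalArgumentException("No tag reference at position " + pos);
        }
        return tags.get(text.charAt(pos + 1) - INDEX_BASE);
    }

    public static void main(String[] args) {
        ICode placeholder = () -> "ph1"; // stands in for a <ph> element
        CodedContentSketch content = new CodedContentSketch()
                .append("Click ").append(placeholder).append("here");
        // 6 plain chars + 2 chars for the tag reference + 4 plain chars
        System.out.println(content.getCodedText().length());
        System.out.println(content.resolve(6).getId());
    }
}

// Interface names follow the email; getId() is an assumed accessor.
interface ITag { String getId(); }
interface ICode extends ITag { }        // for <pc>/<sc>/<ec>/<ph>
interface IAnnotation extends ITag { }  // for <mrk>/<sm>/<em>
```

In this sketch the coded text stays an ordinary String, so it can be passed to any string-based service, while the tag objects are looked up only when needed.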

- For now the model and API are described using Javadoc: it's just a lot simpler and faster for me to write it and test it that way.
Note that a lot is still undocumented (especially the setters/getters), but hopefully an XLIFF-knowledgeable person can guess most of
the methods.

- You can perform most traditional string-based operations without affecting the tag references because they are in the Unicode
Private Use Area (PUA) range, so methods like toUpperCase(), for example, will not affect them. Using the PUA range obviously has its
own limitations and drawbacks.

- You can use regular expressions with those coded strings because the tag reference patterns are predictable and can even be
expressed as a regex group and be taken into account in your regex operations.

- The interfaces need many more methods, especially with regard to accessing only the tags for a given content (the store of tags is
held at the unit level), and many choices need to be made, for example when we need to throw an exception rather than return null,
etc.

- Dealing with extensions and modules will be a nasty business. It is not addressed there yet. One problem for example is how a
library implementing only some modules in vN will be able to keep backward compatibility when its vN+1 version supports more
modules: Accessing the new module from both the extension mechanism (in vN) and the module-specific methods (in vN+1) may be quite
difficult to achieve. Something to think about.

- One important question is also how far the object model/API should go. Should it be limited to easily accessing a document and
performing simple modifications (essentially allowing you to easily map XLIFF to the developer's own object model), or should it offer
a wider range of methods to truly allow doing anything directly in that model? An example: one can have a function joinSegments() on
the unit to join a set of segments into a single one. That type of function is not really useful for doing just mapping, so it could
be omitted from the API. But if we want to offer more than simple operations, then we should have it.
The more powerful the API is, though, the more difficult it will be for developers to implement it.
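
As an illustration of the two string-related points in the list above (case conversion and regular expressions), here is a small Java sketch. It assumes a tag reference is one marker character in U+E001..U+E002 followed by one index character in U+E100..U+E1FF. That layout, the class name CodedTextOpsSketch and its methods are hypothetical and not taken from the posted Javadoc.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch: ordinary string and regex operations on a coded string.
final class CodedTextOpsSketch {
    // Assumed tag reference layout: marker char followed by index char, both in the PUA.
    static final Pattern TAG_REF = Pattern.compile("[\\uE001\\uE002][\\uE100-\\uE1FF]");

    // PUA characters have no case mappings, so the tag references are left untouched.
    static String toUpperCase(String coded) {
        return coded.toUpperCase(Locale.ROOT);
    }

    // Returns the plain text with every tag reference removed.
    static String stripTagRefs(String coded) {
        return TAG_REF.matcher(coded).replaceAll("");
    }

    // Finds "first second" even when tag references sit between the two words,
    // by using the tag reference pattern as a group inside a larger regex.
    static boolean containsPhrase(String coded, String first, String second) {
        Pattern phrase = Pattern.compile(
                Pattern.quote(first) + "\\s+(?:" + TAG_REF.pattern() + ")*" + Pattern.quote(second));
        return phrase.matcher(coded).find();
    }

    public static void main(String[] args) {
        // "Click " + one tag reference + "here"
        String coded = "Click \uE001\uE100here";
        System.out.println(toUpperCase(coded).equals("CLICK \uE001\uE100HERE"));
        System.out.println(stripTagRefs(coded));
        System.out.println(containsPhrase(coded, "Click", "here"));
    }
}
```

One drawback shows up quickly with this assumed layout: any operation that cuts the string by position, such as substring(), can split a two-character reference in half unless it checks for the markers first.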

That's all for now.
-yves

Attachment: xliffomapi.zip
Description: Zip compressed data