Implementing LIOM/JLIFF

Hi all,

I found that defining JLIFF by looking at XLIFF examples introduces two biases: a) the examples are XML-shaped, and b) they are already one step removed from the live objects.

For example, take version 0.9.3 of the schema: the subunit can be a segment or an ignorable, so it has a “type” field with either the value “segment” or “ignorable”. In an API, the practical thing would be for the interface representing the subunit to have a method “isSegment()” returning a boolean. So it would make more sense for the JLIFF subunit to have a field “isSegment” set to true or false.
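A minimal sketch of that contrast, in Java. The names ISubunitSketch, SimpleSubunit and SubunitJson are hypothetical and made up for this illustration; they are not the actual LIOM classes. The sketch only shows that a boolean accessor in the API maps directly onto a boolean field in the output, whereas the “type” form needs a mapping to and from strings.

```java
// Hypothetical types for illustration only; not the actual LIOM API.
interface ISubunitSketch {
    boolean isSegment();      // true for a segment, false for an ignorable
    String getSourceText();
}

final class SimpleSubunit implements ISubunitSketch {
    private final boolean segment;
    private final String sourceText;

    SimpleSubunit(boolean segment, String sourceText) {
        this.segment = segment;
        this.sourceText = sourceText;
    }

    public boolean isSegment() { return segment; }
    public String getSourceText() { return sourceText; }
}

final class SubunitJson {
    // API-driven form: the boolean accessor is written out as-is.
    static String toJson(ISubunitSketch su) {
        return "{ \"isSegment\": " + su.isSegment()
            + ", \"source\": [ { \"text\": \"" + escape(su.getSourceText()) + "\" } ] }";
    }

    // XLIFF-driven form: the boolean has to be mapped to a string value.
    static String toTypedJson(ISubunitSketch su) {
        return "{ \"type\": \"" + (su.isSegment() ? "segment" : "ignorable")
            + "\", \"source\": [ { \"text\": \"" + escape(su.getSourceText()) + "\" } ] }";
    }

    // Escapes only backslashes and double quotes; enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Calling SubunitJson.toJson(new SimpleSubunit(false, " ")) gives the “isSegment” form for an ignorable, and toTypedJson gives the “type” form for the same object.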

Having the UML model helps a bit, but, at least for me, it still carries the danger of forcing the API to accommodate the data (which is driven by XLIFF).

It seems it would be better to do things the other way around: create the API/OM and derive the serialization from that.

We have two OM/API (Microsoft’s and Okapi’s) but they do not take into account some of the ideas/naming used more recently. They are also based on XLIFF rather than being more “neutral”, and the Okapi implementation is mostly made for stream processing rather than a “DOM”-like structure like we are discussing here.

So I’ve started to implement both the API/OM and the JLIFF output at the same time so they can “validate” each other.

The hard part is decoupling the API from the implementation. For example, we should have an API that does not presume how the implementation deals with inline codes (it could be done with lists of objects, with a string and associated offsets, or with a coded string mixing both text and references to tags). The API should offer just basic access to the tags and not assume much more. The same goes for all the other objects.
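To illustrate the kind of decoupling I mean, here is a rough sketch in Java. Everything in it (IContentSketch, ListContent, CodedContent, the method names) is hypothetical and is not the LIOM API. It shows one small interface with two of the storage strategies mentioned above behind it: a list of objects, and a coded string mixing text and references to tags. The caller cannot tell which one it is using.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API: callers see only text and tag access, nothing about storage.
interface IContentSketch {
    IContentSketch append(String text);
    IContentSketch appendTag(String tagId);
    String getPlainText();      // the text with the tags left out
    List<String> getTagIds();   // the tag ids in document order
}

// Storage strategy 1: a list of parts, each one either a String or a Tag.
final class ListContent implements IContentSketch {
    private static final class Tag {
        final String id;
        Tag(String id) { this.id = id; }
    }

    private final List<Object> parts = new ArrayList<>();

    public IContentSketch append(String text) { parts.add(text); return this; }
    public IContentSketch appendTag(String tagId) { parts.add(new Tag(tagId)); return this; }

    public String getPlainText() {
        StringBuilder sb = new StringBuilder();
        for (Object p : parts) {
            if (p instanceof String) sb.append((String) p);
        }
        return sb.toString();
    }

    public List<String> getTagIds() {
        List<String> ids = new ArrayList<>();
        for (Object p : parts) {
            if (p instanceof Tag) ids.add(((Tag) p).id);
        }
        return ids;
    }
}

// Storage strategy 2: a coded string. Each tag is a marker character in the
// string, and the n-th marker refers to the n-th entry of the tag list.
// Assumption: the text itself never contains the private-use marker U+E000.
final class CodedContent implements IContentSketch {
    private static final char MARKER = '\uE000';
    private final StringBuilder coded = new StringBuilder();
    private final List<String> tagIds = new ArrayList<>();

    public IContentSketch append(String text) { coded.append(text); return this; }
    public IContentSketch appendTag(String tagId) { coded.append(MARKER); tagIds.add(tagId); return this; }

    public String getPlainText() {
        return coded.toString().replace(String.valueOf(MARKER), "");
    }

    public List<String> getTagIds() { return new ArrayList<>(tagIds); }
}
```

Code written against IContentSketch works unchanged with either class, which is the property the real API should keep. A real interface would need more (tag kinds, positions, removal), but the same rule applies: expose access to the tags, not how they are stored.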

It is now advanced enough that we can create fairly realistic documents and get their JLIFF output.

For example, this code creates a document and one sub-document (i.e. <file>) with one group containing two units:

// Create the document
IDocument doc = Factory.SI.createDocument().setSrcLang("en").setTrgLang("fy");

// Create a sub-document and a group
IGroup group = doc.addSubDocument("f1").addGroup("g1");

// Create a first unit and its source and target
ISegment seg = group.addUnit("u1").addSegment();
seg.getSource().append("Summer is coming.");
seg.setState(TargetState.TRANSLATED) // Set the target as 'translated'
   .getTarget(IfNoTarget.CREATE_EMPTY).append("Simmer komt deroan.");

// Create a second unit with source (two segments and one ignorable)
IUnit unit = group.addUnit("u2");
unit.addSegment().getSource().append("Summer will be hot.");
unit.addIgnorable().getSource().append(' ');
unit.addSegment().getSource().append("But I will be at the beach.");

// Output it in JLIFF
Formatter fmt = new Formatter();
fmt.process(doc);
System.out.println(fmt.makePretty(fmt.getOutput()));

Generates this JLIFF:

{
  "version": "1.0",
  "srcLang": "en",
  "trgLang": "fy",
  "subDocuments": [
    {
      "id": "f1",
      "translate": true,
      "canResegment": true,
      "preserveWS": false,
      "srcDir": "auto",
      "trgDir": "auto",
      "groupsOrUnits": [
        {
          "isUnit": false,
          "groupsOrUnits": [
            {
              "isUnit": true,
              "subunits": [
                {
                  "isSegment": true,
                  "state": "translated",
                  "source": [ { "text": "Summer is coming." } ],
                  "target": [ { "text": "Simmer komt deroan." } ]
                }
              ]
            },
            {
              "isUnit": true,
              "subunits": [
                {
                  "isSegment": true,
                  "source": [ { "text": "Summer will be hot." } ]
                },
                {
                  "isSegment": false,
                  "source": [ { "text": " " } ]
                },
                {
                  "isSegment": true,
                  "source": [ { "text": "But I will be at the beach." } ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
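On the consuming side, the boolean discriminators (“isUnit”, “isSegment”) keep the traversal simple. Here is a small sketch in Java that counts the segments under a “groupsOrUnits” array. It assumes the JSON has already been parsed into plain Map and List objects by whatever JSON library is in use; the class JliffWalkSketch and its method are hypothetical and not part of LIOM.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of LIOM.
final class JliffWalkSketch {

    // Counts the subunits with "isSegment": true under a "groupsOrUnits" array,
    // descending into nested groups. Assumes every unit has a "subunits" array
    // and every group has a "groupsOrUnits" array, as in the output above.
    @SuppressWarnings("unchecked")
    static int countSegments(List<Map<String, Object>> groupsOrUnits) {
        int count = 0;
        for (Map<String, Object> item : groupsOrUnits) {
            if (Boolean.TRUE.equals(item.get("isUnit"))) {
                // A unit: look at its subunits.
                List<Map<String, Object>> subunits =
                    (List<Map<String, Object>>) item.get("subunits");
                for (Map<String, Object> subunit : subunits) {
                    if (Boolean.TRUE.equals(subunit.get("isSegment"))) count++;
                }
            } else {
                // A group: recurse into its own groupsOrUnits.
                count += countSegments(
                    (List<Map<String, Object>>) item.get("groupsOrUnits"));
            }
        }
        return count;
    }
}
```

No string comparison on a “type” value is needed at any level; each branch is a plain boolean test.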

There is still a lot to do: the inline codes are not completely implemented yet, nor are the modules and the custom extensions. There are also a lot of implementation choices to make. But most of the general mechanism should be in place.

The code is here: https://github.com/ysavourel/liom

I hope this can be used to check and validate JLIFF against LIOM, but also allow us to make progress on both the API and the serialization at the same time.

Cheers,

-yves

