Re: XLIFF API in WebIDL

Dr. David Filip

=======================

LRC | CNGL | LT-Web | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

http://www.cngl.ie/profile/?i=452

mailto: david.filip@ul.ie

On Wed, Jun 25, 2014 at 12:21 AM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

Adding Martin to the thread, and forwarding his observations (with his permission to share):

-----Original Message-----
From: Martin Wunderlich [mailto:martin@wunderlich.com]
Sent: Monday, June 23, 2014 11:34 AM
To: Schnabel, Bryan S
Subject: Re: XLIFF API in WebIDL

Hi Bryan,

Thanks a lot for forwarding this to me. I can see that the discussion has already progressed quite a bit and Yves is diving into the implementation details.
This could perhaps be complemented by a discussion of the typical use cases for such an API. As mentioned before, there could be two areas where a standardized API might be useful, plus a meta-layer covering generic functionality.
Let me try and add some meat to the idea by listing some specific methods:

0) Meta-layer
- authentication; login/logout/refreshSession
- management of users, groups and permissions
- getSystemInfo()

1) Job/project oriented aspect
- createJob(XLIFF file)
- getJobQuote(XLIFF file)
- acceptJobQuote(ID)
- queryJobStatus(ID)
- cancelJob(ID)
- queryJobTypes()
- addFileToJob(ID, fileType)
(if possible, without a file size restriction, e.g. for large video files that can run to several GB).
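As an illustration only, the job-oriented methods could be expressed as an interface like the following Python sketch. The Python-style names (create_job for createJob, and so on), the quote ID returned by get_job_quote and consumed by accept_job_quote, the extra path argument of add_file_to_job, the status strings and the pricing rule are all hypothetical; the list above does not specify signatures or return types.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    xliff: str
    status: str = "created"  # hypothetical states: created, accepted, cancelled
    attachments: list[str] = field(default_factory=list)


class JobService(ABC):
    """Hypothetical job-oriented API mirroring the method list above."""

    @abstractmethod
    def create_job(self, xliff: str) -> str: ...

    @abstractmethod
    def get_job_quote(self, xliff: str) -> tuple[str, float]: ...

    @abstractmethod
    def accept_job_quote(self, quote_id: str) -> str: ...

    @abstractmethod
    def query_job_status(self, job_id: str) -> str: ...

    @abstractmethod
    def cancel_job(self, job_id: str) -> None: ...

    @abstractmethod
    def query_job_types(self) -> list[str]: ...

    @abstractmethod
    def add_file_to_job(self, job_id: str, file_type: str, path: str) -> None: ...


class InMemoryJobService(JobService):
    """Toy implementation used only to exercise the interface."""

    PRICE_PER_CHAR = 0.01  # made-up pricing rule for the example

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._quotes: dict[str, str] = {}  # quote ID -> XLIFF content

    def create_job(self, xliff: str) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(job_id, xliff)
        return job_id

    def get_job_quote(self, xliff: str) -> tuple[str, float]:
        quote_id = str(uuid.uuid4())
        self._quotes[quote_id] = xliff
        return quote_id, round(len(xliff) * self.PRICE_PER_CHAR, 2)

    def accept_job_quote(self, quote_id: str) -> str:
        # Accepting a quote turns the quoted content into an accepted job.
        job_id = self.create_job(self._quotes.pop(quote_id))
        self._jobs[job_id].status = "accepted"
        return job_id

    def query_job_status(self, job_id: str) -> str:
        return self._jobs[job_id].status

    def cancel_job(self, job_id: str) -> None:
        self._jobs[job_id].status = "cancelled"

    def query_job_types(self) -> list[str]:
        return ["translation", "review"]  # hypothetical job types

    def add_file_to_job(self, job_id: str, file_type: str, path: str) -> None:
        # Only a reference (path or URL) is stored, so no size limit applies here.
        self._jobs[job_id].attachments.append(f"{file_type}:{path}")
```

Passing a reference instead of the file bytes is one way to sidestep file size limits; a real service would more likely stream or chunk large uploads.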

2) Linguistic data aspect
- translationMemory CRUD
- termBase CRUD
(both based on resource ID and segment ID; it should be possible to work on both individual segments/terms and batches of segments/terms to keep the transaction volume down).
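A minimal sketch of what single and batch CRUD could look like for a translation memory, keyed by resource ID and segment ID as suggested above. The class, its method names and the call counter are hypothetical; a term base would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TmEntry:
    segment_id: str
    source: str
    target: str


class TranslationMemoryStore:
    """Hypothetical TM store keyed by (resource ID, segment ID).

    Each public method stands for one API call, counted in `calls`, so the
    batch variants show how grouping segments reduces the number of calls.
    """

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], TmEntry] = {}
        self.calls = 0

    # Single-segment CRUD
    def create(self, resource_id: str, entry: TmEntry) -> None:
        self.calls += 1
        self._data[(resource_id, entry.segment_id)] = entry

    def read(self, resource_id: str, segment_id: str) -> Optional[TmEntry]:
        self.calls += 1
        return self._data.get((resource_id, segment_id))

    def update(self, resource_id: str, entry: TmEntry) -> None:
        self.calls += 1
        key = (resource_id, entry.segment_id)
        if key not in self._data:
            raise KeyError(key)  # updating a missing segment is an error here
        self._data[key] = entry

    def delete(self, resource_id: str, segment_id: str) -> bool:
        self.calls += 1
        return self._data.pop((resource_id, segment_id), None) is not None

    # Batch CRUD: one call for many segments
    def create_batch(self, resource_id: str, entries: list[TmEntry]) -> None:
        self.calls += 1
        for entry in entries:
            self._data[(resource_id, entry.segment_id)] = entry

    def read_batch(self, resource_id: str, segment_ids: list[str]) -> list[Optional[TmEntry]]:
        self.calls += 1
        return [self._data.get((resource_id, sid)) for sid in segment_ids]

    def delete_batch(self, resource_id: str, segment_ids: list[str]) -> int:
        self.calls += 1
        removed = 0
        for sid in segment_ids:
            if self._data.pop((resource_id, sid), None) is not None:
                removed += 1
        return removed
```

Loading 100 segments with create_batch costs one call instead of 100 calls to create, which is the transaction-volume argument made above.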

This is just a rough initial sketch, somewhat based on work published by others here:
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
https://labs.taus.net/interoperability/taus-translation-api
http://www.dercom.de/en/projects

Many open questions:
- Should there be workflow-oriented API calls, e.g. to move WFs to the next step, specify branching, accept/reject tasks? This could be handy for a tight integration of specialized systems, but it could also blow up the API specs quite a bit.
- How to handle errors and timeouts
- Synchronous vs. asynchronous calls - should the API specs prescribe one or the other?
- Exact format and number of parameters.
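One possible answer to the sync/async and timeout questions, sketched with Python's asyncio: the asynchronous call is the primitive, a timeout is turned into a typed error, and the synchronous call is a thin wrapper around it. ApiError, the TIMEOUT code and the delay parameter (which only simulates network latency) are hypothetical, not part of any proposed spec.

```python
import asyncio


class ApiError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


async def _fake_remote_status(job_id: str, delay: float) -> str:
    # Stand-in for a network round trip to a translation server.
    await asyncio.sleep(delay)
    return "in-progress"


async def query_job_status_async(job_id: str, delay: float, timeout: float) -> str:
    """Asynchronous variant: the caller can do other work while waiting."""
    try:
        return await asyncio.wait_for(_fake_remote_status(job_id, delay), timeout)
    except asyncio.TimeoutError:
        # Map the transport-level timeout to an API-level error code.
        raise ApiError("TIMEOUT", f"status query for {job_id} exceeded {timeout}s") from None


def query_job_status(job_id: str, delay: float = 0.01, timeout: float = 1.0) -> str:
    """Synchronous variant, built on top of the asynchronous one."""
    return asyncio.run(query_job_status_async(job_id, delay, timeout))
```

Building the synchronous call on the asynchronous one means a spec could define the asynchronous form and leave the blocking form to client libraries, or the other way round.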

As regards the implementation, I agree with Yves that this should run in parallel with the development of the API standard. It will also help adoption if a mature open-source library is available upon publication (under a commercial-friendly license, such as Apache or Eclipse; GPL would hinder adoption).

These are just some thoughts jotted down after a working day, but I hope it contributes a bit to the on-going discussion.

Cheers,

Martin

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Tuesday, June 24, 2014 4:27 AM
To: 'Ryan King'; xliff@lists.oasis-open.org
Cc: 'Felix Sasaki'; 'Dave Lewis'; 'Alan Melby'; Schnabel, Bryan S; 'Kevin O'Donnell'; 'Fredrik Liden'; 'Chase Tingley'; 'Dr. David Filip'
Subject: RE: XLIFF API in WebIDL

Hi all,

To continue on what Ryan was saying, one note I'd like to make is that the serialization may be a bit flexible:

In some cases we may have two options for the serialization: a first one that matches the object model exactly and can be mapped seamlessly into it, and a second one that is more independent of the object model but can still be parsed easily into any object model close to the one we would have chosen.

An example is the content of <source> like this one:

<originalData>
<data id='d1'></data>
<data id='d2'></data>
<data id='d3'> </data>
</originalData>
...
<source>Text in <pc id="1" dataRefStart="d1" dataRefEnd="d2">bold</pc> format.<ph id="2" dataRef="d3"/></source>
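For readers less familiar with the XLIFF 2.0 inline markup, the following Python sketch reads a namespace-free copy of the snippet with xml.etree.ElementTree and lists the text runs and inline codes in document order, resolving dataRefStart, dataRefEnd and dataRef against <originalData>. The <data> contents used here (<b>, </b> and <br/>, escaped in the XML) are illustrative placeholders rather than values from the message, and only <pc> and <ph> are handled.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the snippet; real XLIFF 2.0 files use the
# urn:oasis:names:tc:xliff:document:2.0 namespace. The <data> contents
# are illustrative placeholders.
UNIT = """<unit id="u1">
<originalData>
<data id="d1">&lt;b></data>
<data id="d2">&lt;/b></data>
<data id="d3">&lt;br/></data>
</originalData>
<source>Text in <pc id="1" dataRefStart="d1" dataRefEnd="d2">bold</pc> format.<ph id="2" dataRef="d3"/></source>
</unit>"""


def source_parts(unit_xml: str) -> list[dict]:
    """Flatten <source> into an ordered list of text and tag parts."""
    unit = ET.fromstring(unit_xml)
    data = {d.get("id"): d.text or "" for d in unit.iter("data")}
    parts: list[dict] = []

    def add_text(text):
        if text:
            parts.append({"text": text})

    def tag(kind, elem, ref_attr):
        return {"tag": {"kind": kind, "id": elem.get("id"), "sdat": data.get(elem.get(ref_attr), "")}}

    def walk(elem):
        # elem.text is the text before the first child; child.tail follows each child.
        add_text(elem.text)
        for child in elem:
            if child.tag == "pc":
                parts.append(tag("sc", child, "dataRefStart"))
                walk(child)  # content of the paired code
                parts.append(tag("ec", child, "dataRefEnd"))
            elif child.tag == "ph":
                parts.append(tag("ph", child, "dataRef"))
            add_text(child.tail)

    walk(unit.find("source"))
    return parts
```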

Imagine the object model we choose is some variation of option c) I mentioned before (a string with special characters pointing to the inline objects). That would give us something like the following JSON output, where we have a coded string and a collection of objects storing the codes' data (I've simplified the output by omitting any field that is set to its default):

{
"src":{
"text":"Text in\uE101\uE110bold\uE102\uE110 format.\uE103\uE110",
"tags":[
{
"kind":"sc",
"id":"1",
"sdat":""
},
{
"kind":"ec",
"id":"1",
"sdat":"<\/b>"
},
{
"kind":"ph",
"id":"2",
"sdat":" "
}
]
}
}
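The message does not spell out how the coded string is to be read. The following Python sketch assumes that each inline code is a marker character (\uE101 opening, \uE102 closing, \uE103 placeholder) followed by one character whose offset from \uE110 gives the index of that code among the tags of the same kind; this is consistent with the example, where all three indices are 0, but it remains an assumption. decode is a hypothetical helper that splits such a string back into ordered parts.

```python
import json

MARKER_KIND = {"\uE101": "sc", "\uE102": "ec", "\uE103": "ph"}  # assumed markers
INDEX_BASE = 0xE110  # assumed base for the index character

SAMPLE = json.loads(r'''{"src":{
"text":"Text in\uE101\uE110bold\uE102\uE110 format.\uE103\uE110",
"tags":[{"kind":"sc","id":"1","sdat":""},
        {"kind":"ec","id":"1","sdat":"<\/b>"},
        {"kind":"ph","id":"2","sdat":" "}]}}''')


def decode(coded: str, tags: list[dict]) -> list[dict]:
    """Split a coded string into ordered text and tag parts (assumed encoding)."""
    # Group the tag objects by kind so a per-kind index can be resolved.
    by_kind: dict[str, list[dict]] = {}
    for tag in tags:
        by_kind.setdefault(tag["kind"], []).append(tag)

    parts: list[dict] = []
    buf: list[str] = []
    i = 0
    while i < len(coded):
        kind = MARKER_KIND.get(coded[i])
        if kind is None:  # ordinary character
            buf.append(coded[i])
            i += 1
            continue
        if buf:  # flush the text collected before this code
            parts.append({"text": "".join(buf)})
            buf = []
        index = ord(coded[i + 1]) - INDEX_BASE
        parts.append({"tag": by_kind[kind][index]})
        i += 2  # skip marker and index character
    if buf:
        parts.append({"text": "".join(buf)})
    return parts
```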

Almost the same data can be represented in a very similar format that abstracts things a bit more: it does not assume how the relation between the text and the inline codes is implemented, but simply lists the parts:

{
"src":[
{"text":"Text in"},
{"tag":{
"kind":"sc",
"id":"1",
"sdat":""
}},
{"text":"bold"},
{"tag":{
"kind":"ec",
"id":"1",
"sdat":"<\/b>"
}},
{"text":" format."},
{"tag":{
"kind":"ph",
"id":"2",
"sdat":" "
}}
]
}

Such a second representation could easily be fed into any object model for <source>, and still be mapped without much trouble into option c).
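To illustrate the claim that the part list can be mapped into option c) without much trouble, here is a Python sketch of that direction. It uses an assumed marker encoding (\uE101/\uE102/\uE103 followed by \uE110 plus a per-kind index) that the message does not define; encode and the one-key-dict layout of the parts are hypothetical.

```python
MARKER_FOR_KIND = {"sc": "\uE101", "ec": "\uE102", "ph": "\uE103"}  # assumed markers
INDEX_BASE = 0xE110  # assumed base for the index character

PARTS = [
    {"text": "Text in"},
    {"tag": {"kind": "sc", "id": "1", "sdat": ""}},
    {"text": "bold"},
    {"tag": {"kind": "ec", "id": "1", "sdat": "</b>"}},
    {"text": " format."},
    {"tag": {"kind": "ph", "id": "2", "sdat": " "}},
]


def encode(parts: list[dict]) -> dict:
    """Map an ordered list of text/tag parts to a coded string plus tag list."""
    text: list[str] = []
    tags: list[dict] = []
    per_kind: dict[str, int] = {}  # next index to use for each tag kind
    for part in parts:
        if "text" in part:
            text.append(part["text"])
            continue
        tag = part["tag"]
        kind = tag["kind"]
        index = per_kind.get(kind, 0)
        per_kind[kind] = index + 1
        # Replace the code by its marker and an index character.
        text.append(MARKER_FOR_KIND[kind] + chr(INDEX_BASE + index))
        tags.append(tag)
    return {"text": "".join(text), "tags": tags}
```

Applied to the part list of the example, encode produces the same coded string as the first JSON representation.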

The point here is that the serialization does not necessarily have to match the OM and API exactly. The advantage is that such a representation may be usable by more applications, because it does not completely force a specific object model.
The drawback is that it requires some minor coding for everyone, while the first representation requires virtually no coding for the applications implementing the OM/API, but a bit more for everyone else.

In any case, as Ryan noted, serialization can be looked at last.

I'm still working on a basic description for the inline content model. Hopefully I'll post it within a couple of days.

Cheers,
-ys
