xliff-users message



Subject: XLIFF Object Model and API


[[
To xliff-users recipients who were not on the original thread: a discussion about defining a common data model/API for XLIFF was
started at the last FEISGILTT conference in Dublin earlier this month. People from various groups are interested in the topic, so
we are moving the thread here to give it a home.
]]

-yves

And here is the last message from that thread:


-----Original Message-----
Subject: RE: XLIFF API in WebIDL

Adding Martin to the thread, and forwarding his observations (with his permission to share):

-----Original Message-----
From: Martin Wunderlich
Sent: Monday, June 23, 2014 11:34 AM
To: Schnabel, Bryan S
Subject: Re: XLIFF API in WebIDL

Hi Bryan, 

Thanks a lot for forwarding this to me. I can see that the discussion has already progressed quite a bit and Yves is diving into the
implementation details. 
This could perhaps be complemented by a discussion about the typical use cases for such an API. As mentioned before, there could be
two areas where the standardized API might be useful - plus a meta-layer covering generic stuff. 
Let me try and add some meat to the idea by listing some specific methods: 

0) Meta-layer
- authentication; login/logout/refreshSession
- management of users, groups and permissions
- getSystemInfo()

1) Job/project oriented aspect
- createJob(XLIFF file)
- getJobQuote(XLIFF file)
- acceptJobQuote(ID)
- queryJobStatus(ID)
- cancelJob(ID)
- queryJobTypes()
- addFileToJob(ID, fileType)
(if possible, without file size restriction; e.g. for large video files that can range in the GBs). 

2) Linguistic data aspect
- translationMemory CRUD
- termBase CRUD
(both based on resource ID and segment ID; it should be possible to work on both individual segments/terms and batches of
segments/terms to keep the transaction volume down).  
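
To make the job-oriented calls a little more concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the class name InMemoryJobService, the JobStatus values, the snake_case method names and the chunked-upload signature are not part of any agreed API. It only shows one way createJob, queryJobStatus, cancelJob and addFileToJob could fit together, with the file passed as a stream of chunks so that no size limit is implied.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable


class JobStatus(Enum):
    # Hypothetical status values; the thread does not define any.
    CREATED = "created"
    CANCELLED = "cancelled"


@dataclass
class Job:
    job_id: str
    xliff: bytes
    status: JobStatus = JobStatus.CREATED
    # (file_type, size_in_bytes) for each attached file
    files: list = field(default_factory=list)


class InMemoryJobService:
    """Toy stand-in for a job-oriented service (illustration only)."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._ids = itertools.count(1)

    def create_job(self, xliff: bytes) -> str:
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = Job(job_id, xliff)
        return job_id

    def query_job_status(self, job_id: str) -> JobStatus:
        return self._jobs[job_id].status

    def cancel_job(self, job_id: str) -> None:
        self._jobs[job_id].status = JobStatus.CANCELLED

    def add_file_to_job(self, job_id: str, file_type: str,
                        chunks: Iterable[bytes]) -> int:
        # Consume the file chunk by chunk so it never has to fit in memory.
        total = 0
        for chunk in chunks:
            total += len(chunk)
        self._jobs[job_id].files.append((file_type, total))
        return total
```

A real service would add authentication, quotes and error handling around this; the point of the chunked add_file_to_job is only that large attachments can be streamed.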

This is just a rough initial sketch, somewhat based on work published by others here: 
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws 
https://labs.taus.net/interoperability/taus-translation-api 
http://www.dercom.de/en/projects 

Many open questions: 
- Should there be workflow-oriented API calls, e.g. to move workflows to the next step, specify branching, accept/reject tasks? This
could be handy for a tight integration of specialized systems, but it could also blow up the API specs quite a bit.
- How should errors and timeouts be handled?
- Synchronous vs. asynchronous calls - should the API specs prescribe one or the other?
- Exact format and number of parameters.

As regards the implementation, I agree with Yves that this should run in parallel with the development of the API standard. It will
also help adoption if a mature open-source library is available upon publication (under a commercial-friendly license, such as
Apache or Eclipse; GPL would hinder adoption).

These are just some thoughts jotted down after a working day, but I hope it contributes a bit to the on-going discussion.

Cheers, 

Martin

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Tuesday, June 24, 2014 4:27 AM
To: 'Ryan King'; xliff@lists.oasis-open.org
Cc: 'Felix Sasaki'; 'Dave Lewis'; 'Alan Melby'; Schnabel, Bryan S; 'Kevin O'Donnell'; 'Fredrik Liden'; 'Chase Tingley'; 'Dr. David
Filip'
Subject: RE: XLIFF API in WebIDL

Hi all,

To continue on what Ryan was saying, one note I'd like to make is that the serialization may be a bit flexible:

In some cases we may have two options for the serialization: a first one that matches the object model exactly and can be
mapped seamlessly into it, and a second one that is more object-independent but still easily parsed into any object model
close to the one we would have chosen.

An example is the content of <source> like this one:

<originalData>
 <data id='d1'>&lt;b></data>
 <data id='d2'>&lt;/b></data>
 <data id='d3'>&lt;br></data>
</originalData>
...
<source>Text in <pc id="1" dataRefStart="d1" dataRefEnd="d2">bold</pc> format.<ph id="2" dataRef="d3"/></source>
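
As an illustration of how the dataRef attributes resolve against <originalData>, here is a small Python sketch that flattens the example into an ordered list of text and tag parts. The wrapping <unit> element without a namespace, the function name flatten_source and the "kind"/"id"/"sdat" keys are assumptions chosen for this example; a real XLIFF 2.0 file is namespaced and has more cases to handle.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a bare <unit> so it parses on its own.
UNIT = """<unit>
<originalData>
 <data id='d1'>&lt;b></data>
 <data id='d2'>&lt;/b></data>
 <data id='d3'>&lt;br></data>
</originalData>
<source>Text in <pc id="1" dataRefStart="d1" dataRefEnd="d2">bold</pc> format.<ph id="2" dataRef="d3"/></source>
</unit>"""


def _tag(kind, elem, ref_attr, data):
    # Look up the original data referenced by the given attribute, if any.
    return {"tag": {"kind": kind, "id": elem.get("id"),
                    "sdat": data.get(elem.get(ref_attr))}}


def _walk(elem, data, out):
    if elem.text:
        out.append({"text": elem.text})
    for child in elem:
        if child.tag == "pc":
            # A paired code becomes an opening tag, its content, a closing tag.
            out.append(_tag("sc", child, "dataRefStart", data))
            _walk(child, data, out)
            out.append(_tag("ec", child, "dataRefEnd", data))
        elif child.tag == "ph":
            out.append(_tag("ph", child, "dataRef", data))
        if child.tail:
            out.append({"text": child.tail})


def flatten_source(unit_xml):
    root = ET.fromstring(unit_xml)
    data = {d.get("id"): d.text for d in root.iter("data")}
    parts = []
    _walk(root.find("source"), data, parts)
    return parts
```

Calling flatten_source(UNIT) yields the text runs and the three codes in document order, each code carrying its original data.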


Imagine the object model we choose is some variation of the option c) I was mentioning before (string with special characters
pointing to the inline objects). That would give us something like the following JSON output, where we have a coded string and a
collection of objects to store the codes' data. (I've simplified the output by omitting any field that is set to its default):

{
   "src":{
      "text":"Text in \uE101\uE110bold\uE102\uE110 format.\uE103\uE110",
      "tags":[
         {
            "kind":"sc",
            "id":"1",
            "sdat":"<b>"
         },
         {
            "kind":"ec",
            "id":"1",
            "sdat":"<\/b>"
         },
         {
            "kind":"ph",
            "id":"2",
            "sdat":"<br>"
         }
      ]
   }
}

Almost the same data can be represented in a very similar format that abstracts things a bit more: it does not assume how the
relation between the text and the inline codes is implemented, and simply lists the parts in order:

{
   "src":[
      { "text":"Text in " },
      { "tag":{ "kind":"sc", "id":"1", "sdat":"<b>" } },
      { "text":"bold" },
      { "tag":{ "kind":"ec", "id":"1", "sdat":"<\/b>" } },
      { "text":" format." },
      { "tag":{ "kind":"ph", "id":"2", "sdat":"<br>" } }
   ]
}

Such a second representation could easily be fed into any object model for the <source>, and still be mapped without much trouble
into version c).
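
To show that the mapping between the two representations is indeed small, here is a Python sketch of both directions. The marker scheme is an assumption on my side, inferred from the example rather than stated: the first private-use character gives the kind (U+E101 opening, U+E102 closing, U+E103 standalone) and the second gives an index, offset from U+E110, among the tags of that kind. The function names are hypothetical, and the parts list is taken to be an ordered array of single-key objects.

```python
# Assumed marker scheme (inferred from the example, not specified anywhere).
MARKER_BY_KIND = {"sc": "\uE101", "ec": "\uE102", "ph": "\uE103"}
KIND_BY_MARKER = {m: k for k, m in MARKER_BY_KIND.items()}
INDEX_BASE = 0xE110


def parts_to_coded(parts):
    """Turn an ordered list of text/tag parts into a coded string plus tags."""
    text, tags, counts = [], [], {}
    for part in parts:
        if "text" in part:
            text.append(part["text"])
        else:
            tag = part["tag"]
            n = counts.get(tag["kind"], 0)  # index among tags of this kind
            counts[tag["kind"]] = n + 1
            text.append(MARKER_BY_KIND[tag["kind"]] + chr(INDEX_BASE + n))
            tags.append(tag)
    return {"text": "".join(text), "tags": tags}


def coded_to_parts(src):
    """Turn a coded string plus tags back into an ordered list of parts."""
    by_kind = {}
    for tag in src["tags"]:
        by_kind.setdefault(tag["kind"], []).append(tag)
    parts, buf, text, i = [], [], src["text"], 0
    while i < len(text):
        ch = text[i]
        if ch in KIND_BY_MARKER and i + 1 < len(text):
            if buf:  # flush the text collected so far
                parts.append({"text": "".join(buf)})
                buf = []
            index = ord(text[i + 1]) - INDEX_BASE
            parts.append({"tag": by_kind[KIND_BY_MARKER[ch]][index]})
            i += 2
        else:
            buf.append(ch)
            i += 1
    if buf:
        parts.append({"text": "".join(buf)})
    return parts


PARTS = [
    {"text": "Text in "},
    {"tag": {"kind": "sc", "id": "1", "sdat": "<b>"}},
    {"text": "bold"},
    {"tag": {"kind": "ec", "id": "1", "sdat": "</b>"}},
    {"text": " format."},
    {"tag": {"kind": "ph", "id": "2", "sdat": "<br>"}},
]
```

With these definitions, coded_to_parts(parts_to_coded(PARTS)) gives back PARTS, so an application can read the parts list and still work internally with a coded string.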

The point here is that the serialization does not necessarily have to match the OM and API exactly. The advantage is that such a
representation may be usable by more applications because it does not force a specific object model.
The drawback is that it requires some minor coding for everyone, while the first representation requires virtually no coding for
the applications implementing the OM/API but a bit more for everyone else.

In any case, as Ryan noted, serialization can be looked at last.

I'm still working on a basic description for the inline content model. Hopefully I'll post it within a couple of days.

Cheers,
-ys






