
Subject: RE: XLIFF API in WebIDL
From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff@lists.oasis-open.org>, "'Dr. David Filip'" <David.Filip@ul.ie>
Date: Sat, 21 Jun 2014 07:17:21 -0600

Hi David, all,

[[
To non-aware xliff-users: A discussion about defining a common data model/API for XLIFF was started at the last FEISGILTT conference in Dublin earlier this month. There are people from various groups interested in the topic, so we are moving the thread here so it can have a home.
]]

> I'd say that we first need to define an object model in a 
> format independent of a specific serialization 
> (we need to go shopping for a syntax here, probably
> some attribute based syntax as XLIFF at unit and 
> lower levels does not behave as a tree) 
> and once this task is done, we can proceed with 
> specifying an API.

I think maybe we are making things too complicated.


--- Object Model vs API:

One can't really separate object model and API because specifying interfaces is the only way one can describe the relationship between the different parts of the object model.

In the case of XLIFF most of the 'model' is already largely defined anyway: the nested structure from <xliff> down to <unit> cannot really be represented in many different ways. And we have a set of attributes/elements attached to that.

I think things get more complicated inside the unit. For example, one aspect of the XML serialization that may be implemented in different ways is the segment representation. This is where the API helps by offering a way to hide the implementation: as long as one can access the data using the same methods, we don't really care how a given implementation does it.

But then, at some point, we reach the low-level objects: things like the content, the inline tags, the state values, etc. Basically, the return values and parameters of the interfaces that are not other interfaces.

For those we will need to define more specific details and even, to some degree, make implementation choices.

For example, the content of a <source> element: It can be implemented in many different ways. Essentially it's a collection of 'text' parts and 'tag' parts. But to be able to access them we need to make some choices.

-a) It could be represented as a list of two different types of objects.

-b) It could be represented as a string of plain text with a separate list of tag objects that have offset fields telling where in the plain text the tag goes.

-c) It could be represented as a string with special characters used as anchors to point to corresponding tags in a separate list.

-d) etc.
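
To make a), b) and c) a bit more concrete, here is a rough Java sketch of the same content ("Click" followed by a tagged "here") in each form. Every class, field and constant name is made up for illustration and does not come from any existing library; the use of Private Use Area characters as anchors in c) is likewise just an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All names here are hypothetical; none come from an existing library.
class ContentModels {

    // A tag reduced to its id, which is enough for this sketch.
    static class Tag {
        final String id;
        Tag(String id) { this.id = id; }
    }

    // a) One list mixing two kinds of parts: String for text, Tag for tags.
    static List<Object> asPartList() {
        List<Object> parts = new ArrayList<>();
        parts.add("Click ");
        parts.add(new Tag("1"));
        parts.add("here");
        parts.add(new Tag("/1"));
        return parts;
    }

    // b) Plain text, plus a tag list with the position of each tag in that text.
    static class OffsetContent {
        final String text = "Click here";
        final List<Tag> tags = Arrays.asList(new Tag("1"), new Tag("/1"));
        final int[] offsets = {6, 10}; // offsets[i] is the position of tags.get(i)
    }

    // c) Text with one anchor character per tag; the anchor's value minus
    //    ANCHOR_BASE is the index of the tag in the separate list.
    static final char ANCHOR_BASE = '\uE000'; // Private Use Area, an arbitrary choice

    static class AnchoredContent {
        final String codedText = "Click \uE000here\uE001";
        final List<Tag> tags = Arrays.asList(new Tag("1"), new Tag("/1"));
    }

    public static void main(String[] args) {
        System.out.println(asPartList().size());                      // parts in a)
        System.out.println(new OffsetContent().text.length());        // plain text length in b)
        System.out.println(new AnchoredContent().codedText.length()); // coded text length in c)
    }
}
```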

At that point playing with abstractions has to yield to doing concrete things: How can we do a regular-expression search on the text parts of a <source>? How can we change the content to all uppercase to perform some comparison actions?

I don't think we want to 'abstract' those things. In fact, in the case of the regex use case we cannot really have an abstract content.find(pattern) method because there is no real regular-expression standard covered by all programming languages at this point.

So since we cannot make available the functions for the content, we have to make available the content to the functions.
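
As an illustration of handing the content to the language's own functions, here is a minimal Java sketch that runs java.util.regex over the text parts of a representation like a). The class and method names are hypothetical, and the convention that a String is a text part and any other object is a tag is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; parts follow the convention String = text, anything else = tag.
class TextPartSearch {

    // Returns the indices of the text parts in which the pattern is found.
    static List<Integer> findInTextParts(List<Object> parts, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            Object part = parts.get(i);
            if (part instanceof String && pattern.matcher((String) part).find()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Object tag = new Object(); // stand-in for a tag object
        List<Object> parts = Arrays.asList("Price: 42 EUR", tag, "no digits here");
        System.out.println(findInTextParts(parts, "\\d+"));
    }
}
```

Note that a search done this way never finds a match that starts in one text part and ends in another, since each part is handed to the regex engine separately.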

And that, in turn, means working with basic object types like String, int, etc. which means the methods we can use will be different depending on which of the representations a), b) or c) we choose.

For a): uppercasing the text would probably be done with something like this:

for ( int i=0; i<content.getTextPartCount(); i++ ) {
   String text = content.getTextPart(i);
   content.setTextPart(i, text.toUpperCase());
}

For b) and c): it would probably be done with something like this:

content.setText(content.getText().toUpperCase());

It quickly becomes clear that a) may not be as efficient as b) or c). We don't want to have loops every single time we access the text as a whole.

Then b) depends on offsets, and uppercasing the text may in some cases change its length ("Fußball"->"FUSSBALL" (yes, I know: Unicode 5.1 introduces the Capital Sharp S, but allow me the example)). So any change in the length of the text would require updating the offsets as well. Here too it's clear that choosing b) may face some implementation hurdles.
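
The offset problem can be shown with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and recomputing the offset by uppercasing the text in front of the tag is one possible approach, assumed to be adequate as long as the offset does not fall inside a surrogate pair.

```java
import java.util.Locale;

// Hypothetical helper for a representation like b): plain text plus tag offsets.
class OffsetShift {

    // Recomputes a tag offset after uppercasing, by uppercasing only the
    // text that precedes the tag and measuring its new length.
    static int shiftedOffset(String text, int offset) {
        return text.substring(0, offset).toUpperCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        String text = "Fu\u00DFball spielen"; // U+00DF is the sharp s
        int tagOffset = 7;                    // a tag sitting right after the first word
        System.out.println(text.toUpperCase(Locale.ROOT).length() - text.length());
        System.out.println(shiftedOffset(text, tagOffset));
    }
}
```

Every operation that can change the length of the text needs a similar adjustment for each tag, which is the kind of hurdle meant above.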

Then c) may have to deal with some other issues.
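
One issue that can come up with c), offered here only as an illustration, is that the anchor characters sit in the middle of the text and get in the way of plain string functions. The Java sketch below assumes that the anchors are taken from the Unicode Private Use Area and that this range is reserved for them; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for a representation like c): text with anchor characters.
class AnchoredText {

    // Removes every Private Use Area character, assumed here to be reserved for anchors.
    static String toPlain(String codedText) {
        return codedText.replaceAll("[\uE000-\uF8FF]", "");
    }

    public static void main(String[] args) {
        String coded = "Fu\uE000\u00DF\uE001ball"; // the sharp s wrapped in a tag pair
        // The anchors have no case mapping, so they survive the case change.
        System.out.println(coded.toUpperCase(Locale.ROOT).length());
        // A plain search on the coded text is blocked by the anchors.
        System.out.println(coded.contains("Fu\u00DFball"));
        // The same search works once the anchors are stripped.
        System.out.println(toPlain(coded).contains("Fu\u00DFball"));
    }
}
```

So uppercasing needs no special handling in this form, but searching and comparing do.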

The bottom line is that for the inline content we need to see what access methods are needed and choose an object model based on how feasible it is to implement them.


--- Syntax to use for the definition

As for the syntax to define this OM/API: Since there is nothing that seems obvious, I don't think it's that important to choose one at this stage.

I'd rather see one or more real implementations in Java, C#, JavaScript and Python than work on some abstract syntax that has limitations we don't see until someone implements it.

We can always convert what we end up with to some given syntax later.


--- Starting things

Dave was initially asking for a starting point. I'll try to provide a tentative definition of the low-level object model based on what was done for the Okapi XLIFF2 library (so far).

Note that we should probably also explore several serializations for some objects.
Or maybe serializations for different subsets of the data of an object.
The reason for that is that Web services may benefit from different data depending on what they do, and a full-blown output of a segment object, for example, may not always be the best way to communicate data in some main use cases.
I'll try to come up with concrete examples for that too.


Cheers,
-yves


================================================
From: Dr. David Filip [mailto:David.Filip@ul.ie] 
Sent: Thursday, June 19, 2014 2:40 PM
To: Felix Sasaki
Cc: Dave Lewis; Yves Savourel; Alan Melby; Schnabel, Bryan S; Kevin O'Donnell; Dr. David Filip
Subject: Re: XLIFF API in WebIDL

Felix, thanks for this.
I'd say that we first need to define an object model in a format independent of a specific serialization (we need to go shopping for a syntax here, probably some attribute based syntax as XLIFF at unit and lower levels does not behave as a tree) and once this task is done, we can proceed with specifying an API.

From the organizational point of view, the object model can be done in XLIFF TC but we would need to find another home for the API, should it be normative.

Cheers
dF


Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158 
facsimile: +353-6120-2734
http://www.cngl.ie/profile/?i=452
mailto: david.filip@ul.ie

On Tue, Jun 17, 2014 at 2:35 PM, Felix Sasaki <fsasaki@w3.org> wrote:
I got this feedback: if we want to define an API, WebIDL in the CR version is the best choice. However, if we want to define a DOM representation, WebIDL does not make sense - we rather need a JSON data format, probably plus prose.

Best,

Felix

Am 13.06.2014 um 14:05 schrieb Felix Sasaki <fsasaki@w3.org>:

> Hi Dave,
>
> thanks for the ping, I will check this out.
>
> Best,
>
> Felix
>
> Am 13.06.2014 um 13:54 schrieb Dave Lewis <dave.lewis@cs.tcd.ie>:
>
>> Hi Felix,
>>
>> Given Yves' concerns about WebIDL could we get some input from others in the W3C about the status/maturity/tools around WebIDL?
>>
>> They seem to be using it a fair bit in the Web Application WG - perhaps Charles or someone there could give us some input - he raised the topic at MLW.
>>
>> cheers,
>> Dave
>>
>> On 06/06/2014 12:04, Yves Savourel wrote:
>>> Hi Dave,
>>>
>>> I think providing such description would be a good start.
>>>
>>> I'm a bit wary about WebIDL because it seems both in use and under construction:
>>>
>>> - There is a v1 Candidate recommendation that is more than 2 years old:
>>> http://www.w3.org/TR/WebIDL/
>>>
>>> - And some v2 work in progress that seems more recent, but is a branch of the Candidate recommendation:
>>>
>>> But I suppose many other specifications use it (http://www.w3.org/wiki/Web_IDL) and it has a checker
>>> (http://www.w3.org/2009/07/webidl-check) although I'm not sure which of the two drafts any of those are using.
>>>
>>> But I can give it a try.
>>> If anything, using an abstract description may help in defining a better model.
>>>
>>> Cheers,
>>> -ys
>>>
>>>
>>> -----Original Message-----
>>> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>>> Sent: Thursday, June 5, 2014 8:12 AM
>>> To: Yves Savourel
>>> Cc: Alan Melby; Felix Sasaki; bryan.s.schnabel@tektronix.com; kevinod@microsoft.com
>>> Subject: XLIFF API in WebIDL
>>>
>>> Hi Yves,
>>> I was chatting this morning more to Kevin, Ryan, Bryan and Alan about the XLIFF API idea.
>>>
>>> One good step forward we thought could be for you to map some of your APIs (perhaps from the javadoc) into WebIDL as a starting
>>> point.
>>>
>>> It could be a quick way to have a spec that can be mapped straight into JavaScript, .NET etc. to drive implementation. That way we
>>> are starting with your reference implementation and can then thrash out consensus on the spec in WebIDL in comparison to other
>>> possible implementations.
>>>
>>> What do you think?
>>> cheers,
>>> Dave
>>>
>>>
>>>
>>
>