OASIS Mailing List Archives



xliff message


Subject: Re: [xliff] RE: 2.0 Binary Data Module Proposal

@Everyone, in case you have input for this feature, please discuss it in this thread.
Ryan, thanks for being transparent about MS plans with the binary module.
Although this feature was voted through at the feature freeze meeting, it seems that many are concerned about this module.
@Bryan, would you please make this feature an agenda point on Tuesday?

While I do not think we should ever revisit ballots that were duly conducted according to TC process, it seems to me that this feature needs prime-time discussion to make clear what the various stakeholders don't like, and to give Ryan and Microsoft an opportunity to develop a feature that will not be killed up front when it enters the Committee Draft approval process.

Now to the technical aspects:
I voted for this module, and I reiterate LRC's support and plans to make a reference implementation.

However, like most modules, this module was not specified at the time of feature approval (and still has not been specified).

The use case I envisioned, and the one that is in line with LRC's service-oriented architecture platform, is solely bringing a file that could not be parsed (for various reasons) at the point of content extraction to a point where it can be extracted in a second run by an arbitrary state-of-the-art parsing service.

The SharePoint use case is clear: while it is easy to extract SharePoint content fields (and even most of the UI elements) into XLIFF core elements in the spirit of the standard, parsing content libraries that can contain anything is beyond the scope of an XLIFF implementation in SharePoint and in most document/content management systems.

I agree with Joachim and others that it must be clear (and made clear if it isn't) that extraction of content as a binary element is not the same as full-fledged extraction into XLIFF core elements. No one says it is, AFAIK.
And I do not agree that XLIFF is breaking any promises by including a binary data module:
1) Bin-unit was in 1.2
2) There is a clear use case
3) Unlike in 1.2, the binary data is proposed as a module and hence clearly separated from the core functionality
[Modules are part of the spec and the only TC-warranted way to address their functionality, but supporting them is optional if you do not want to process them and thus support the functionality they cover.]

Opinions will vary on which is worse: badly formed HTML snippets in CDATA that appear illegally in XLIFF trans-units (the status quo), or unparsed content clearly separated (and base64 encoded) in a binary module.
My firm opinion is that the former is worse and that limited bin-file handling functionality in XLIFF 2.0 can help with the remedy.

So while I support the binary module, I support only a limited transport capability along the lines of the SharePoint demo given in Seattle and as currently implemented in SOLAS at LRC.
[The initial XLIFF file contains a base64-encoded original file (actually as an internal-file reference rather than a bin-unit, but having a bin-file would be handier) until the content is extracted into core elements by a Tikal web service. SOLAS is NOT using XLIFF as a processing format; it is using it as an SOA/ESB message, and it is up to each of the (loosely, RESTfully) integrated services whether it uses XLIFF only as a message format or also processes it directly.]
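A minimal Python sketch of that transport pattern: the original file travels base64-encoded inside an XLIFF 1.2 document until a parsing service extracts it. It assumes the payload sits in header/skl/internal-file, which is one place XLIFF 1.2 allows an internal-file; the thread does not say exactly where SOLAS puts it, and the function names and the "x-unparsed" datatype value are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

XLIFF12 = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF12}


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace (Clark notation)."""
    return f"{{{XLIFF12}}}{tag}"


def wrap_original(original: str, data: bytes) -> ET.Element:
    """Build a minimal XLIFF 1.2 document whose only payload is the
    base64-encoded original file, carried in header/skl/internal-file."""
    xliff = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "source-language": "en",
        "datatype": "x-unparsed",  # hypothetical custom datatype value
    })
    skl = ET.SubElement(ET.SubElement(file_el, q("header")), q("skl"))
    payload = ET.SubElement(skl, q("internal-file"), form="base64")
    payload.text = base64.b64encode(data).decode("ascii")
    ET.SubElement(file_el, q("body"))  # stays empty until extraction
    return xliff


def needs_extraction(xliff: ET.Element) -> bool:
    """True while an embedded original exists but no trans-unit does yet."""
    has_payload = xliff.find(".//x:skl/x:internal-file", NS) is not None
    has_units = xliff.find(".//x:trans-unit", NS) is not None
    return has_payload and not has_units


def unwrap_original(xliff: ET.Element) -> bytes:
    """Return the original bytes, e.g. to hand them to a parsing service."""
    payload = xliff.find(".//x:skl/x:internal-file", NS)
    if payload is None:
        raise ValueError("no embedded original file")
    return base64.b64decode(payload.text or "")
```

A service in the chain would call needs_extraction() to decide whether to run the parser, then populate the body with trans-units, after which the check returns False.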

I do not support binary data all over the place, and I do not support binary dialogs and other UI elements as a regular payload that should make it through the whole roundtrip. This feels like the 1990s, and I know that Microsoft has up-to-date XML-based methods for automatically resizing UI elements; plus, if restrictions are needed, the length/storage restriction module (being worked on by Fredrik) can be used.
Another issue with the envisioned extensive use of binary payload is that this kind of functionality crosses the fine line between a module and a tool-specific extension. [Yves pointed out a similar issue with the validation module (will discuss in the appropriate thread).] It would be impossible to specify a standard, IMPLEMENTATION-INDEPENDENT behavior (a defining requirement for any standard), and it would be well out of scope for the XLIFF TC as a TC dealing with an XML interchange format.

Thanks for your attention and talk to you on Tuesday

Dr. David Filip
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie

On Tue, Dec 4, 2012 at 4:15 PM, Ryan King <ryanki@microsoft.com> wrote:

Hi Yves, yes, you are correct about the state attribute, my mistake. I remembered it wrong, and that is why it does need to be added to the spec :).


The example was really meant to show how the target XLIFF would look once the source dialog was localized, so the binary data is more than just a reference. A binary editor would need to know not only how to read the binary but also how to write to it, based on the mime-type. As for determining whether a binary should be edited or not, we could follow suit with unit and segment and add an optional translate="yes|no" attribute.
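For illustration, a short Python sketch that decodes the base64 bin:source/bin:target payloads from the dialog example further down this thread (copied verbatim) and lists the byte offsets where source and target differ. It deliberately does not interpret the Windows resource layout; it only shows which bytes a binary-aware editor had to rewrite. The function name is hypothetical.

```python
import base64

# bin:source / bin:target payloads copied from the dialog example in this thread
PAYLOADS = {
    "1": ("AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==",
          "AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA=="),
    "1060": ("AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==",
             "AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA=="),
    "1021": ("AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==",
             "AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA=="),
}


def changed_bytes(source_b64: str, target_b64: str) -> list[tuple[int, int, int]]:
    """Return (offset, source byte, target byte) for every byte that differs.

    This is a format-agnostic diff; it does not know what the bytes mean.
    """
    source = base64.b64decode(source_b64)
    target = base64.b64decode(target_b64)
    if len(source) != len(target):
        raise ValueError("lengths differ; a format-aware comparison is needed")
    return [(i, s, t) for i, (s, t) in enumerate(zip(source, target)) if s != t]


for bin_id, (src, tgt) in PAYLOADS.items():
    print(bin_id, changed_bytes(src, tgt))
```

This reports no change for control 1, two changed bytes for 1060 ((14, 7, 14) and (18, 172, 196)) and one for 1021 ((18, 170, 196)). Both controls end up with the value 196 at offset 18, which looks consistent with the controls being widened, but reading it that way is an assumption this sketch does not verify.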





From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Tuesday, December 4, 2012 7:07 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: 2.0 Binary Data Module Proposal


Hi Ryan,


First: Just a note looking at the example (unrelated to the module)


You put the state attribute in <unit>, while it should be in <segment> per the face-2-face agreement (https://lists.oasis-open.org/archives/xliff/201210/msg00094.html)


It seems those changes in the non-inline parts are not yet in the schema/spec.


Shirley: that should also be the case for the match type. It seems none of the F2F changes have been reflected yet.



Ryan: now a comment on the binary module.


It seems some (at least the first) binary objects are really provided as references, as opposed to resources that need to be modified. I was wondering if there should be a distinction between binary data to edit (like some image) vs binary data as reference.
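Ryan's reply above proposes an optional translate="yes|no" attribute on bin:binary to draw exactly this distinction. Below is a minimal Python sketch of how a tool might honor it; treating a missing attribute as "yes" is an assumption, and the element and attribute names follow the draft in this thread, not a published schema.

```python
import xml.etree.ElementTree as ET

BIN = "urn:oasis:names:tc:xliff:binary:2.0"  # namespace from the proposal's example

# Hypothetical unit: a reference screenshot plus two binaries meant to be edited.
SAMPLE = f"""<unit xmlns:bin="{BIN}" id="1">
  <bin:binary id="screenshot" mime-type="image/jpeg" translate="no"/>
  <bin:binary id="dialog" mime-type="windows-resource-dialog"/>
  <bin:binary id="button" mime-type="windows-resource-control-button" translate="yes"/>
</unit>"""


def editable_binaries(root: ET.Element) -> list[str]:
    """Ids of bin:binary elements that a binary editor should modify.

    Assumption: a missing translate attribute means "yes", as it does for units.
    """
    return [
        binary.get("id")
        for binary in root.iter(f"{{{BIN}}}binary")
        if binary.get("translate", "yes") == "yes"
    ]


print(editable_binaries(ET.fromstring(SAMPLE)))  # ['dialog', 'button']
```

Binaries marked translate="no" would then be treated purely as reference material, such as the screenshot Ryan mentions.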






From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Monday, December 03, 2012 1:55 PM
To: Estreen, Fredrik; xliff@lists.oasis-open.org
Subject: [xliff] RE: 2.0 Binary Data Module Proposal


Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):



We need to carry all of the information needed to recreate the localized dialog, not just textual data. You'll see here that not only have two strings been localized, but the dialog size and two controls contained in the dialog have also been localized (resized in this case): the label for "Please select a configuration…" and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:


<?xml version="1.0" encoding="UTF-8"?>
<xliff version="2.0" srcLang="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">
  <file id="158" original="example.exe">
    <!-- external binary reference -->
    <bin:binary id="0" mime-type="image/jpeg">
      <bin:source href="" />
      <bin:target href="" />
    </bin:binary>
    <group id="158">
      <unit id="158" name="5" state="initial">
        <segment id="158">
          <source>Load Registry Config</source>
          <target>Load Registry Config</target>
        </segment>
        <!-- target text was not translated, but dialog size was increased -->
        <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">
        </bin:binary>
      </unit>
      <group id="1">
        <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">
          <segment id="1">
          </segment>
          <!-- Neither target text nor control size were localized -->
          <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">
            <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:source>
            <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:target>
          </bin:binary>
        </unit>
      </group>
      <group id="2">
        <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">
          <segment id="2">
          </segment>
          <!-- target text was translated, but control size was not increased -->
          <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">
            <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:source>
            <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:target>
          </bin:binary>
        </unit>
      </group>
      <group id="1060">
        <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">
          <segment id="1060">
            <source>Please select a configuration to load from the Registry</source>
            <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>
          </segment>
          <!-- both target text and control size were localized -->
          <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">
            <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==]]></bin:source>
            <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA==]]></bin:target>
          </bin:binary>
        </unit>
      </group>
      <group id="1021">
        <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">
          <segment id="1021">
            <source form="base64"><![CDATA[]]></source>
            <target form="base64"><![CDATA[]]></target>
          </segment>
          <!-- target text was not translated, but control size was increased -->
          <bin:binary id="1021" state="translated" mime-type="windows-resource-control-combo-box">
            <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==]]></bin:source>
            <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA==]]></bin:target>
          </bin:binary>
        </unit>
      </group>
    </group>
  </file>
</xliff>

One thing in particular to note: because dialog controls are often hierarchical, representing them as such in XLIFF would be important, so the above example shows a <group> containing both <unit> and <group> elements as siblings. The top-level group <group id="158"> contains a unit <unit id="158">, which contains the dialog title and binary data, but <group id="158"> also contains other groups, which contain the dialog control text and binary data. We're not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.
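To see what a consumer would have to handle if this nesting were allowed, here is a small Python sketch that walks a trimmed copy of the example (text and payloads dropped, only the group/unit skeleton kept) and reports which groups mix <unit> and <group> children. Element names follow the example; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

BIN = "urn:oasis:names:tc:xliff:binary:2.0"  # namespace from the example

# Trimmed skeleton of the dialog example: only the group/unit nesting is kept.
DIALOG = f"""<group xmlns:bin="{BIN}" id="158">
  <unit id="158"><bin:binary id="158" mime-type="windows-resource-dialog"/></unit>
  <group id="1"><unit id="1"/></group>
  <group id="2"><unit id="2"/></group>
  <group id="1060"><unit id="1060"/></group>
  <group id="1021"><unit id="1021"/></group>
</group>"""


def outline(elem: ET.Element, depth: int = 0) -> list[str]:
    """Indented, depth-first listing of the group/unit hierarchy."""
    lines = [f"{'  ' * depth}{elem.tag} {elem.get('id')}"]
    for child in elem:
        if child.tag in ("group", "unit"):  # bin:binary children are skipped
            lines.extend(outline(child, depth + 1))
    return lines


def groups_with_mixed_children(root: ET.Element) -> list[str]:
    """Ids of groups that have both <unit> and <group> children."""
    mixed = []
    for group in root.iter("group"):  # iter() includes root itself
        child_tags = {child.tag for child in group}
        if {"unit", "group"} <= child_tags:
            mixed.append(group.get("id"))
    return mixed


root = ET.fromstring(DIALOG)
print("\n".join(outline(root)))
print(groups_with_mixed_children(root))  # ['158']
```

Only the top-level dialog group mixes the two kinds of children; the control groups each hold a single unit.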





From: Estreen, Fredrik [mailto:Fredrik.Estreen@lionbridge.com]
Sent: Thursday, November 29, 2012 4:32 AM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: 2.0 Binary Data Module Proposal


Hi Ryan, All,


The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.


I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverses the expectations that I feel are core to the XLIFF spirit. In my opinion, the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. At its most basic level, this is achieved by an initial tool that understands the source format transforming it, or extracting its translatable content, into an XLIFF file. The file can then be further processed (usually translated) by other tools independent of the initial tool and source format. At the end of the processing chain, the file is returned to the initial tool (or a closely related tool), which creates a localized version of the source file. Storing the source file in binary format within the XLIFF document turns this model around: now the initial tool has no knowledge of the source format and depends on source-format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format, leaving XLIFF to the actual translation of text.


Regarding the concrete proposal I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators, instead it will need a separate extraction step before translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.


If the <bin-unit> is changed to a <bin-file> and made a sibling of <file>, it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent, and merging and splitting documents at this level is common. Having the binary data as units, possibly mixed into the sequence of text units, would probably make it ambiguous whether they can safely be removed while the content stays valid. In addition, an empty <bin-unit> or other placeholder would have to be left so that the content can easily be re-inserted in the right place. Making it a sibling of <file> would also keep the binary data out of the path of other modules such as validation and string length restrictions.
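A minimal Python sketch of the split/merge workflow this would allow, assuming the proposed <bin-file> sits as a sibling of <file>. The <bin-file> element is a proposal in this thread, not part of a published schema, and the sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the proposed <bin-file> as a sibling of <file>.
DOC_WITH_BINARIES = """<xliff version="2.0">
  <file id="f1"><unit id="u1"/></file>
  <bin-file id="b1" mime-type="image/png">iVBORw0KGgo=</bin-file>
  <file id="f2"><unit id="u2"/></file>
</xliff>"""


def split_binaries(xliff: ET.Element) -> list[tuple[int, ET.Element]]:
    """Detach top-level <bin-file> elements, remembering where each one was."""
    removed = [(i, child) for i, child in enumerate(xliff) if child.tag == "bin-file"]
    for _, child in removed:
        xliff.remove(child)
    return removed


def merge_binaries(xliff: ET.Element, removed: list[tuple[int, ET.Element]]) -> None:
    """Put detached <bin-file> elements back at their original positions."""
    # Ascending order matters: each insert restores the index the next one expects.
    for index, child in sorted(removed, key=lambda item: item[0]):
        xliff.insert(index, child)


root = ET.fromstring(DOC_WITH_BINARIES)
set_aside = split_binaries(root)
print([child.get("id") for child in root])  # ['f1', 'f2'], sent for translation
merge_binaries(root, set_aside)
print([child.get("id") for child in root])  # ['f1', 'b1', 'f2']
```

No placeholder is left inside any <file>; the recorded positions only preserve the original order, and since <file> elements are independent, simply appending the binaries again would arguably be acceptable too.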


Storing units smaller than files as binary data would make interoperability even harder, so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I'd prefer just <bin-file>. If the textual/markup portion of the data refers to external binary data, some form of reference mechanism might be useful, but I do not see this as a requirement. If it is added, I think having the reference point from the <bin-file> to the <unit> it is associated with, instead of the other way around, would be good. This means that tools that do not make use of binaries would never encounter the reference directly.
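A sketch of that reverse-reference idea in Python: the link lives on the <bin-file> (here a hypothetical unit-ref attribute holding "file-id/unit-id"), so a tool that ignores binaries never sees it, while a binary-aware tool can build its own index. Attribute and function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the link lives on <bin-file> as unit-ref="file-id/unit-id",
# so <file>/<unit> content carries no binary references at all.
DOC_WITH_REFS = """<xliff version="2.0">
  <file id="f1"><unit id="u1"/><unit id="u2"/></file>
  <bin-file id="b1" unit-ref="f1/u2" mime-type="image/png"/>
  <bin-file id="b2" unit-ref="f1/u2" mime-type="image/jpeg"/>
  <bin-file id="b3" mime-type="application/pdf"/>
</xliff>"""


def index_binaries(xliff: ET.Element) -> dict[str, list[str]]:
    """Map "file-id/unit-id" to the ids of the <bin-file> elements pointing at it."""
    index: dict[str, list[str]] = {}
    for bin_file in xliff.iter("bin-file"):
        ref = bin_file.get("unit-ref")
        if ref is not None:  # unattached binaries (like b3) are simply skipped
            index.setdefault(ref, []).append(bin_file.get("id"))
    return index


root = ET.fromstring(DOC_WITH_REFS)
print(index_binaries(root))  # {'f1/u2': ['b1', 'b2']}
```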


Besides a MIME type, I think an original filename would be very helpful, as the file extension is still the most common way to differentiate between formats when dealing with files. I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps. No application will, or even could, have a complete mapping of all MIME types to extensions.
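Python's standard mimetypes module illustrates the point: it only knows registered types, so a made-up type like the windows-resource-dialog value used in the dialog example has no extension at all. The sketch below prefers an original filename and falls back to the MIME type; the function name and the ".bin" fallback are assumptions.

```python
import mimetypes
import os
from typing import Optional


def output_filename(bin_id: str, mime_type: str, original: Optional[str] = None) -> str:
    """Choose a file name for saving a binary payload before separate processing.

    Prefers the original file name; otherwise guesses an extension from the
    MIME type and falls back to ".bin" when the local mimetypes registry does
    not know the type.
    """
    if original:
        return os.path.basename(original)
    extension = mimetypes.guess_extension(mime_type)
    return f"{bin_id}{extension or '.bin'}"


# A made-up type such as the one in the dialog example has no registered extension.
print(mimetypes.guess_extension("windows-resource-dialog"))  # None
print(output_filename("158", "windows-resource-dialog"))       # 158.bin
print(output_filename("7", "text/plain", "docs/readme.txt"))   # readme.txt
```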



Fredrik Estreen


From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: 16 November 2012 01:02
To: xliff@lists.oasis-open.org
Subject: [xliff] 2.0 Binary Data Module Proposal


In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the second of the two, a Binary Data module.


For those who attended the XLIFF Symposium in Seattle, you were given the opportunity to see a *real world* implementation of the <bin-unit> element in SharePoint using XLIFF 1.2. Bryan Schnabel even advocated giving the SharePoint team an award :). Since there is no equivalent of the <bin-unit> in 2.0, this proposal is to add a Binary Data Module.


We think that the 2.0 implementation could be essentially the same as 1.2, with just the elements and attributes used *for now*, so that we essentially get it on the 2.0 radar. We may want to propose additional features after we conduct some reviews with the SharePoint team over the next couple of weeks to get their feedback on any improvements they would like to see.


Here is SharePoint’s 1.2 implementation:


      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <external-file href="http://sphvm-33449/sites/pub/Translation%20Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-source>
        <bin-target>
          <external-file href="http://sphvm-33449/sites/pub/Translation%20Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-target>
      </bin-unit>

      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-source>
        <bin-target>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-target>
      </bin-unit>
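As a quick check of what the inline variant carries, a Python sketch that decodes the internal-file payloads with the standard library. The namespace is omitted for brevity, the bin-source/bin-target wrappers follow the XLIFF 1.2 bin-unit content model, and only form="base64" is handled; the helper name is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

BIN_UNIT = """<bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
  <bin-source><internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file></bin-source>
  <bin-target><internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file></bin-target>
</bin-unit>"""


def decode_internal_file(elem: ET.Element) -> bytes:
    """Decode an <internal-file>; this sketch only handles form="base64"."""
    if elem.get("form") != "base64":
        raise ValueError(f"unsupported form: {elem.get('form')!r}")
    return base64.b64decode(elem.text or "")


unit = ET.fromstring(BIN_UNIT)
source = decode_internal_file(unit.find("bin-source/internal-file"))
target = decode_internal_file(unit.find("bin-target/internal-file"))
print(source.decode("utf-8"))  # This is a test.
print(source == target)        # True
```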




Please let us know your opinions on the proposal.



Microsoft Corporation

(Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)


