Subject: RE: [xliff] RE: 2.0 Binary Data Module Proposal
Hi Yves, yes, you are correct about the state attribute; my mistake. I remembered it wrong, and that is exactly why it needs to be added to the spec :-)
The example was really meant to show how the target XLIFF would look once the source dialog was localized, so the binary data is more than just a reference. A binary editor would need to know not only how to read the binary based on its mime-type, but also how to write to it. As for determining whether a binary should be edited or not, we could follow the lead of <unit> and <segment> and add an optional translate="yes|no" attribute.
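A minimal Python sketch of how a binary-aware tool might honor the optional translate attribute suggested here. The attribute on bin:binary is only a proposal in this thread. The default of "yes" and the decision to ignore values inherited from enclosing elements are assumptions of the sketch, and the sample document is a trimmed, hypothetical version of the dialog example.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the proposal's example; the module was still a draft.
BIN_NS = "urn:oasis:names:tc:xliff:binary:2.0"

# Hypothetical document: translate="no" on the first binary is the attribute
# proposed in this thread, not part of any published schema.
SAMPLE = """<xliff version="2.0" srcLang="en-US" tgtLang="de-DE"
       xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">
  <file id="158" original="example.exe">
    <bin:binary id="0" mime-type="image/jpeg" translate="no"/>
    <unit id="158" name="5">
      <bin:binary id="158" mime-type="windows-resource-dialog"/>
    </unit>
  </file>
</xliff>"""


def editable_binaries(root):
    """Yield the bin:binary elements a binary editor may modify.

    Assumes translate defaults to "yes" when absent and ignores any value
    inherited from enclosing <file> or <unit> elements.
    """
    for binary in root.iter(f"{{{BIN_NS}}}binary"):
        if binary.get("translate", "yes") != "no":
            yield binary


root = ET.fromstring(SAMPLE)
print([b.get("id") for b in editable_binaries(root)])  # ['158']
```

The screenshot (id 0) is skipped as reference material, while the dialog resource (id 158) is handed to the editor.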
First, just a note on the example (unrelated to the module):
You put the state attribute in <unit>, while it should be in <segment> per the face-to-face agreement (https://lists.oasis-open.org/archives/xliff/201210/msg00094.html).
It seems those changes in the non-inline parts are not yet in the schema/spec.
Shirley: that should also be the case for the match type. It seems none of the F2F changes have been reflected yet.
Ryan: now a comment on the binary module.
It seems some (at least the first) of the binary objects are really provided as references, as opposed to resources that need to be modified. I was wondering if there should be a distinction between binary data to edit (like some images) and binary data provided as a reference.
Thanks, Fredrik, for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation, which supports file-level binaries, only scratches the surface of how the module could be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs, for example (also attached):
We need to carry all of the information needed to recreate the localized dialog, not just textual data. You’ll see here that not only have two strings been localized, but so have the dialog size and two of the controls it contains (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="2.0" srcLang="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">
  <file id="158" original="example.exe">
    <!-- external binary reference -->
    <bin:binary id="0" mime-type="image/jpeg">
      <bin:source href="" />
      <bin:target href="" />
    </bin:binary>
    <group id="158">
      <unit id="158" name="5" state="initial">
        <source>Load Registry Config</source>
        <target>Load Registry Config</target>
        <!-- target text was not translated, but dialog size was increased -->
        <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">
        </bin:binary>
      </unit>
      <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">
        <!-- Neither target text nor control size were localized -->
        <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">
        </bin:binary>
      </unit>
      <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">
        <!-- target text was translated, but control size was not increased -->
        <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">
        </bin:binary>
      </unit>
      <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">
        <source>Please select a configuration to load from the Registry</source>
        <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>
        <!-- both target text and control size were localized -->
        <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">
        </bin:binary>
      </unit>
      <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">
        <!-- target text was not translated, but control size was increased -->
        <bin:binary id="1021" state="translated" mime-type="windows-resource-control-combo-box">
        </bin:binary>
      </unit>
    </group>
  </file>
</xliff>
One thing in particular to note: because dialog controls are often hierarchical, it is important to represent them that way in XLIFF, so the example above shows a <group> containing both <unit> and <group> elements as siblings. The top-level group <group id="158"> contains a unit <unit id="158"> holding the dialog title and binary data, but <group id="158"> also contains other groups, which hold the dialog control text and binary data. We’re not 100% sure, but we believe that allowing both a <unit> and a <group> to be defined as siblings under another group would require a change to the spec.
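If the spec allowed <unit> and <group> as siblings, a tool could recover the control hierarchy with a simple recursive walk. The Python sketch below uses a hypothetical, trimmed fragment of the dialog example. The inner group ids (g1060, g1021) are illustrative only, and core elements are left un-namespaced as in the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the dialog group holds its own <unit> (the title)
# next to child <group> elements, one per control. Inner ids are made up.
SAMPLE = """<file id="158" original="example.exe">
  <group id="158">
    <unit id="158"><source>Load Registry Config</source></unit>
    <group id="g1060">
      <unit id="1060"><source>Please select a configuration</source></unit>
    </group>
    <group id="g1021">
      <unit id="1021"/>
    </group>
  </group>
</file>"""


def walk_units(element, path=()):
    """Yield (group path, unit id) pairs in document order."""
    for child in element:
        if child.tag == "unit":
            yield path, child.get("id")
        elif child.tag == "group":
            # Descend into nested groups, extending the path with their ids.
            yield from walk_units(child, path + (child.get("id"),))


for path, unit_id in walk_units(ET.fromstring(SAMPLE)):
    print("/".join(path), unit_id)
```

This prints one line per unit (`158 158`, `158/g1060 1060`, `158/g1021 1021`), so each control stays attached to the dialog it belongs to.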
Hi Ryan, All,
The ballot for this feature has already passed, but I’d still like to make some comments and proposals on the implementation.
I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverses the expectations that I feel are core to the XLIFF spirit. In my opinion, the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. At its most basic level, this is achieved by an initial tool that understands the source format transforming it, or extracting its translatable content, into an XLIFF file. The file can then be further processed (usually translated) by other tools, independently of the initial tool and the source format. At the end of the processing chain the file is returned to the initial tool (or a closely related tool), which creates a localized version of the source file. Storing the source file in binary form within the XLIFF document turns this model around: now the initial tool has no knowledge of the source format, and any meaningful work depends on source-format knowledge somewhere in the processing chain. This use case would be better served by a translation package format, leaving XLIFF to the actual translation of text.
Regarding the concrete proposal, I have a few ideas on how to improve it. In most cases binary data will not be suitable for direct processing by translators; instead, it will need a separate extraction step before translation. So I think it would be good to make it simple to leave the binary portions out of the file during parts of the processing.
If <bin-unit> is changed to <bin-file> and made a sibling of <file>, it would go a long way in that direction. Different <file>s in an XLIFF document are largely independent, and merging and splitting documents at this level is common. Having the binary data as units, possibly mixed into the sequence of text units, would probably make it ambiguous whether they can be safely removed while the content stays valid. In addition, an empty <bin-unit> or other placeholder would have to be left behind so that the content could easily be re-inserted in the right place. A <bin-file> would also keep the binary data out of the path of other modules, such as validation and string-length restrictions.
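A minimal Python sketch of why a top-level <bin-file> is easy to set aside: a tool can detach every <bin-file> child of <xliff>, hand the remaining <file> elements to text-only processing, and re-attach the binaries afterwards without leaving placeholders inside any <file>. <bin-file> is the element name proposed here; its attributes and the payload are placeholders, and core elements are left un-namespaced as in the example. Since <file>s are largely independent, simply appending the binaries back would likely be acceptable too; the sketch restores their original order anyway.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "AAAA" stands in for a base64 payload.
SAMPLE = """<xliff version="2.0" srcLang="en-US" tgtLang="de-DE">
  <file id="f1"><unit id="u1"><source>Hello</source></unit></file>
  <bin-file id="b1" mime-type="image/jpeg">AAAA</bin-file>
  <file id="f2"><unit id="u2"><source>World</source></unit></file>
</xliff>"""


def strip_binaries(root):
    """Detach every top-level <bin-file>; return (position, element) pairs."""
    removed = [(i, child) for i, child in enumerate(list(root))
               if child.tag == "bin-file"]
    for _, child in removed:
        root.remove(child)
    return removed


def restore_binaries(root, removed):
    """Re-insert detached <bin-file> elements at their original positions."""
    for index, child in removed:  # ascending order keeps the indices valid
        root.insert(index, child)


root = ET.fromstring(SAMPLE)
removed = strip_binaries(root)
print([c.tag for c in root])  # ['file', 'file']
restore_binaries(root, removed)
print([c.tag for c in root])  # ['file', 'bin-file', 'file']
```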
Storing units smaller than files as binary data would make interoperability even harder, so I do not think adding <bin-file> alongside <bin-unit> would be a good idea; I’d prefer just <bin-file>. If the textual/markup portion of the data refers to external binary data, some form of reference mechanism might be useful, but I do not see this as a requirement. If it is added, I think it would be good to have the option of the reference pointing from the <bin-file> to the <unit> it is associated with, instead of the other way around. That way, tools that do not make use of binaries would never encounter the reference directly.
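One way such a reference from <bin-file> to <unit> could work, sketched in Python: the binary points at the units it belongs to, so only binary-aware tools ever read the link. The `units` attribute and its space-separated format are hypothetical. Because unit ids may only be unique within a <file>, a real mechanism would probably also need to qualify them by file.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "units" attribute on <bin-file> is an
# illustrative name, not part of any XLIFF draft.
SAMPLE = """<xliff version="2.0" srcLang="en-US" tgtLang="de-DE">
  <file id="f1">
    <unit id="1060"><source>Please select a configuration</source></unit>
    <unit id="1021"/>
  </file>
  <bin-file id="b1" mime-type="windows-resource-dialog" units="1060 1021"/>
</xliff>"""


def binaries_by_unit(root):
    """Map each referenced unit id to the ids of the <bin-file>s pointing at it."""
    index = {}
    for bin_file in root.iter("bin-file"):
        for unit_id in bin_file.get("units", "").split():
            index.setdefault(unit_id, []).append(bin_file.get("id"))
    return index


root = ET.fromstring(SAMPLE)
print(binaries_by_unit(root))  # {'1060': ['b1'], '1021': ['b1']}
# A text-only tool reading <file>/<unit> never encounters the reference.
print([u.get("id") for u in root.iter("unit")])  # ['1060', '1021']
```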
Besides a MIME type, I think an original filename would be very helpful, as the file extension is still the most common way to differentiate between formats when dealing with files. I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps, and no application does, or even could, have a complete mapping of all MIME types to extensions.
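A sketch of the filename fallback described above, using Python's standard mimetypes module. It prefers a stored original filename (the attribute itself is only proposed), otherwise guesses an extension from the MIME type, and falls back to a generic .bin when the table has no entry, as with a vendor type like windows-resource-dialog. The `output_name` helper and the `.bin` default are illustrative choices, not part of the proposal.

```python
import mimetypes
from typing import Optional


def output_name(bin_id: str, mime_type: str, original: Optional[str]) -> str:
    """Choose a file name for an extracted binary.

    Prefers the (proposed) original filename; otherwise asks the platform's
    MIME table, which cannot know vendor types such as
    "windows-resource-dialog", and falls back to ".bin".
    """
    if original:
        return original
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    return f"{bin_id}{extension}"


print(output_name("0", "image/jpeg", "screenshot.jpg"))      # screenshot.jpg
print(output_name("158", "windows-resource-dialog", None))   # 158.bin
```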