Subject: RE: 2.0 Binary Data Module Proposal
Since there’s been a lot of discussion on the list about the use case for binary data in XLIFF, I’m resurrecting this thread. We feel it important for a localization interchange standard to handle localizable data extracted from all types of content: documentation, web, UI, etc. Interchange of localizable binary data is essential for UI localization. A source dialog may not only need to have its text localized, but potentially the size, positioning, and directionality of the control that contains it. Please take a look at the mail below for an earlier response to Fredrik regarding the Binary Module for additional details.
I have also attached our draft proposal for the Binary Module to this mail for your further review.
Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):
We need to carry all of the information needed to recreate the localized dialog, not just textual data. You’ll see here that not only two strings have been localized, but also the dialog size and two controls contained in the dialog have been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:
[ryanki] Original example removed. Please see the attached draft proposal for example implementation.
One thing in particular to note: because dialog controls are often hierarchical, it is important to represent them as such in XLIFF, so the above example shows a <group> containing both <unit> and <group> elements as siblings. The top-level group <group id=”158”> contains a unit <unit id=”158”>, which holds the dialog title and binary data, but <group id=”158”> also contains other groups, which hold the dialog control text and binary data. We’re not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.
Hi Ryan, All,
The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.
I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverses the expectations that I feel are core to the XLIFF spirit. In my opinion the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. This is, at its most basic level, achieved by an initial tool that understands the source format transforming it or extracting translatable content into an XLIFF file. The file can then be further processed (usually translated) by other tools independent of the initial tool and source format. At the end of the processing chain the file is returned to the initial tool (or a closely related tool), which creates a localized version of the source file. By storing the source file in binary format within the XLIFF document the model is turned around. Now you have an initial tool that has no knowledge of the source format and depends on source format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format, leaving XLIFF to the actual translation of text.
Regarding the concrete proposal I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators, instead it will need a separate extraction step before translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.
If the <bin-unit> is changed to a <bin-file> and made a sibling of <file>, it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent, and merging and splitting documents at this level is common. Having the binary data as units, possibly mixed into the sequence of text units, would probably make it ambiguous whether they can safely be removed with the content remaining valid. In addition, an empty <bin-unit> or other placeholder would have to be left so that the content can easily be re-inserted in the right place. Using <bin-file> would also keep the binary data out of the path of other modules such as validation and string length restrictions.
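To illustrate why a file-level sibling would be easy to leave out and put back, here is a minimal Python sketch. It is not taken from the draft proposal: the `<bin-file>` element, its `mime-type` attribute, the un-namespaced sample document, and the helper names `split_binaries` and `restore_binaries` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <bin-file> as a sibling of <file> (not real XLIFF 2.0).
SAMPLE = """<xliff version="2.0">
  <file id="f1"><unit id="1"><segment><source>OK</source></segment></unit></file>
  <bin-file id="b1" mime-type="image/png">iVBORw0KGgo=</bin-file>
  <file id="f2"><unit id="2"><segment><source>Cancel</source></segment></unit></file>
</xliff>"""


def split_binaries(root):
    """Detach every top-level <bin-file>; return (original index, element) pairs."""
    removed = []
    # Iterate over a copy so that removing children does not disturb the loop.
    for index, child in enumerate(list(root)):
        if child.tag == "bin-file":
            removed.append((index, child))
            root.remove(child)
    return removed


def restore_binaries(root, removed):
    """Re-insert detached elements at their original positions, in ascending order."""
    for index, element in removed:
        root.insert(index, element)


root = ET.fromstring(SAMPLE)
binaries = split_binaries(root)      # text-only document goes to the translator
restore_binaries(root, binaries)     # binaries are merged back afterwards
```

Because the binaries sit next to the `<file>` elements rather than inside them, no placeholder has to stay behind in the text units while the stripped document is being processed.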
Storing units smaller than files as binary data would make interoperability even harder, so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I’d prefer just <bin-file>. If the textual / markup portion of the data refers to external binary data, some form of reference mechanism might be useful, but I do not see this as a requirement. If it is added, I think it would be good to have the reference point from the <bin-file> to the <unit> it is associated with, instead of the other way around. This means that tools that do not make use of binaries would never encounter the reference directly.
Besides a MIME type, I think an original filename would be very helpful, as the file extension is still the most common way to differentiate between formats when dealing with files. I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps. No application will, or even could, have a complete mapping of all MIME types to extensions.
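As a rough sketch of the save-to-file step described above, the following Python shows how a consumer might choose an output name. The function names and the idea that the payload is base64 text are assumptions for illustration, not part of the proposal. The original filename is preferred when present; otherwise the code falls back to the standard library's necessarily incomplete MIME table.

```python
import base64
import mimetypes
from pathlib import Path


def output_name(element_id, mime_type, original_name=None):
    """Pick a filename for a binary payload (hypothetical helper)."""
    if original_name:
        # Keep only the final path component of the supplied name.
        return Path(original_name).name
    # guess_extension returns None for types missing from its table.
    extension = mimetypes.guess_extension(mime_type or "") or ".bin"
    return f"{element_id}{extension}"


def decode_payload(b64_text):
    """Decode the element text, assuming it carries base64 data."""
    return base64.b64decode(b64_text)


def save_payload(directory, element_id, mime_type, b64_text, original_name=None):
    """Write the decoded payload into directory and return the resulting path."""
    target = Path(directory) / output_name(element_id, mime_type, original_name)
    target.write_bytes(decode_payload(b64_text))
    return target
```

With a made-up type such as `application/x-hypothetical-dialog`, the lookup finds nothing and the file ends up with a generic `.bin` extension, which is the situation an original filename attribute would avoid.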
Description: Binary Module.pdf