1.2 to 2.0 Gaps and Proposals

While mapping our 1.2 implementation (and its challenges) to 2.0, we found the following gaps in the core and modules, for which we propose five features. All of them are based on real-world use cases at Microsoft and likely apply to other large companies that outsource content for localization.

Proposal 1: Add an optional build attribute to 2.0 <file> element in core.

In 1.2, the build-num attribute on <file> is important to us: once we have handed off files to our suppliers for localization, we expect the localized files they return to come from the same build. We suspect we are not the only content provider doing this kind of validation. In 2.0, there is no file-level attribute we could use for this.
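The return check described above can be sketched against 1.2 files, where build-num already exists. This is a minimal sketch, assuming each returned <file> keeps the `original` attribute it was handed off with; the function names `build_numbers` and `mismatched_builds` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# XLIFF 1.2 namespace; the function names below are hypothetical.
XLF12 = "urn:oasis:names:tc:xliff:document:1.2"


def build_numbers(xliff_text: str) -> Dict[str, Optional[str]]:
    """Map each <file>'s 'original' attribute to its build-num (None if absent)."""
    root = ET.fromstring(xliff_text)
    return {
        f.get("original"): f.get("build-num")
        for f in root.iter(f"{{{XLF12}}}file")
    }


def mismatched_builds(handed_off: str, returned: str) -> List[str]:
    """List files that are missing from the return or carry a different build-num."""
    sent = build_numbers(handed_off)
    back = build_numbers(returned)
    return sorted(
        name for name, build in sent.items()
        if name not in back or back[name] != build
    )
```

In 2.0 the same check has no file-level value to read, which is the gap this proposal targets.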

Proposal 2: Allow optional custom values for the match type attribute in the <mtc:matches> module.

Content providers and localization suppliers base their cost and billing models on match similarity and match type. Localization suppliers charge us differently for ICE matches, exact matches, and fuzzy matches, and we may want to become even more granular as our cost and billing models evolve with the business. In 2.0, the match type does not support the values exact-match and fuzzy-match, which were defined in the state-qualifier attribute in 1.2. Rather than supporting these two values, or any others that did not migrate from 1.2 to 2.0, through a separate attribute, we request that, as in the discussion of state and sub-state at the Face-to-Face in Seattle, a sub-type be added to the match type. This would allow us to attach extra business logic to the types already defined in the spec, such as "tm" or "mt".

As noted in the 2.0 spec: The sub-category prefix is a string uniquely identifying a collection of values for a specific authority. The value is any string value defined by an authority. The prefix xlf is reserved for this specification…
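The extra business logic a sub-type would enable can be sketched as choosing a key into a supplier rate table. This is a minimal sketch, assuming sub-type values take the "prefix:value" form quoted from the spec; the `ms` authority prefix, the `tm/ice` style keys, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed "prefix:value" form for sub-type values.
SUB_TYPE = re.compile(r"^([^\s:]+):(.+)$")


def parse_sub_type(sub_type: str) -> Tuple[str, str]:
    """Split a sub-type such as 'ms:ice' into (prefix, value)."""
    m = SUB_TYPE.match(sub_type)
    if m is None:
        raise ValueError(f"sub-type must look like 'prefix:value': {sub_type!r}")
    return m.group(1), m.group(2)


def billing_key(match_type: str, sub_type: Optional[str] = None,
                authority: str = "ms") -> str:
    """Pick a rate-table key, refining the match type only with our own values.

    'ms' is a hypothetical authority prefix; values under any other prefix
    (including the reserved 'xlf') fall back to the plain match type.
    """
    if sub_type:
        prefix, value = parse_sub_type(sub_type)
        if prefix == authority:
            return f"{match_type}/{value}"
    return match_type
```

For example, `billing_key("tm", "ms:ice")` yields `"tm/ice"`, which a rate table could price differently from plain `"tm"`, while the spec-defined type itself stays unchanged.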

Proposal 3: Add an optional Reference Language to core.

This is a crucial feature for Microsoft and other large companies that localize into minority languages. For example, when we localize from English into Quechua, localizers typically work more efficiently and produce much higher-quality translations when we provide the Spanish target along with the English source. In 1.2, reference languages could be defined in an <alt-trans> element:

<alt-trans alttranstype="reference">
<target xml:lang="es-es">hola mundo %s</target>
</alt-trans>

There is no equivalent in 2.0, so we propose a much simpler mechanism: an optional <reference> element on <segment> whose xml:lang attribute can differ from those of the source and target in the main document.

<segment>
<source xml:lang="en-us">hello world</source>
<target xml:lang="quz-pe">hola món</target>

<reference xml:lang="es-es">hola mundo</reference>

</segment>
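Migrating existing 1.2 reference translations into the proposed element could look like the following. It is a sketch only: `<reference>` is the element proposed here, not part of 2.0, the helper names are hypothetical, and for brevity the code ignores the XLIFF namespaces that real 1.2 and 2.0 files declare.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# ElementTree exposes xml:lang under the built-in XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def reference_targets(trans_unit: ET.Element) -> Dict[str, str]:
    """Collect text from 1.2 <alt-trans alttranstype="reference"> targets, keyed by xml:lang."""
    refs = {}
    for alt in trans_unit.findall("alt-trans"):
        # alttranstype defaults to "proposal" when absent, so skip anything else.
        if alt.get("alttranstype") != "reference":
            continue
        target = alt.find("target")
        if target is not None and target.get(XML_LANG):
            refs[target.get(XML_LANG)] = "".join(target.itertext())
    return refs


def add_references(segment: ET.Element, refs: Dict[str, str]) -> None:
    """Append one proposed <reference xml:lang="..."> element per reference language."""
    for lang, text in refs.items():
        ref = ET.SubElement(segment, "reference", {XML_LANG: lang})
        ref.text = text
```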

Proposal 4: Add an optional name attribute on <notes> in core and <mda:metadata> module.

We expect that content providers will typically want to group their notes or metadata in meaningful ways, so that a set of notes or metadata items can be processed the same way, or simply grouped and displayed together, for example in an editor UI. Here are some examples:

<notes name="comments">
">
<note id="comment">This string cannot be longer than 100 characters</note>
<note id="user">Developer@microsoft.com</note>
<note id="date">10/21/2012 5:28:13 PM</note>
</notes>

<notes name="instructions">
">
<note id="instruction">Do not localize the product name</note>
<note id="user">loc_engineer@microsoft.com</note>
<note id="date">10/21/2012 5:28:13 PM</note>
</notes>

As opposed to something less structured and more difficult to process:

<notes>
Do not">
<note id="instruction">Do not localize the product name</note>
<note id="instruction-user">Localization Engineer</note>
<note id="instruction-date">10/21/2012 5:28:13 PM</note>

<note id="comment">This string cannot be longer than 100 characters</note>
<note id="comment-user">Developer</note>
<note id="comment-date">10/21/2012 5:28:13 PM</note>
</notes>
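With a name on <notes>, a tool such as an editor UI could group notes directly instead of parsing id suffixes like instruction-user. A minimal sketch, assuming the proposed `name` attribute and ignoring namespaces for brevity; collecting unnamed notes under the empty string is a hypothetical convention, and `notes_by_group` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict


def notes_by_group(unit: ET.Element) -> Dict[str, Dict[str, str]]:
    """Group note text by the proposed name attribute of each enclosing <notes>.

    Notes in a <notes> element without a name are collected under "".
    """
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for notes in unit.findall("notes"):
        name = notes.get("name", "")
        for note in notes.findall("note"):
            # The same id (e.g. "user") may recur in different groups.
            groups[name][note.get("id")] = note.text or ""
    return dict(groups)
```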

Similarly, we’d like a name attribute for <mda:metadata>.

<mda:metadata name="properties">
   <meta type="previous-source">hello world</meta>
   <meta type="string-category">TextBox</meta>
   <meta type="workorder-id">25</meta>
   <meta type="workorder-name">Hotmail</meta>
</mda:metadata>

Proposal 5: Add optional change tracking attributes to <segment>.

When translation work is shared across <segment> elements in the same file, for whatever reason, it is useful for billing purposes to track who modified each <segment> and when. This is easy when localization happens online against a database, but once the work is offline and file-based, e.g. in an XLIFF file, optional attributes on <segment> would help capture this information.

<source>hello world</source>
<target>hola món</target>

</segment>
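Once such attributes exist, billing input can be computed straight from the file. In this sketch the attribute names `modifiedBy` and `modifiedDate` are hypothetical placeholders (the proposal's own names are not shown in the example), dates are assumed to be ISO 8601 without a time zone, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime

# Hypothetical attribute names for the proposed change tracking attributes.
MODIFIED_BY = "modifiedBy"
MODIFIED_AT = "modifiedDate"


def segments_per_translator(root: ET.Element, since: datetime) -> Counter:
    """Count segments each person modified on or after `since`, as billing input."""
    counts: Counter = Counter()
    for seg in root.iter("segment"):
        who = seg.get(MODIFIED_BY)
        when = seg.get(MODIFIED_AT)
        if who is None or when is None:
            continue  # untracked segment: nothing to attribute
        if datetime.fromisoformat(when) >= since:
            counts[who] += 1
    return counts
```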

Please let us know your opinions on these proposals.

Thanks,

Microsoft Corporation

(Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
