
Subject: [xliff] Consolidation - Embedded XLIFF

From: Yves Savourel <ysavourel@translate.com>
To: xliff@lists.oasis-open.org
Date: Mon, 03 Jun 2002 09:50:15 -0600

Following up on the action item of last meeting, here is the consolidated
information on the topic "Embedded XLIFF".

-yves

Title: Embedded XLIFF

Last update: Jun-03-2002

1. Status After Meeting
2. Discussion
    2.1. Gérard's Proposal
3. Additional Information
    3.1. Embedding a Full XLIFF Document
    3.2. Embedding One or More XLIFF Constructs
    3.3. Embedding Localization Directives

1. Status After Meeting

Agreed to in principle; the details still need to be worked out.

2. Discussion

2.1. Gérard's Proposal

The original proposal came from Gérard (see email: http://lists.oasis-open.org/archives/xliff/200205/msg00004.html). The text and the example files are reproduced below:

Microsoft recommended proposal for XLIFF 1.1: make XLIFF tags a namespace that can also be used in software and content data files. This proposal is an add-on to Eric's proposal. The idea is to make XLIFF an open-model XML schema so that it can carry localization data and also the complete XML-based source file as context.
Think of an XML-based help file or web page that has its own schema, which the localization process has no knowledge of. We call it the user domain schema. After the source authoring is done, the XML file is handed over to the localization engineers or process to add localization directives.
Currently, most processes need to extract resources into a canonical form, such as a resource table. XLIFF is the export format of that form. The problem with this is that the context information for the resources is lost. It is especially the case for content localization.
Imagine that we keep the source file as the main vehicle for transforming data. We insert localization directives directly into the source file, but in the XLIFF namespace. (It is technically possible to make the same document valid for both the user schema and the XLIFF schema.) Then we can transfer the VERY SAME SOURCE FILE to the localizer without losing any context information. This way, a localizer can view the resources in two views:
1) The original source file view and
2) the table view for all trans units.
This way, content localization and software localization can be the same as long as both use XML-based files. I rewrote the sample-xliff.xml file to illustrate the idea. The first file is dialog.xml, which is an imagined Windows resource XML-based format. Think of it as an XML-based RC file. The second file has mixed content from both the user schema and the XLIFF schema. You can see the clear separation of information from the two schemas. An XLIFF element is mostly a child element of a user domain element, added to carry localization directives. It can have references to the user domain data, for example to tell XLIFF where the source text is located in the string table.

Dialog.xml (no XLIFF markup):

<?xml version="1.0" encoding="UTF-8"?>
<WindowsResource xmlns:urn="urn:b"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="resource.xsd">
 <Dialog name="foo" help-id="17" x="0" y="0" cx="100" cy="100">
  <string id="id_102">Hello World!</string>
  <image mime-type="image/jpg" md5-checksum="0123456789abcdef">
   012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrs
  </image>
 </Dialog>
</WindowsResource>

Dialog.xml with embedded XLIFF markup (Note that Gérard's example was done using the syntax from Eric's 1.1 proposal):

<?xml version="1.0" encoding="UTF-8"?>
<WindowsResource
 xmlns:urn="urn:b" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xsi:schemaLocation="resource-xliff.xsd"
 xmlns:xlf="xlf" 
 version="1.1" language="en-us" >
 <!-- the new schema above imports the xliff schema as secondary schema -->		
 <xlf:header uri="http://www.xmlspy.com"
  md5-checksum="A9FD64E12C"
  content-creation-tool="XEmacs"
  content-creation-tool-version="21.4"
  content-creation-date="2002-04-23"
  category="General"
  mime-type="text/html"
  source-language="en-us" 
  target-language="fr-fr"
  product-name="String"
  product-version="String"
  build-num="String">
  <xlf:instructions language="en-us" priority="1">
   <xlf:html-content xmlns:h="http://www.w3.org/1999/xhtml">
    Translate <h:b>this</h:b> document.
   </xlf:html-content>
  </xlf:instructions>
  <xlf:comment language="en-us">
   <xlf:author name="Some User" email="user@company.com"
    phone="888 555 1212"/>
   <xlf:text-content mime-type="text/plain">
    Here's a simple text comment
   </xlf:text-content>
  </xlf:comment>
  <xlf:style-guide language="en-us">
   <xlf:contact name="Andrew Other" email="another@place.com"
    phone="888 555 1212"/> 
   <xlf:link href="my_style.doc" mime-type="application/msword"/>
  </xlf:style-guide>
 </xlf:header>
 <Dialog name="foo" help-id="17" x="0" y="0" cx="100" cy="100">
  <xlf:group id="id_101">
   <xlf:comment>
    <xlf:text-content>Top level group</xlf:text-content>
   </xlf:comment>
  </xlf:group>
  <xlf:resource-context type="dialog"></xlf:resource-context>
  <string id="id_102"> 
   <xlf:trans-unit id="id_102" sourceRef="text">
    <xlf:alt-trans match-quality="98">
     <xlf:target>Salut Monde!</xlf:target>
    </xlf:alt-trans>
    <xlf:alt-trans match-quality="medium">
     <xlf:target>Bonjour Monde!</xlf:target>
    </xlf:alt-trans>
   </xlf:trans-unit>
     Hello <xlf:bpt id="id_104" ref="id_105">&lt;b&gt;</xlf:bpt>World!
     <xlf:ept id="id_105" ref="id_104">&lt;/b&gt;</xlf:ept>
  </string>
  <image mime-type="image/jpg" md5-checksum="0123456789abcdef">
   <xlf:bin-unit id="id_103" sourceRef="text" />
   012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrs
  </image>
 </Dialog>
</WindowsResource>

For those less familiar with Eric's proposed version, the following example (not included in the original examples) is a similar file, but it uses embedded XLIFF constructs that more closely follow the current notation:

<?xml version="1.0" encoding="UTF-8"?>
<WindowsResource xmlns:urn="urn:b"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="resource.xsd"
 xmlns:xlf="urn:oasis:names:tc:xliff:document:embedded:1.0">
 <xlf:header>
  <xlf:note>Translate this document.</xlf:note>
  <xlf:reference>
   <external-file href="my_style.doc"/>
  </xlf:reference>
 </xlf:header>
 <Dialog name="foo" help-id="17" x="0" y="0" cx="100" cy="100">
  <string id="id_102">
   <text>Hello World!</text>
   <xlf:alt-trans match-quality="98">
    <xlf:target xml:lang="fr">Salut Monde!</xlf:target>
   </xlf:alt-trans>
   <xlf:alt-trans match-quality="medium">
    <xlf:target xml:lang="fr-CH">Bonjour Monde!</xlf:target>
   </xlf:alt-trans>
  </string>
  <image mime-type="image/jpg" md5-checksum="0123456789abcdef">
   012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrs
  </image>
 </Dialog>
</WindowsResource>

3. Additional Information

The meeting participants felt that providing some way to embed XLIFF-related data in other formats would be very beneficial. This can be broken down into three main types of use:

3.1. Embedding a Full XLIFF Document

Scenario: The other XML format directly embeds a full XLIFF document.

This is already possible because, in this case, the burden of allowing such a construct falls entirely on the other format. No special provision is needed in XLIFF to allow this.
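
As a rough illustration of this scenario, a host format could carry a complete XLIFF document inside one of its own elements. In the sketch below, the "pkg" prefix, its namespace URI and the element names "package" and "localization" are hypothetical and chosen only for the example; the inner "xliff" element is a minimal document in the XLIFF 1.0 notation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical host format: the pkg:* names are illustrative only -->
<pkg:package xmlns:pkg="urn:example:host-format">
 <pkg:localization>
  <!-- A complete XLIFF document, unchanged, as a child of the host element -->
  <xliff version="1.0">
   <file original="dialog.xml" source-language="en-us" datatype="xml">
    <header/>
    <body>
     <trans-unit id="id_102">
      <source>Hello World!</source>
     </trans-unit>
    </body>
   </file>
  </xliff>
 </pkg:localization>
</pkg:package>
```

Whether such a file validates depends only on the content model that the host format declares for the containing element.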

3.2. Embedding One or More XLIFF Constructs

Scenario: This is the example given by Gérard. Here, "chunks" of XLIFF data are inserted at specific locations in the other format.

This will require defining a schema (different from the XLIFF schema) that other formats can import into their own schemas. Such a schema will provide sets of XLIFF elements and attributes, declared and structured so that they can be embedded in other constructs.
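
A minimal sketch of what such an import could look like in the schema of the other format is shown below. It assumes that the embeddable schema uses the namespace from the last example in section 2.1 and that it declares "alt-trans" as a global element; the file name "xliff-embedded.xsd" is hypothetical, since no such schema has been defined yet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Fragment of a host-format schema (in the spirit of resource.xsd) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xlf="urn:oasis:names:tc:xliff:document:embedded:1.0">
 <!-- Hypothetical file name for the embeddable XLIFF schema -->
 <xsd:import namespace="urn:oasis:names:tc:xliff:document:embedded:1.0"
  schemaLocation="xliff-embedded.xsd"/>
 <!-- The host element "string" allows XLIFF chunks after its own content -->
 <xsd:element name="string">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="text" type="xsd:string"/>
    <xsd:element ref="xlf:alt-trans" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:string" use="required"/>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>
```

With a declaration along these lines, the host format decides where XLIFF chunks may appear, while the structure of each chunk stays under the control of the imported schema.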

3.3. Embedding Localization Directives

Scenario: This case is related to the previous one, but with a much finer granularity. In some cases the localization directives may not directly match XLIFF elements or attributes, but they allow tools to map information to XLIFF constructs.

Localization directives are isolated elements or attributes in an arbitrary XML document that indicate properties related to localization. Such properties can be mapped to XLIFF constructs (for example, maxbytes). In addition, some of the directives may not exist as elements or attributes of XLIFF itself, but may be needed to create XLIFF.
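
To make the two kinds of directives concrete, here is a small sketch. The "loc" prefix, its namespace URI, the "loc:extract" attribute and the "messages"/"msg" element names are all hypothetical, since the vocabulary has not been defined; "maxbytes" is assumed to map to the trans-unit attribute of the same name, as in XLIFF 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical directive vocabulary: the loc:* attributes are illustrative only -->
<messages xmlns:loc="urn:example:localization-directives">
 <!-- Maps directly to an XLIFF attribute -->
 <msg id="id_201" loc:maxbytes="32">File not found</msg>
 <!-- Has no XLIFF counterpart: tells the extraction tool to skip this element -->
 <msg id="id_202" loc:extract="no">ERR_FILE_NOT_FOUND</msg>
</messages>
```

An extraction tool reading these directives could then produce something like the following XLIFF document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.0">
 <file original="messages.xml" source-language="en-us" datatype="xml">
  <header/>
  <body>
   <!-- loc:maxbytes became maxbytes; id_202 was not extracted -->
   <trans-unit id="id_201" maxbytes="32">
    <source>File not found</source>
   </trans-unit>
  </body>
 </file>
</xliff>
```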

We need to define a set of requirements, then a vocabulary. This vocabulary will most likely be a subset of XLIFF with a few additional constructs.

Localization directives are part of some work started under the ITS group, and also proposed as a possible activity for the W3C internationalization working group (under the topic of "localizability").