Eric D. Friedman
Colin Meldrum
Extensibility through namespaces
Processing instructions dropped
One content element per XLIFF document
Separate XLIFF-Manifest schema for describing inter-file relationships
Counts and other metrics moved to manifest space
Single document with reference elements sent on round trip through TEP
Customer sends an update to portions of a localization kit already in progress
A project participant interprets a manifest to optimize workflow
This document describes the design objectives, terminology, and key usage scenarios behind Convey Software's proposal for the XLIFF 1.1 specification. The proposed changes are classified as follows: technical changes, domain model changes, and notational changes. In this section, we provide an overview of some of the key aspects of the proposal, with hyperlinks to the motivating design objective(s). Low-level design and implementation issues (including open issues) are addressed in the accompanying XML Schema documents [core manifest] and hence in the HTML documentation generated from them [core manifest].
Our proposal uses XML namespaces to designate specific extension points for including proprietary data in the XLIFF content model. This enables instance documents to extend XLIFF at appropriate points while preserving the ability to validate XLIFF documents and to structure data in the proprietary namespace. [20 24 25 27]
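The following minimal Python sketch illustrates the idea of namespace-based extension points. The namespace URIs, the vendor element "review-status", and its attributes are hypothetical and are not taken from the proposed schema; only the general mechanism (proprietary data living in its own namespace alongside core XLIFF elements) reflects the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen for illustration only.
XLIFF_NS = "urn:example:xliff:1.1"
VENDOR_NS = "urn:example:vendor:extensions"

# A small instance document: core elements in the XLIFF namespace,
# one proprietary element (v:review-status) in the vendor namespace.
DOCUMENT = f"""\
<xliff xmlns="{XLIFF_NS}" xmlns:v="{VENDOR_NS}">
  <content original="greeting.properties">
    <trans-unit id="1">
      <source>Hello</source>
      <v:review-status reviewer="jdoe">pending</v:review-status>
    </trans-unit>
  </content>
</xliff>
"""


def split_by_namespace(xml_text):
    """Return (core, extensions): local element names grouped by namespace."""
    root = ET.fromstring(xml_text)
    core, extensions = [], []
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace}local".
        namespace, _, local = element.tag[1:].partition("}")
        if namespace == XLIFF_NS:
            core.append(local)
        else:
            extensions.append(local)
    return core, extensions


core_tags, extension_tags = split_by_namespace(DOCUMENT)
print("core:", core_tags)
print("extensions:", extension_tags)
```

A consumer that does not understand the vendor namespace can skip the elements in the second list and still process the core content.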
Since work on XLIFF 1.0 began, XML Schema has emerged as an official W3C Recommendation. Because Schema provides a richer, more powerful model for structuring XML content, we have implemented XLIFF as a schema instead of a DTD. Because DTD validation does not work with XML namespaces, we have eliminated DTD support entirely. [5]
Our proposal removes the versioning constructs from XLIFF. Instead, we recommend that the many tools dedicated to that problem handle version control and change management issues. Eliminating "versioning by accretion" solves several problems we observed in XLIFF 1.0. (1) XLIFF consumers can use standard diff/merge utilities to compare files from different stages of the localization process. (2) The XLIFF content model is greatly simplified. (3) The size of XLIFF documents does not expand from revision to revision. (4) XLIFF tools do not have to deduce which elements represent the current content image according to the workflow process model; this would otherwise be a major obstacle to interoperability. [18]
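As an illustration of point (1), the sketch below compares two revisions of the same document with Python's standard difflib module. The document fragments and the stage labels "translate-stage" and "edit-stage" are hypothetical; the point is only that, without accreted version history inside the file, an ordinary line-oriented diff isolates the change.

```python
import difflib

# Two hypothetical revisions of one XLIFF document, taken at
# different stages of the localization process.
before = """\
<content original="greeting.properties">
  <trans-unit id="1">
    <source>Hello</source>
    <target>Bonjour</target>
  </trans-unit>
</content>
""".splitlines(keepends=True)

after = """\
<content original="greeting.properties">
  <trans-unit id="1">
    <source>Hello</source>
    <target>Salut</target>
  </trans-unit>
</content>
""".splitlines(keepends=True)

diff = list(
    difflib.unified_diff(
        before, after, fromfile="translate-stage", tofile="edit-stage"
    )
)

# Keep only added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("".join(diff))
```

Only the target line differs between the two revisions, so the diff contains one removed line and one added line.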
We recommend eliminating the requirement that XLIFF implementations respond to specific processing instructions for two reasons: (1) There is strong sentiment in the XML community in support of removing processing instructions from the language altogether. Hence, it would be prudent to seek alternatives so that XLIFF tools are not required to support what may shortly become a deprecated or obsolete feature of XML. (2) XML processors cannot validate processing instructions, and so they impose demands on application programmers that can be avoided by using alternatives (namespaces). [16]
Many types of localizable content are not stored as files on a file system. Consequently, we propose changing the XLIFF "file" element to the more generic "content." [21]
We propose that each XLIFF document represent one and only one content element. This restriction has the following positive consequences: (1) XLIFF documents do not expand indefinitely in size; (2) strategic updates to a collection of XLIFF documents can be done without performing surgery on the entire document set; (3) when working on a single content element, XLIFF readers do not have to process every other content element in a file; (4) XLIFF writers do not have to emit every content element when committing a change to one element. [33]
The decision to localize a particular set of content elements together can be made for a variety of process-specific reasons. For example, files may be grouped because they share a language pair, because they have common contextual features, or because statistical analysis reveals a degree of linguistic intersection that can be leveraged to reduce the cost of translation. Reference materials are also often grouped together in related sets. Those sets, in turn, are used or extended by individual content elements or sets.
Our proposal includes a separate schema for writing XLIFF-Manifests – documents that model these decisions as a richer set of relations than is possible with a simple enumeration of elements. It is, of course, possible to represent a simple enumeration using XLIFF-Manifest (or alternatives, discussed below). Figure 1 depicts the classes that make up a typical localization kit. The proposed XLIFF-Manifest Schema provides an extensible implementation of the manifest abstraction, and includes defined insertion points for vendor- or process-specific metrics, such as word counts and other leveraging statistics. [32]
Figure 1 XLIFF Localization Kit Class Model
Metrics such as word count or segment count represent metadata gathered about content rather than properties of content. The same content element may have different metrics associated with it at different points in the localization process, depending on the vendor, process, tool, or tool version being used to work on the content. In some cases – repetition counting, for example – count metrics are only meaningful across context-dependent aggregations of content (i.e., content sets identified by some tool, for some process).
Contractually, a metric is a piece of metadata that a content producer provides to a content consumer to describe the content. As such, the count metric is an artifact of a particular transaction and so is ideally suited for inclusion in a manifest.
In practice, tools that need to display metrics on a large set of XLIFF documents will be easier to implement and vastly more efficient if those metrics can be accessed without having to parse each XLIFF document individually.
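The sketch below shows how a tool might read per-document word counts from a single manifest instead of opening every XLIFF document. The manifest vocabulary used here ("manifest", "content-set", "content-ref", "word-count") and both namespace URIs are hypothetical stand-ins; the actual element names are defined by the XLIFF-Manifest schema, which is not reproduced in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: element names and namespaces are assumptions
# made for illustration, not the proposed XLIFF-Manifest schema.
MANIFEST = """\
<manifest xmlns="urn:example:xliff-manifest"
          xmlns:m="urn:example:vendor:metrics">
  <content-set id="ui-strings">
    <content-ref href="menu.xlf">
      <m:word-count>120</m:word-count>
    </content-ref>
    <content-ref href="dialogs.xlf">
      <m:word-count>340</m:word-count>
    </content-ref>
  </content-set>
</manifest>
"""

# Prefix map used only for the lookups below.
NS = {
    "mf": "urn:example:xliff-manifest",
    "m": "urn:example:vendor:metrics",
}


def word_counts(manifest_text):
    """Map each referenced document to its vendor-supplied word count."""
    root = ET.fromstring(manifest_text)
    counts = {}
    for ref in root.iterfind(".//mf:content-ref", NS):
        count = ref.find("m:word-count", NS)
        if count is not None:
            counts[ref.get("href")] = int(count.text)
    return counts


counts = word_counts(MANIFEST)
print(counts)
print("total words:", sum(counts.values()))
```

Only the manifest is parsed; the referenced documents (menu.xlf, dialogs.xlf) are never opened, which is the efficiency gain described above.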
Multiple documents in a localization kit have reached the edit stage when an update to the kit is received from the customer.
The localization kit contains a bag of XLIFF documents of unspecified relatedness.
Content element – an abstraction of localizable content. Content elements may be drawn from file systems (files, documents), databases (rows, tables), or from a content management system.
Content set – a collection of content elements grouped together because they share a common context or reference element, or because a content analysis process reveals a high level of segment overlap.
Localization kit – an abstract collection of content sets together with the relevant reference sets and metrics about the content sets and/or individual content elements. Localization processes and workflow systems may define concrete kit instances of various kinds. For simple interoperability scenarios, a zip file or a multipart MIME document may be appropriate. For more complex scenarios, a message-oriented middleware solution may initiate a transaction (push or pull) whose duration defines the scope of the kit.
Manifest – an abstract description of relationships between content sets and reference sets present in a localization kit. In simple scenarios where the localization kit is a zip file, the manifest may be absent, indicating that there are no explicit relationships between the documents other than the reference content pointers contained within the content elements. For more complex scenarios, an XLIFF producer may use an XLIFF manifest to indicate that specific content sets can be most efficiently processed as a unit, possibly because of quantitative metrics that reveal a high level of overlap between content elements. Within tightly coupled workflow systems, online business objects – CORBA objects, EJBs, Web Services – may fill the role of a manifest.
Metric – the result of quantitative or qualitative analysis of one or more content elements. Metrics are vendor- or process-specific and may include word counts, leveragability statistics, or qualitative assessments. Metrics may be included as a report in a localization kit or in a vendor-specific namespace in the XLIFF manifest.
Reference element – an abstraction of documents or datasets intended to support work on one or more content elements. Concrete reference elements include glossaries, translation memories, and style guides.
Reference set – a collection of reference elements grouped together because they are relevant to work on one or more content sets.
XLIFF consumer – a tool or system capable of interpreting XLIFF documents. A consumer may also be a producer vis-à-vis another consumer.
XLIFF document – an XML instance document conforming to the XLIFF schema.
XLIFF manifest – an XML document used to define relationships within and between content sets and reference sets in a localization kit.
XLIFF producer – a tool or system capable of creating XLIFF documents.