----- Original Message -----
Sent: Thursday, January 23, 2003 6:05 PM
Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc
Thank you for the summary, Tony. I agree with the
options, but I have a few comments about compatibility and the need to
retool. And I actually have another option too.
- Shall be
comprised of small changes that would not require re-qualification of
supporting tools or
technologies
There are several aspects to compatibility to
consider:
1. XLIFF 1.0 document validates against XLIFF 1.1 schema. Given the
flexibility of schemas, it would almost always be possible to create a
schema that allowed both 1.0 and 1.1 structures.
2. XLIFF 1.1 tool can process either XLIFF 1.0 or 1.1
documents without requiring extensive effort to handle XLIFF 1.0
documents.
3. XLIFF 1.0 tool can process either XLIFF 1.0 or 1.1 documents
without modification (assuming a reasonably careful
implementation).
Aspects #1 and #2 deal with backward compatibility (from the tool's
perspective). That is, new tools and new schemas handle old data. The
issue is not one of possibility, but of practicality. Is it easy to create
the tools?
Aspect #3 is forward compatibility (from the tool's perspective).
That is, can the old tool handle the new data? This is similar to asking
whether MS Word 97 can read an MS Word 2000 document (allowing for some
loss). Another example is whether an old browser, say IE 3, can render a
new HTML document, say XHTML 1.0, again allowing for some loss for
unknown tags. The primary rule for forward compatibility in a browser is,
"render the contents of an unknown tag". This aspect of forward
compatibility is crucial to meeting the guideline for not re-qualifying
supporting tools.
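
The browser rule quoted above can be illustrated with a minimal sketch. It assumes a hypothetical renderer that understands only the tags in KNOWN_TAGS; the tag names and the bracketed output format are made up for illustration and come from no real browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags that an "old" renderer understands.
KNOWN_TAGS = {"p", "b", "i"}

def render(elem):
    """Render elem as text, keeping the contents of unknown tags."""
    # Collect the element's own text, then each child and the text after it.
    inner = (elem.text or "") + "".join(
        render(child) + (child.tail or "") for child in elem
    )
    if elem.tag in KNOWN_TAGS:
        return f"[{elem.tag}]{inner}[/{elem.tag}]"
    # Unknown tag: drop the markup but still render what is inside it.
    return inner

doc = ET.fromstring("<p>Hello <blink>new <b>world</b></blink>!</p>")
print(render(doc))  # [p]Hello new [b]world[/b]![/p]
```

The unknown <blink> markup is lost, but its text and the known <b> inside it survive, which is the "some loss" described above.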
XLIFF tools, however, are not as simple as browsers. An XLIFF tool
must be able to modify the contents, not just render them. Because the
contents must be modified, the XLIFF tool requires more knowledge of the
tags. This is why adding extension points (non-XLIFF tags) to content
within <source> and <target> has been deferred.
Here are some comments regarding each option listed below as they
pertain to "re-qualification of supporting tools or
technologies".
Option 1 (siblings)
I believe this is forward compatible, assuming the tool doesn't
assume that <target> immediately follows <source>.
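
To make that assumption concrete, here is a small sketch comparing a positional lookup with a lookup by element name. The coordinate values and both function names are hypothetical; only the element names come from the Siblings proposal.

```python
import xml.etree.ElementTree as ET

# A trans-unit laid out as in Option 1, with made-up coordinate values.
UNIT = """
<trans-unit id="1">
  <source>Source Text</source>
  <source-info><coord><cx reformat="yes">183</cx></coord></source-info>
  <target>Translated Text</target>
  <target-info><coord><cx>181</cx></coord></target-info>
</trans-unit>
"""

def target_by_position(unit):
    # Fragile: assumes <target> is always the second child of <trans-unit>.
    children = list(unit)
    return children[1] if len(children) > 1 else None

def target_by_name(unit):
    # Robust: finds <target> by name, wherever it sits among the children.
    return unit.find("target")

unit = ET.fromstring(UNIT)
print(target_by_position(unit).tag)  # source-info
print(target_by_name(unit).text)     # Translated Text
```

A tool written the second way would keep working with the sibling elements; one written the first way would need to be retooled.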
The other concern is how <target-info> appears in
<alt-trans> where multiple <target> elements are allowed.
I took another look at the XLIFF 1.0 DTD. Here are the
<trans-unit> and <alt-trans> definitions:
<!ELEMENT trans-unit (source, target?, (count-group|note|context-group|prop-group|alt-trans)*)>
<!ELEMENT alt-trans (source?, target+, (note|context-group|prop-group)*)>
The new DTD would be:
<!ELEMENT trans-unit (source, source-info?, target?, target-info?, (count-group|note|context-group|prop-group|alt-trans)*)>
<!ELEMENT alt-trans (source?, source-info?, (target, target-info?)+, (note|context-group|prop-group)*)>
I think we all have some reservations about this approach because it is
awkward to have two elements for the source and, worse yet, difficult to
match a given <target-info> element with its corresponding <target>
element.
Option 2 (restructure)
We all agree this is a clean structure but not
compatible.
Option 3 (embedded)
Allow me to give a different example using a <font> tag and
a placeholder tag.
<trans-unit id="Option 1" translate="yes">
  <source><font face="Arial" size="2"></font><ph/>Source Text</source>
  <target><font face="Arial" size="3"></font><ph/>Translated Text</target>
</trans-unit>
The inclusion of extension points for <source> and <target> is deferred
because they introduce unknown tags into text that is processed by a TM
tool. This option introduces unknown tags to the text content. It isn't
fully compatible because the TM tool will need to ignore <font> and other
unknown tags. Granted, the unknown tags should come before the rest of the
text to be translated, but I still do not believe it is forward compatible.
Besides, correctly parsing this structure is almost impossible. How
does the tool know which tag is the last format tag and which is the first
inline "placeholder" tag? Adding more "placeholder" tags to the
specification would be impossible because the tool would have to assume
any unknown tag is a format tag. This appears to not be a viable
option.
Option 4 (combined)
This really isn't technically different than Option 2 other than to
say that the XLIFF 1.1 schema and XLIFF 1.1 tools must support the old
XLIFF 1.0 structure as well as the new structure. I do believe the effort
is minimal to have the <source-info> and <target-info> tags be
optional. However, if they are present, they will likely break
existing XLIFF 1.0 tools that look for the <source> as an immediate
child of <trans-unit>. For instance, my existing XSL transforms
would need to be updated to support XLIFF 1.1 documents. Therefore, this
option isn't fully compatible with 1.0 even though it is backward
compatible.
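
As a rough illustration of the difference, the sketch below contrasts a 1.0-style lookup with one that accepts both structures. The sample documents and both function names are hypothetical; the restructured layout follows the <source-info><text> form from the summary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample units: the 1.0 layout and the restructured layout.
OLD = ("<trans-unit id='a'><source>Source Text</source>"
       "<target>Translated Text</target></trans-unit>")
NEW = ("<trans-unit id='b'>"
       "<source-info><text>Source Text</text></source-info>"
       "<target-info><text>Translated Text</text></target-info>"
       "</trans-unit>")

def source_text_1_0(unit):
    # What an existing 1.0 tool does: <source> as an immediate child only.
    direct = unit.find("source")
    return direct.text if direct is not None else None

def source_text_1_1(unit):
    # What a 1.1 tool would need: try the old structure, then the new one.
    direct = unit.find("source")
    if direct is not None:
        return direct.text
    wrapped = unit.find("source-info/text")
    return wrapped.text if wrapped is not None else None

for xml in (OLD, NEW):
    unit = ET.fromstring(xml)
    print(source_text_1_0(unit), "|", source_text_1_1(unit))
```

The extra lookup is small, which matches the point that backward compatibility is cheap here; the 1.0-style function simply finds nothing in the new structure.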
With all this said, I went back to determine the original purpose
for proposing elements for reformatting. The issue is being able to
specify which format values may be modified during translation. In
XLIFF 1.0, as you know, there are several attributes to specify formatting
for the text, namely coord, font, css-style, style, and exstyle. The
'reformat' attribute of <trans-unit> is either "yes" or "no"
indicating whether any or none of the format attribute values can be
changed. The changed value is stored in the <target>
tag.
The problem is that 'reformat' does not give sufficient control to
be able to say that some formats may be changed, but others cannot. For
example, changing coord-cx may be allowed, but not coord-x or coord-y.
The original proposal was to turn each format attribute into an element,
each with its own 'reformat' attribute. This approach is
fine except for the compatibility problems that have been discussed at
length.
Here's the new option.
Extend the possible values for the 'reformat' attribute to provide
sufficient control. XLIFF 1.0 presently uses ";"-delimited lists within
attribute values to store multiple values. The 'coord' attribute is an
example. Its value is actually four values: "x;y;cx;cy", where "#" can be
used for 'don't care'.
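
For reference, splitting such a value takes only a few lines. This is a sketch that assumes the four parts are integers or "#"; the function name parse_coord is made up for illustration.

```python
def parse_coord(value):
    """Split an "x;y;cx;cy" string into a dict; "#" becomes None."""
    parts = value.split(";")
    if len(parts) != 4:
        raise ValueError(f"expected 4 ';'-separated parts, got {len(parts)}")
    names = ("x", "y", "cx", "cy")
    # "#" means "don't care"; anything else is assumed to be an integer.
    return {n: (None if p == "#" else int(p)) for n, p in zip(names, parts)}

print(parse_coord("#;#;183;272"))
# {'x': None, 'y': None, 'cx': 183, 'cy': 272}
```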
So let's extend 'reformat' the same way. Of course, we keep "yes"
and "no" for compatibility.
"yes" = all format attributes may be changed
"no" = no format attributes may be changed
...or a semicolon-delimited list of the following in any order. If
an attribute is listed, it means it may be
reformatted.
coord = all 4 coords
coord-x
coord-y
coord-cx
coord-cy
font = all 3 font values
font-name
font-size
font-weight
css-style
style
exstyle
Example:
<trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>
  <source>...</source>
  <target coord="#;#;181;272" font="System;2;normal">...</target>
  <alt-trans coord="#;#;183;272" font="Arial;2;normal">
    <target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>
    <target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>
  </alt-trans>
</trans-unit>
Parsing the reformat list is fairly easy, even with XSLT, which has
a limited set of string functions.
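
Outside XSLT it is easier still. The sketch below expands a 'reformat' value into the set of individual format values that may be changed, using the names proposed in this message. The function name, and the choice to raise an error on an unrecognized token, are assumptions and not part of any XLIFF specification.

```python
# Group names expand to their members, as listed in the proposal.
GROUPS = {
    "coord": ("coord-x", "coord-y", "coord-cx", "coord-cy"),
    "font": ("font-name", "font-size", "font-weight"),
}
SINGLE = ("css-style", "style", "exstyle")
ALL = frozenset(GROUPS["coord"] + GROUPS["font"] + SINGLE)

def reformattable(value="yes"):
    """Return the set of format values that 'reformat' allows to change."""
    value = value.strip()
    if value == "yes":          # all format attributes may be changed
        return set(ALL)
    if value == "no":           # no format attributes may be changed
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if token in GROUPS:
            allowed.update(GROUPS[token])
        elif token in ALL:
            allowed.add(token)
        else:
            # Assumption: reject unknown tokens instead of guessing.
            raise ValueError(f"unknown reformat token: {token!r}")
    return allowed

print(sorted(reformattable("coord-cx;font-name")))  # ['coord-cx', 'font-name']
```

Because "yes" and "no" are handled first, existing XLIFF 1.0 values keep their meaning.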
This option is 100% compatible, both forward and backward. It does
not affect the structure at all. The only problem I can foresee is how an
XLIFF 1.0 tool treats a 'reformat' value it does not recognize. If it
assumes "yes", it allows some values to be changed that should not be.
That is, an XLIFF 1.0 tool could interpret a value of "coord-cx;font-name"
as "yes" and allow all of the format values to change. Of course, if it
assumed "no" instead of "yes", it would not allow any changes. Since the
default value for 'reformat' is "yes", I don't see either of the
possibilities as being too harmful.
Regards,
Doug Domeny
Ektron, Inc.
+1 603 594-0249
http://www.ektron.com
Reformat Summary of Options
Objective
Additional elements such as font, coord need to be associated with source and target
There are 4 proposals that I shall call
1) Siblings
2) Restructure
3) Embedded
4) Combined
Option 1 - Siblings
The <source-info> and <target-info> elements are made siblings of <source> and <target>
<trans-unit id="Option 1" translate="yes">
  <source>Source Text</source>
  <source-info>
    <coord>
      <x reformat="no">x</x>
      <y reformat="no">y</y>
      <cx reformat="yes">cx</cx>
      <cy reformat="yes">cy</cy>
    </coord>
  </source-info>
  <target>Translated Text</target>
  <target-info>
    <coord>
      <cx>cx</cx>
      <cy>cy</cy>
    </coord>
  </target-info>
</trans-unit>
Issues
1) Is fully 1.0 compliant
2) Two extra elements are required, each containing the same elements
Option 2 - Restructure
Completely new structures are used
The text element replaces the existing source and target elements
<trans-unit id="Option 2" translate="yes">
  <source-info>
    <text>Unable to store persistent object</text>
    <coord>
      <x reformat="no">x</x>
      <y reformat="no">y</y>
      <cx reformat="yes">cx</cx>
      <cy reformat="yes">cy</cy>
    </coord>
  </source-info>
  <target-info>
    <text>Unable to store persistent object translated</text>
    <coord>
      <cx>cx</cx>
      <cy>cy</cy>
    </coord>
  </target-info>
</trans-unit>
Issues
1) Is not compatible with 1.0
2) Has clean structure
Option 3 - Embedded
The existing source and target elements can contain additional elements within their content
The actual "Text" is found between the end of the last additional element and the closing </target> tag
The following example also shows how white space issues will need to be handled
The extra elements need to be specified and implemented in a specified order, e.g. <coord>, <font>, <...>text
<trans-unit id="Option 1" translate="yes">
  <source><coord>
    <x reformat="no">x</x>
    <y reformat="no">y</y>
    <cx reformat="yes">cx</cx>
    <cy reformat="yes">cy</cy>
  </coord>Source Text</source>
  <target><coord>
    <cx>cx</cx>
    <cy>cy</cy>
  </coord>
  Translated Text </target>
</trans-unit>
Issues
1) Is fully compatible with 1.0
2) Is messy
Option 4 - Combined
Option 2 is combined with existing 1.0 structures
The schema says that a trans-unit contains either
<source> and <target>
or
<source-info> and <target-info>
Issues
1) Is fully compatible with 1.0
2) Is the cleanest implementation
3) Will require the most complex schema definition