Re: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

Subject: Re: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

From: Tony Jewtushenko <Tony.Jewtushenko@oracle.com>

To: "Reynolds, Peter" <Peter.Reynolds@bowneglobal.ie>,xliff@lists.oasis-open.org

Date: Tue, 28 Jan 2003 15:10:39 +0000

Title: Reformat Summary Of Options

Hi Peter:

Thanks for the input.

On the issue of reformat itself - Doug's proposal fixes the reformat limitation in the 1.0 spec. At today's meeting it would be most productive to focus on closing this final issue off and moving on to the end-game tasks related to 1.1. We'll have plenty of opportunity at a future post-mortem session to rehash the successes and failures of the 1.1 project work.

Regards,

Tony

----- Original Message -----

From: Reynolds, Peter

To: xliff@lists.oasis-open.org

Cc: ddomeny@ektron.com ; 'Matthew.Lovatt'

Sent: Tuesday, January 28, 2003 12:08 PM

Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

Hi all,

I too think Doug's proposal is the best way forward.

I also think we should discuss the comments from Yves and John on versions. This discussion on reformat was not even mentioned in May of last year when we came up with the final list of what should be in 1.1. If I remember correctly, we decided that the deadline for making suggestions for 1.1 was before our face-to-face meeting in May. We then came from that meeting with a list of issues which had to be finalised. It is now seven months later and many new issues have been added. I think they have contributed to the spec, but I agree with Yves's comments that maybe we would have done things differently for a 2.0 specification and we could have achieved this in the same time.

Thanks,

Peter.

-----Original Message-----
From: Matthew.Lovatt [mailto:Matthew.Lovatt@oracle.com]
Sent: 28 January 2003 11:37
To: ddomeny@ektron.com; xliff@lists.oasis-open.org
Subject: Re: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

I have finally had a chance to read Doug's mail in depth

I fully endorse his new proposal

While it may not be as neat and pretty as option 2, it does give us all the required control while remaining compatible with 1.0

I would be happy to use this new proposal in 1.1, and we can have another look at option 2 in XLIFF 2.0

Mat

----- Original Message -----

From: Doug Domeny

To: xliff@lists.oasis-open.org

Sent: Thursday, January 23, 2003 6:05 PM

Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

Thank you for the summary, Tony. I agree with the options, but I have a few comments about compatibility and the need to retool. And I actually have another option too.

I refer to the guideline for minor releases (http://lists.oasis-open.org/archives/xliff/200208/msg00005.html).

Shall be comprised of small changes that would not require re-qualification of supporting tools or technologies

There are several aspects to compatibility to consider:

1. XLIFF 1.0 document validates against XLIFF 1.1 schema. Given the flexibility of schemas, it would almost always be possible to create a schema that allowed both 1.0 and 1.1 structures.

2. XLIFF 1.1 tool can process either XLIFF 1.0 or 1.1 documents without requiring extensive effort to handle XLIFF 1.0 documents.

3. XLIFF 1.0 tool can process either XLIFF 1.0 or 1.1 documents without modification (assuming a reasonably careful implementation).

Aspects #1 and #2 deal with backward compatibility (from the tool's perspective). That is, new tools and new schemas handle old data. The issue is not one of possibility, but of practicality. Is it easy to create the tools?

Aspect #3 is forward compatibility (from the tool's perspective). That is, can the old tool handle the new data? This is similar to asking whether MS Word 97 can read an MS Word 2000 document (allowing for some loss). Another example is whether an old browser, say IE 3, can render a new HTML document, say XHTML 1.0, again allowing for some loss for unknown tags. The primary rule for forward compatibility in a browser is, "render the contents of an unknown tag". This aspect of forward compatibility is crucial to meeting the guideline for not re-qualifying supporting tools.
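
The browser rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag name <newtag> and the sample string are made up, and the helper render_text is hypothetical rather than part of any XLIFF tool.

```python
import xml.etree.ElementTree as ET


def render_text(element):
    """Return all text inside element, keeping the contents of every tag,
    whether or not the tag itself is recognized (hypothetical helper)."""
    return "".join(element.itertext())


# <newtag> stands in for a tag that an older tool has never seen.
source = ET.fromstring(
    '<source>Click <newtag kind="demo">here</newtag> to continue</source>'
)
rendered = render_text(source)
print(rendered)
```

A renderer can get away with this because it only reads the content. A tool that has to write modified content back needs to know what each tag means.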

XLIFF tools, however, are not as simple as browsers. An XLIFF tool must be able to modify the contents, not just render them. Because the contents must be modified, the XLIFF tool requires more knowledge of the tags. This is why adding extension points (non-XLIFF tags) to content within <source> and <target> has been deferred.

Here are some comments regarding each option listed below as they pertain to "re-qualification of supporting tools or technologies".

Option 1 (siblings)

I believe this is forward compatible, assuming the tool doesn't assume that <target> immediately follows <source>.

The other concern is how <target-info> appears in <alt-trans> where multiple <target> elements are allowed.

I took another look at the XLIFF 1.0 DTD. Here are the <trans-unit> and <alt-trans> definitions:

<!ELEMENT trans-unit (source,target?,(count-group|note|context-group|prop-group|alt-trans)*) >

<!ELEMENT alt-trans (source?,target+,(note|context-group|prop-group)*) >

The new DTD would be:

<!ELEMENT trans-unit (source, source-info?, target?, target-info?,(count-group|note|context-group|prop-group|alt-trans)*) >

<!ELEMENT alt-trans (source?, source-info?, (target, target-info?)+, (note|context-group|prop-group)*) >

I think we all have some reservations about this approach because it is awkward to have two source elements and worse yet, difficult to match a given <target-info> element with its corresponding <target> element.
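
The forward-compatibility caveat for Option 1 (a tool that assumes <target> immediately follows <source>) can be illustrated with a small Python sketch. The sample document is made up for illustration; only the element names come from the proposal, and neither lookup is claimed to be how any existing tool works.

```python
import xml.etree.ElementTree as ET

# Made-up trans-unit using the sibling layout proposed in Option 1.
UNIT = """<trans-unit id="1">
  <source>Source Text</source>
  <source-info/>
  <target>Translated Text</target>
  <target-info/>
</trans-unit>"""

unit = ET.fromstring(UNIT)

# Fragile: assumes the second child is always <target>.
by_position = list(unit)[1]

# Robust: looks the child up by name, wherever it sits among its siblings.
by_name = unit.find("target")

print(by_position.tag)  # the positional lookup lands on <source-info>
print(by_name.text)
```

A tool written the second way would keep working on an Option 1 document. A tool written the first way would need to be re-qualified.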

Option 2 (restructure)

We all agree this is a clean structure but not compatible.

Option 3 (embedded)

Allow me to give a different example using a tag and a placeholder tag.

<trans-unit id="Option 1" translate="yes">

<source>

 <ph/>Source Text</source>

<target>

 <ph/>Translated Text </target>

</trans-unit>

The inclusion of extension points for <source> and <target> is deferred because they introduce unknown tags into text that is processed by a TM tool. This option introduces unknown tags to the text content. This option isn't fully compatible because the TM tool will need to ignore these and other unknown tags. Granted, the unknown tags should come before the rest of the text to be translated, but I still do not believe it is forward compatible.

Besides, correctly parsing this structure is almost impossible. How does the tool know which tag is the last format tag and which is the first inline "placeholder" tag? Adding more "placeholder" tags to the specification would be impossible because the tool would have to assume any unknown tag is a format tag. This appears to not be a viable option.

Option 4 (combined)

This really isn't technically different from Option 2 other than to say that the XLIFF 1.1 schema and XLIFF 1.1 tools must support the old XLIFF 1.0 structure as well as the new structure. I do believe the effort is minimal to have the <source-info> and <target-info> tags be optional. However, if they are present, they are likely to break existing XLIFF 1.0 tools that look for the <source> as an immediate child of <trans-unit>. For instance, my existing XSL transforms would need to be updated to support XLIFF 1.1 documents. Therefore, this option isn't fully compatible with 1.0 even though it is backward compatible.

With all this said, I went back to determine the original purpose for proposing elements for reformatting. The issue concerns being able to specify which format values may be modified during translation. In XLIFF 1.0, as you know, there are several attributes that specify formatting for the text, namely coord, font, css-style, style, and exstyle. The 'reformat' attribute of <trans-unit> is either "yes" or "no", indicating whether any or none of the format attribute values can be changed. The changed value is stored in the <target> tag.

The problem is that 'reformat' does not give sufficient control to be able to say that some formats may be changed but others may not. For example, it may be acceptable to change coord-cx, but not coord-x or coord-y. The original proposal was to turn each format attribute into an element, each with its own 'reformat' attribute. This approach is fine except for the compatibility problems that have been discussed at length.

Here's the new option.

Extend the possible values for the 'reformat' attribute to provide sufficient control. XLIFF 1.0 presently uses ";"-delimited lists within attribute values to store multiple values. The 'coord' attribute is an example. Its value is actually four values: "x;y;cx;cy", where "#" can be used for 'don't care'.
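
As a rough illustration of the ";"-delimited convention, here is a minimal Python sketch that splits a 'coord' value into its four parts. The function name parse_coord is hypothetical, and treating every non-"#" part as an integer is an assumption made for this sketch.

```python
def parse_coord(value):
    """Split a coord attribute of the form "x;y;cx;cy" into a dict.

    "#" (don't care) is mapped to None. Treating the other parts as
    integers is an assumption of this sketch.
    """
    names = ("x", "y", "cx", "cy")
    parts = value.split(";")
    if len(parts) != len(names):
        raise ValueError(f"expected 4 ';'-separated parts, got {len(parts)}")
    return {
        name: None if part == "#" else int(part)
        for name, part in zip(names, parts)
    }


coords = parse_coord("#;#;183;272")
print(coords)
```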

So let's extend 'reformat' the same way. Of course, we keep "yes" and "no" for compatibility.

"yes" = all format attributes may be changed

"no" = no format attributes may be changed

...or a semicolon-delimited list of the following in any order. If an attribute is listed, it means it may be reformatted.

coord = all 4 coords

coord-x

coord-y

coord-cx

coord-cy

font = all 3 font values

font-name

font-size

font-weight

css-style

style

exstyle

Example,

<trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>

<source>...</source>

<target coord="#;#;181;272" font="System;2;normal">...</target>

 <alt-trans coord="#;#;183;272" font="Arial;2;normal">

 <target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>

 <target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>

 </alt-trans>

</trans-unit>

Parsing the reformat list is fairly easy, even with XSLT, which has a limited set of string functions.
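
As a rough illustration of that parsing (in Python rather than XSLT), the sketch below expands a 'reformat' value into the set of individual format values that may be changed. The function name parse_reformat and the decision to reject unknown tokens are assumptions of this sketch, not part of the proposal.

```python
# Group names from the proposal and the individual values they expand to.
GROUPS = {
    "coord": {"coord-x", "coord-y", "coord-cx", "coord-cy"},
    "font": {"font-name", "font-size", "font-weight"},
}
SINGLES = {"css-style", "style", "exstyle"}
ALL_FIELDS = GROUPS["coord"] | GROUPS["font"] | SINGLES


def parse_reformat(value="yes"):
    """Return the set of format values that may be changed.

    "yes" (the default) allows everything, "no" allows nothing, and any
    other value is read as a semicolon-delimited list in any order.
    """
    value = value.strip()
    if value == "yes":
        return set(ALL_FIELDS)
    if value == "no":
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if token in GROUPS:
            allowed |= GROUPS[token]  # "coord" or "font" covers all its parts
        elif token in ALL_FIELDS:
            allowed.add(token)
        else:
            # Sketch-only choice: reject anything not in the proposed list.
            raise ValueError(f"unknown reformat token: {token!r}")
    return allowed


allowed = parse_reformat("coord-cx;font-name")
print(sorted(allowed))
```

With the set in hand, a tool could compare each format value on <target> against the one on <trans-unit> and flag any change to a value that is not in the set.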

This option is 100% compatible, both forward and backward. It does not affect the structure at all. The only problem I can foresee an XLIFF 1.0 tool having is if it treats an unrecognized value for 'reformat' as "yes" instead of "no" and so allows some values to be changed that should not be. That is, an XLIFF 1.0 tool could interpret a value of "coord-cx;font-name" as "yes" and allow all of the format values to change. Of course, if it assumed "no" instead of "yes" it would not allow any changes. Since the default value for 'reformat' is "yes", I don't see either of the possibilities as being too harmful.
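
The two fallback behaviours can be modelled with a short Python sketch. It assumes a hypothetical XLIFF 1.0 tool that understands only "yes" and "no"; the function name and the fallback parameter are invented for illustration and do not describe any real tool.

```python
def legacy_may_reformat(value, fallback="yes"):
    """Model a 1.0-era tool that knows only "yes" and "no".

    Any other value is unrecognized, so the tool falls back to whatever
    it assumes for invalid input (hypothetical behaviour).
    """
    if value in ("yes", "no"):
        return value == "yes"
    return fallback == "yes"


new_style_value = "coord-cx;font-name"

# Tool that assumes "yes": too permissive, every format value may change.
permissive = legacy_may_reformat(new_style_value, fallback="yes")

# Tool that assumes "no": too strict, no format value may change.
strict = legacy_may_reformat(new_style_value, fallback="no")

print(permissive, strict)
```

Either way the old tool still reads the document. It is just more permissive or more restrictive than the author intended.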

Regards,

Doug Domeny

Ektron, Inc.
+1 603 594-0249
http://www.ektron.com

-----Original Message-----
From: Tony Jewtushenko [mailto:Tony.Jewtushenko@oracle.com]
Sent: Thursday, January 23, 2003 9:22 AM
To: xliff@lists.oasis-open.org
Subject: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

Reformat Summary of Options

Objective

Additional elements such as font, coord need to be associated with source and target

There are 4 proposals that I shall call

1) Siblings

2) Restructure

3) Embedded

4) Combined

Option 1 - Siblings

The <source-info> and <target-info> elements are made siblings of <source> and <target>

<trans-unit id="Option 1" translate="yes">

<source>Source Text</source>

<source-info>

<coord>

<x reformat="no">x</x>

<y reformat="no">y</y>

<cx reformat="yes">cx</cx>

<cy reformat="yes">cy</cy>

</coord>

</source-info>

 <target> Translated Text </target>

<target-info>

<coord>

<cx>cx</cx>

<cy>cy</cy>

</coord>

</target-info>

</trans-unit>

Issues

1) Is Fully 1.0 compliant

2) Two extra elements are required, each containing the same elements

Option 2 - Restructure

Completely new structures are used

The text element replaces the existing source and target elements

<trans-unit id="Option 2" translate="yes">

<source-info>

<text>Unable to store persistent object</text>

<coord>

<x reformat="no">x</x>

<y reformat="no">y</y>

<cx reformat="yes">cx</cx>

<cy reformat="yes">cy</cy>

</coord>

</source-info>

 <target-info>

<text>Unable to store persistent object translated</text>

<coord>

<cx>cx</cx>

<cy>cy</cy>

</coord>

</target-info>

</trans-unit>

Issues

1) Is not compatible with 1.0

2) Has clean structure

Option 3 - Embedded

The existing source and target elements can contain additional elements within their content

The actual "Text" is found between the closing bracket of the last additional element and the </target> mark

The following example also shows how white space issues will need to be handled

The extra elements need to be specified and implemented in a specified order, e.g. <coord>, , <...>text

<trans-unit id="Option 1" translate="yes">

<source><coord>

<x reformat="no">x</x>

<y reformat="no">y</y>

<cx reformat="yes">cx</cx>

<cy reformat="yes">cy</cy>

</coord>Source Text</source>

<target><coord>

<cx>cx</cx>

<cy>cy</cy>

</coord> Translated Text </target>

</trans-unit>

Issues

1) Is fully compatible with 1.0

2) Is messy

Option 4 - Combined

Option 2 is combined with existing 1.0 structures

The schema says that a trans-unit contains either

<source> and <target>

or

<source-info> and <target-info>

Issues

1) Is fully compatible with 1.0

2) Is the cleanest implementation

3) Will require the most complex schema definition