

Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc



Excellent proposal Doug!
It doesn't quite solve modifications to data in absence of text but definitely provides a suitable compromise for 1.1.

Mark Levins
IBM Software Group,
Dublin Software Laboratory,
Airways Industrial Estate,
Cloghran,
Dublin 17,
Ireland.
Phone: +353 1 704 6676
IBM Tie Line 166676




From: Doug Domeny <ddomeny@ektron.com>
Sent: 23/01/2003 18:05
Please respond to: ddomeny@ektron.com
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc






Thank  you for the summary, Tony. I agree with the options, but I have a few comments  about compatibility and the need to retool. And I actually have another option  too.
 
I  refer to the guideline for minor releases (http://lists.oasis-open.org/archives/xliff/200208/msg00005.html).

  1. Shall be comprised of  small changes that would not require re-qualification of supporting tools or  technologies


 
There  are several aspects to compatibility to consider:
 
1.  XLIFF 1.0 document validates against XLIFF 1.1 schema. Given the flexibility of  schemas, it would almost always be possible to create a schema that allowed both  1.0 and 1.1 structures.
 
2.  XLIFF 1.1 tool can process either XLIFF 1.0 or 1.1 documents without  requiring extensive effort to handle XLIFF 1.0 documents.  
 
3. XLIFF 1.0 tool  can process either XLIFF 1.0 or 1.1 documents without modification (assuming a  reasonably careful implementation).
 
Aspects #1 and #2 deal with backward compatibility (from the tool's  perspective). That is, new tools and new schemas handle old data. The issue is  not one of possibility, but of practicality. Is it easy to create the  tools?
 
Aspect #3 is forward compatibility (from the tool's perspective). That is, can the old tool handle the new data? This is similar to asking whether MS Word 97 can read an MS Word 2000 document (allowing for some loss). Another example is whether an old browser, say IE 3, can render a new HTML document, say XHTML 1.0, again allowing for some loss for unknown tags. The primary rule for forward compatibility in a browser is, "render the contents of an unknown tag". This aspect of forward compatibility is crucial to meeting the guideline for not re-qualifying supporting tools.
 
XLIFF tools, however, are not as simple as browsers. An XLIFF tool must be able to modify the contents, not just render them. Because the contents must be modified, the XLIFF tool requires more knowledge of the tags. This is why adding extension points (non-XLIFF tags) to content within <source> and <target> has been deferred.
 
Here  are some comments regarding each option listed below as they pertain to  "re-qualification of supporting tools or technologies".
 
Option  1 (siblings)
 
I  believe this is forward compatible, assuming the tool doesn't assume that  <target> immediately follows <source>.
The  other concern is how <target-info> appears in <alt-trans> where  multiple <target> elements are allowed.
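
To illustrate the assumption mentioned above, here is a minimal Python sketch contrasting a positional lookup with a lookup by element name. The sample fragment, its values, and the function names are hypothetical and only mirror the Option 1 layout; nothing here is taken from an actual XLIFF tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical Option 1 fragment: <source-info> sits between <source> and <target>.
SAMPLE = """
<trans-unit id="1">
  <source>Source Text</source>
  <source-info><coord><cx reformat="yes">183</cx></coord></source-info>
  <target>Translated Text</target>
</trans-unit>
"""

def target_by_position(unit):
    """Fragile: assumes <target> is always the second child."""
    children = list(unit)
    return children[1] if len(children) > 1 else None

def target_by_name(unit):
    """Robust: finds the <target> child by tag name, wherever it sits."""
    return unit.find("target")

unit = ET.fromstring(SAMPLE)
print(target_by_position(unit).tag)  # source-info
print(target_by_name(unit).text)     # Translated Text
```

A tool written like `target_by_name` would keep working with sibling elements inserted; one written like `target_by_position` would not.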
 
I took  another look at the XLIFF 1.0 DTD. Here are the <trans-unit> and  <alt-trans> definitions:
 
<!ELEMENT trans-unit     (source,target?,(count-group|note|context-group|prop-group|alt-trans)*)  >
<!ELEMENT alt-trans      (source?,target+,(note|context-group|prop-group)*) >
 
The  new DTD would be:
 

<!ELEMENT trans-unit    (source, source-info?, target?,  target-info?,(count-group|note|context-group|prop-group|alt-trans)*)  >
<!ELEMENT alt-trans      (source?, source-info?, (target,  target-info?)+, (note|context-group|prop-group)*)  >
 
I think we all have some reservations about  this approach because it is awkward to have two source elements and worse yet,  difficult to match a given <target-info> element with its corresponding  <target> element.
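
As a rough sketch of the matching problem, the following Python pairs each <target> in an <alt-trans> with the <target-info> that directly follows it, which is the only link the sibling layout offers. The sample data and the function name `pair_targets` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical <alt-trans> following the proposed (target, target-info?)+ model.
SAMPLE = """
<alt-trans>
  <target>Texte A</target>
  <target-info><coord><cx>180</cx></coord></target-info>
  <target>Texte B</target>
  <target>Texte C</target>
  <target-info><coord><cx>185</cx></coord></target-info>
</alt-trans>
"""

def pair_targets(alt_trans):
    """Return (target, target-info or None) pairs, matched by document order."""
    pairs = []
    for child in alt_trans:
        if child.tag == "target":
            pairs.append([child, None])
        elif child.tag == "target-info" and pairs and pairs[-1][1] is None:
            # A <target-info> belongs to the nearest preceding <target>.
            pairs[-1][1] = child
    return [tuple(p) for p in pairs]

for target, info in pair_targets(ET.fromstring(SAMPLE)):
    cx = info.findtext("coord/cx") if info is not None else None
    print(target.text, cx)
```

The pairing depends entirely on element order, so any tool that reorders or drops siblings would silently break the association.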

 
Option  2 (restructure)
 
We all  agree this is a clean structure but not compatible.
 
Option  3 (embedded)
 
Allow me to give a different example using a <font> tag and a placeholder tag.
 
<trans-unit id="Option 1" translate="yes">
   <source><font face="Arial" size="2">
      </font><ph/>Source Text</source>
   <target><font face="Arial" size="3">
       </font><ph/>Translated Text </target>
</trans-unit>
 
The inclusion of extension points for <source> and <target> is deferred because they introduce unknown tags into text that is processed by a TM tool. This option introduces unknown tags to the text content. It isn't fully compatible because the TM tool will need to ignore <font> and other unknown tags. Granted, the unknown tags should come before the rest of the text to be translated, but I still do not believe it is forward compatible.
 
Besides, correctly parsing this structure is almost impossible. How does  the tool know which tag is the last format tag and which is the first inline  "placeholder" tag? Adding more "placeholder" tags to the specification would be  impossible because the tool would have to assume any unknown tag is a format  tag. This appears to not be a viable option.
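
The ambiguity can be sketched in a few lines of Python. The function `leading_format_tags` and the sets of "known inline" tag names passed to it are assumptions for illustration only: the function collects leading child elements until it meets a tag the tool recognises as an inline tag.

```python
import xml.etree.ElementTree as ET

# Simplified version of the <source> element from the example above.
SAMPLE = '<source><font face="Arial" size="2"></font><ph/>Source Text</source>'

def leading_format_tags(source, known_inline):
    """Collect leading children until the first tag the tool knows is inline."""
    fmt = []
    for child in source:
        if child.tag in known_inline:
            break
        fmt.append(child.tag)
    return fmt

source = ET.fromstring(SAMPLE)
# A tool that knows <ph> stops in the right place.
print(leading_format_tags(source, {"ph", "g", "x"}))  # ['font']
# A tool that does not know <ph> mistakes it for a format tag.
print(leading_format_tags(source, {"g", "x"}))        # ['font', 'ph']
```

The second call shows why a placeholder tag added in a later version would be misread by an older tool as formatting.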
 
Option  4 (combined)
 
This really isn't technically different from Option 2, other than to say that the XLIFF 1.1 schema and XLIFF 1.1 tools must support the old XLIFF 1.0 structure as well as the new structure. I do believe the effort to make the <source-info> and <target-info> tags optional is minimal. However, if they are present, they will likely break existing XLIFF 1.0 tools that look for <source> as an immediate child of <trans-unit>. For instance, my existing XSL transforms would need to be updated to support XLIFF 1.1 documents. Therefore, this option isn't fully compatible with 1.0 even though it is backward compatible.
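
A small Python sketch of that breakage, assuming the restructured form wraps the text in <source-info>/<text> as in the summary. The sample strings and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OLD = '<trans-unit id="1"><source>Hello</source><target>Bonjour</target></trans-unit>'
NEW = ('<trans-unit id="1"><source-info><text>Hello</text></source-info>'
       '<target-info><text>Bonjour</text></target-info></trans-unit>')

def source_text_1_0(unit):
    """What a 1.0-era tool might do: expect <source> as a direct child."""
    return unit.findtext("source")

def source_text_1_1(unit):
    """A 1.1-aware lookup that accepts either structure."""
    text = unit.findtext("source")
    if text is None:
        text = unit.findtext("source-info/text")
    return text

print(source_text_1_0(ET.fromstring(OLD)))  # Hello
print(source_text_1_0(ET.fromstring(NEW)))  # None
print(source_text_1_1(ET.fromstring(NEW)))  # Hello
```

The 1.1-aware lookup handles old documents easily (backward compatible), while the 1.0 lookup finds nothing in the new structure (not forward compatible).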
 
 
 
With all this said, I went back to determine the original purpose for proposing elements for reformatting. The issue concerns being able to specify which format values may be modified during translation. In XLIFF 1.0, as you know, there are several attributes to specify formatting for the text, namely coord, font, css-style, style, and exstyle. The 'reformat' attribute of <trans-unit> is either "yes" or "no", indicating whether any or none of the format attribute values can be changed. The changed value is stored in the <target> tag.
 
The problem is that 'reformat' does not give sufficient control to say that some formats may be changed but others may not. For example, it may be allowed to change coord-cx, but not coord-x or coord-y. The original proposal was to turn each format attribute into an element, with each element having its own 'reformat' attribute. This approach is fine except for the compatibility problems that have been discussed at length.
 
Here's  the new option.
 
Extend the possible values for the 'reformat' attribute to provide sufficient control. XLIFF 1.0 presently uses ";"-delimited lists within attribute values to store multiple values. The 'coord' attribute is an example. Its value is actually four values: "x;y;cx;cy", where "#" can be used for 'don't care'.
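
For readers unfamiliar with the delimited form, here is a minimal Python sketch of splitting a 'coord' value. The function name `parse_coord` is hypothetical, and the sketch assumes the four values are integers.

```python
def parse_coord(value):
    """Split a 'coord' value "x;y;cx;cy" into a dict; "#" (don't care) becomes None."""
    names = ("x", "y", "cx", "cy")
    parts = value.split(";")
    if len(parts) != len(names):
        raise ValueError(f"expected 4 ';'-separated values, got {len(parts)}")
    # Assumption: every value other than "#" is an integer.
    return {n: (None if p == "#" else int(p)) for n, p in zip(names, parts)}

print(parse_coord("#;#;183;272"))
# {'x': None, 'y': None, 'cx': 183, 'cy': 272}
```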
 
So  let's extend 'reformat' the same way. Of course, we keep "yes" and "no" for  compatibility.
 
"yes"  = all format attributes may be changed
"no" =  no format attributes may be changed
...or  a semicolon-delimited list of the following in any order. If an attribute is  listed, it means it may be reformatted.
coord  = all 4 coords
coord-x
coord-y
coord-cx
coord-cy
font =  all 3 font values
font-name
font-size
font-weight
css-style
style
exstyle
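
The value list above could be interpreted along these lines. This is only a sketch in Python: the names `GROUPS`, `SIMPLE`, `ALL_FIELDS` and `parse_reformat` are invented here, and rejecting unknown tokens with an error is one possible policy, not something the proposal specifies.

```python
# "coord" and "font" are shorthand for all of their component values.
GROUPS = {
    "coord": {"coord-x", "coord-y", "coord-cx", "coord-cy"},
    "font": {"font-name", "font-size", "font-weight"},
}
SIMPLE = {"css-style", "style", "exstyle"}
ALL_FIELDS = set().union(*GROUPS.values()) | SIMPLE

def parse_reformat(value="yes"):
    """Return the set of format fields that may be changed."""
    value = value.strip()
    if value == "yes":
        return set(ALL_FIELDS)
    if value == "no":
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if token in GROUPS:
            allowed |= GROUPS[token]
        elif token in ALL_FIELDS:
            allowed.add(token)
        else:
            # Assumed policy: fail loudly on tokens outside the list.
            raise ValueError(f"unknown reformat token: {token!r}")
    return allowed

print(sorted(parse_reformat("coord-cx;font-name")))  # ['coord-cx', 'font-name']
```

Expanding the group names up front means later checks only ever deal with the individual field names.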
 
Example,
 
<trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>
    <source>...</source>
    <target coord="#;#;181;272" font="System;2;normal">...</target>
    <alt-trans coord="#;#;183;272" font="Arial;2;normal">
        <target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>
        <target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>
    </alt-trans>
</trans-unit>
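
A tool could use the list to verify that a <target> only changes what it is permitted to change. The Python below is a sketch under assumptions: the function names are hypothetical, the font value is assumed to be ordered "name;size;weight" as the example suggests, and the attribute values are passed in as plain dictionaries rather than read from a document.

```python
COORD_FIELDS = ("coord-x", "coord-y", "coord-cx", "coord-cy")
FONT_FIELDS = ("font-name", "font-size", "font-weight")  # assumed order

def split_fields(coord, font):
    """Map each individual field name to its value from the delimited attributes."""
    fields = dict(zip(COORD_FIELDS, coord.split(";")))
    fields.update(zip(FONT_FIELDS, font.split(";")))
    return fields

def disallowed_changes(unit_attrs, target_attrs, allowed):
    """List fields whose value differs in the target but are not in 'allowed'."""
    before = split_fields(unit_attrs["coord"], unit_attrs["font"])
    after = split_fields(target_attrs["coord"], target_attrs["font"])
    return sorted(n for n in before if before[n] != after[n] and n not in allowed)

unit = {"coord": "#;#;183;272", "font": "Arial;2;normal"}
allowed = {"coord-cx", "font-name"}  # i.e. reformat="coord-cx;font-name"

ok = {"coord": "#;#;181;272", "font": "System;2;normal"}
bad = {"coord": "#;#;181;280", "font": "System;2;normal"}
print(disallowed_changes(unit, ok, allowed))   # []
print(disallowed_changes(unit, bad, allowed))  # ['coord-cy']
```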
 
Parsing the reformat list is fairly easy, even with XSLT, which has a  limited set of string functions.
 

This option is 100% compatible, both forward and backward. It does not affect the structure at all. The only problem I can foresee an XLIFF 1.0 tool having is if an invalid value for reformat is assumed to be "yes" instead of "no", which allows some values to be changed that should not be. That is, an XLIFF 1.0 tool could interpret a value of "coord-cx;font-name" as "yes" and allow all of the format values to change. Of course, if it assumed "no" instead of "yes", it would not allow any changes. Since the default value for 'reformat' is "yes", I don't see either of the possibilities as being too harmful.
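
The two possible behaviours of an XLIFF 1.0 tool can be sketched as follows. The function `legacy_reformat` and its `unknown_means` parameter are hypothetical; they only model how an older tool might treat a value it does not recognise.

```python
def legacy_reformat(value=None, unknown_means="no"):
    """Model a 1.0 tool that only understands "yes" and "no"."""
    if value is None:
        return "yes"  # the attribute's default when it is absent
    if value in ("yes", "no"):
        return value
    # Anything else is unrecognised; the tool falls back to its own assumption.
    return unknown_means

print(legacy_reformat("coord-cx;font-name", unknown_means="yes"))  # yes: everything may change
print(legacy_reformat("coord-cx;font-name", unknown_means="no"))   # no: nothing may change
```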

Regards,

Doug Domeny

Ektron, Inc.
+1 603  594-0249
http://www.ektron.com
-----Original Message-----
From: Tony Jewtushenko  [mailto:Tony.Jewtushenko@oracle.com]
Sent: Thursday, January 23,  2003 9:22 AM
To: xliff@lists.oasis-open.org
Subject:  [xliff] From Mat Lovatt: reformat Summary Of Options.doc



 

Reformat Summary of  Options

 

Objective

Additional elements such  as font, coord need to be associated with source and  target

 

There are 4 proposals that I shall  call

1)      Siblings

2)      Restructure

3)      Embedded  

4)      Combined

 

 

Option 1 - Siblings  

The <source-info> and <target-info> elements are made siblings of <source> and <target>.

 

 

<trans-unit id="Option 1" translate="yes">
  <source>Source Text</source>
  <source-info>
    <coord>
      <x reformat="no">x</x>
      <y reformat="no">y</y>
      <cx reformat="yes">cx</cx>
      <cy reformat="yes">cy</cy>
    </coord>
  </source-info>
  <target> Translated Text </target>
  <target-info>
    <coord>
      <cx>cx</cx>
      <cy>cy</cy>
    </coord>
  </target-info>
</trans-unit>

 

Issues

1)      Is Fully 1.0  compliant

2)      Two extra elements are  required, each containing the same elements

 

Option 2 –  Restructure

 

Completely new structures  are used

The text element replaces  the existing source and target elements

 

<trans-unit id="Option 2" translate="yes">
  <source-info>
    <text>Unable to store persistent object</text>
    <coord>
      <x reformat="no">x</x>
      <y reformat="no">y</y>
      <cx reformat="yes">cx</cx>
      <cy reformat="yes">cy</cy>
    </coord>
  </source-info>
  <target-info>
    <text>Unable to store persistent object translated</text>
    <coord>
      <cx>cx</cx>
      <cy>cy</cy>
    </coord>
  </target-info>
</trans-unit>

 

Issues

1)      Is not compatible with  1.0

2)      Has clean  structure

 

Option 3 –  Embedded

 

 

The existing source and  target elements can contain additional elements within their  content

The actual "Text" is found between the closing angle bracket of the last additional element and the </target> tag

 

The following example also  shows how white space issues will need to be  handled

 

The extra elements need to  be specified and implemented in a specified order, e.g. <coord>,  <font>, <….>text

 

<trans-unit id="Option 1" translate="yes">
  <source><coord>
    <x reformat="no">x</x>
    <y reformat="no">y</y>
    <cx reformat="yes">cx</cx>
    <cy reformat="yes">cy</cy>
  </coord>Source Text</source>
  <target><coord>
    <cx>cx</cx>
    <cy>cy</cy>
  </coord> Translated Text </target>
</trans-unit>

 

Issues

1)      is fully compatible with  1.0

2)      Is  messy

 

Option 4 –  Combined

Option 2 is combined with  existing 1.0 structures

 

The schema says that a trans-unit contains either

   <source> and <target>

or

   <source-info> and <target-info>

Issues

1)      Is fully compatible with  1.0

2)      Is the cleanest  implementation

3)      Will require the most  complex schema definition

 

 

 

 

