Subject: RE: [xliff] XLIFF 1.0 issues


Hi,

In the current specification, the tool attribute is free text; the 1.0 spec
says that it "is used to specify the signature and version of the tool that
created or modified the document".

However, this mechanism is a bit loose and open to misuse.  For example, a
tool may omit the version number.  Including tool-name and tool-version
attributes in the next version would be a better solution.

Regarding a tools registry, I don't think we could limit the names to a
standard list.  The hope is that as many tools as possible will use the
XLIFF format.  Is it necessary to have a naming convention for tool names?
A convention is too easy to ignore, so I think the best solution may be to
introduce another attribute, tool-company.  This way a tool can be clearly
defined as

tool-company = ACME
tool-name = Killer App
tool-version = 4.0

and not in a confusing manner such as 

tool = ACME Killer
or 
tool = ACME Ltd. Killer 4
or
tool = ACME, Killer App
or
tool = ACME Ltd., Killer App 4.0
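
A minimal sketch of what a consuming tool gains from separate attributes.
The attribute names tool-company, tool-name and tool-version are the ones
proposed above and are not part of XLIFF 1.0; the <file> fragment and the
tool_identity helper are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the proposed (not XLIFF 1.0) attributes.
FRAGMENT = '<file tool-company="ACME" tool-name="Killer App" tool-version="4.0"/>'


def tool_identity(element):
    """Return (company, name, version); a missing attribute comes back as None."""
    return (
        element.get("tool-company"),
        element.get("tool-name"),
        element.get("tool-version"),
    )


identity = tool_identity(ET.fromstring(FRAGMENT))
print(identity)

# A tool that omits the version is detectable without parsing free text.
partial = tool_identity(ET.fromstring('<file tool-name="Killer App"/>'))
print(partial)
```

With a single free-text attribute, the same check would need string
parsing that guesses where the company ends and the version begins.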

I will write this up in more detail and propose these additions to the TC
for the next release of XLIFF.

Enda

-----Original Message-----
From: Stephen Holmes [mailto:sholmes@novell.com]
Sent: 14 April 2002 11:54
To: xliff@lists.oasis-open.org; John Reid
Subject: Re: [xliff] XLIFF 1.0 issues


Thanks for the information.  Can I ask then, does the tool information
capture the version of the tool as well, or is it just a free-text
attribute?

Reason: Producing the count in the first instance is the responsibility
of the content parser.  As the parser may be revised X times to address
defects, add functionality etc., can we look at a standardised way of
specifying the tool/parser and version?


Is this XLIFF group, or some subcommittee, looking at a tools registry,
i.e., agreed and standard names for the tools out there, or some form of
guidelines for creating these tool names?

Finally, is there a plan to integrate/leverage the LISA findings on this
topic?


Steve.


S  t  e  p  h  e  n     H  o  l  m  e  s
Localisation Development Manager 
International Product Development

Voice:  +353 (1) 241 5732
Fax:     +353 (1) 241 5749

Novell, Inc., THE leading provider of Net business solutions
http://www.novell.com
>>> John Reid <JREID@novell.com> 04/12/02 19:05 PM >>>
On Point 1: 
[Alt-jr1] The purpose for including the tool as an attribute of the
<count-group> is so that Tool Y will know that the counts it is about to
use/update are not theirs. Thus, Tool Y may want to produce its own.
However, if Tool Y is compatible with Tool X, then it can use Tool X's
counts. Meanwhile, Tool X can find its own counts and update,
accordingly. 

<count-group tool="Tool X" name="example">
<count count-type="untranslated">132</count>
</count-group>

This does become complicated, though, when Tool Z is used. Tool Z may
have a compatibility issue with Tool Y but not Tool X. If Tool Y updated
Tool X's counts, Tool Z may use those inaccurately.
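
The decision described in Alt-jr1 can be sketched roughly as follows. This
is only an illustration: the tool attribute on <count-group> is the
proposal under discussion, and the COMPATIBLE_TOOLS set and the
reusable_counts helper are hypothetical names.

```python
import xml.etree.ElementTree as ET

COUNT_GROUP = """
<count-group tool="Tool X" name="example">
<count count-type="untranslated">132</count>
</count-group>
"""

# Hypothetical: the tools whose counting rules the current tool (Tool Y) trusts.
COMPATIBLE_TOOLS = {"Tool X", "Tool Y"}


def reusable_counts(count_group, compatible_tools):
    """Return {count-type: value} if the group's tool is trusted, else None."""
    if count_group.get("tool") not in compatible_tools:
        return None  # the caller should produce its own counts instead
    return {c.get("count-type"): int(c.text) for c in count_group.findall("count")}


group = ET.fromstring(COUNT_GROUP)
trusted = reusable_counts(group, COMPATIBLE_TOOLS)
untrusted = reusable_counts(group, {"Tool Z"})
print(trusted, untrusted)
```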

[Alt-jr2] There is another solution to this: We already have a <phase>
element that stores the tool used in that phase. The phase-name
attribute could be added to <count>. Thus, any subsequent tool could
ascertain when that count was produced and by what, and determine
whether to use the count.

<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
</phase-group>
..
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
</count-group>
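
A rough sketch of how a later tool might follow phase-name from a <count>
back to its <phase>. The <header> wrapper is an assumption made so the two
fragments form one parseable document, and count_provenance is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: both groups sit under one parent, here called <header>.
DOC = """
<header>
<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
</phase-group>
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
</count-group>
</header>
"""


def count_provenance(root):
    """Yield (count-type, value, tool, date) for each count, via its phase-name."""
    phases = {p.get("phase-name"): p for p in root.iter("phase")}
    for count in root.iter("count"):
        phase = phases.get(count.get("phase-name"))
        tool = phase.get("tool") if phase is not None else None
        date = phase.get("date") if phase is not None else None
        yield count.get("count-type"), int(count.text), tool, date


rows = list(count_provenance(ET.fromstring(DOC)))
print(rows)
```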

Then again, with either method, a tool has to update the attribute of a
count or count-group element and historical data is lost. Thus, the
phase-name and tool attribute methods have essentially the same
consequence: Tool Z knows which tool last touched the named count. Using
a tool attribute has the advantage of being in the scope of the current
node. The phase-name has the advantage of carrying additional
information such as date.

[Alt-jr3] Another alternative is to add a new element to <count-group>,
such as <update>, that has attributes of tool and date. Thus, multiple
updates could be recorded for a <count-group>. This would need to be at
the <count-group> level since we do want to keep the contents of <count>
to the actual count. 

<count-group name="example">
<update tool="Tool X" date="2002-04-10T09:41:02Z"/>
<count phase-name="create" count-type="untranslated">132</count>
</count-group>

This solution has the disadvantage that it implies an update to all the
counts within a <count-group>, of which there may be many, when only one
was updated. This is also a weakness for adding the tool attribute to
<count-group>.

Alt-jr2 can be used to keep historical data since there is no
restriction on the number of counts that can be stored within the
<count-group>. Thus, Tool X can supply a count in phase 1,

<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
</phase-group>
..
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
</count-group>

Tool Y can add an update to it in phase 2, 

<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
<phase phase-name="translate" process-name="Translation" tool="Tool Y"
date="2002-04-11T10:42:03Z"/>
</phase-group>
..
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
<count phase-name="translate" count-type="untranslated">43</count>
</count-group>

and Tool Z can update Tool X's count and ignore Tool Y's in phase 3.

<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
<phase phase-name="translate" process-name="Translation" tool="Tool Y"
date="2002-04-11T10:42:03Z"/>
<phase phase-name="review" process-name="Translation" tool="Tool Z"
date="2002-04-12T11:43:04Z"/>
</phase-group>
..
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
<count phase-name="translate" count-type="untranslated">43</count>
<count phase-name="review" count-type="untranslated">56</count>
</count-group>
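
With the history kept this way, a tool could pick the most recent count
from a tool it trusts. A minimal sketch, assuming the <header> wrapper
(added only to make the fragments parseable together) and a hypothetical
latest_trusted_count helper:

```python
import xml.etree.ElementTree as ET

DOC = """
<header>
<phase-group>
<phase phase-name="create" process-name="Translation" tool="Tool X"
date="2002-04-10T09:41:02Z"/>
<phase phase-name="translate" process-name="Translation" tool="Tool Y"
date="2002-04-11T10:42:03Z"/>
<phase phase-name="review" process-name="Translation" tool="Tool Z"
date="2002-04-12T11:43:04Z"/>
</phase-group>
<count-group name="example">
<count phase-name="create" count-type="untranslated">132</count>
<count phase-name="translate" count-type="untranslated">43</count>
<count phase-name="review" count-type="untranslated">56</count>
</count-group>
</header>
"""


def latest_trusted_count(root, count_type, trusted_tools):
    """Return the newest count of count_type made by a trusted tool, or None."""
    phases = {p.get("phase-name"): p for p in root.iter("phase")}
    candidates = []
    for count in root.iter("count"):
        if count.get("count-type") != count_type:
            continue
        phase = phases.get(count.get("phase-name"))
        if phase is not None and phase.get("tool") in trusted_tools:
            # UTC timestamps in this one format sort chronologically as strings.
            candidates.append((phase.get("date", ""), int(count.text)))
    return max(candidates)[1] if candidates else None


root = ET.fromstring(DOC)
print(latest_trusted_count(root, "untranslated", {"Tool X", "Tool Z"}))
print(latest_trusted_count(root, "untranslated", {"Tool X"}))
```

No count is overwritten, so a tool that trusts only Tool X still finds
the original value.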


Thoughts?

cheers,
john

>>> Stephen Holmes <sholmes@novell.com> 4/11/02 4:44:58 PM >>>
On point 1, I'd just make the comment that the value of adding the tool
that created the wordcount as an attribute is of relatively little use
if you take a situation where, for example, "Tool X" generates the data,
but "Tool Y" reads it for processing and has different ideas about what
constitutes a word count.

It's an age-old problem in localisation - "Who has the correct word
count?".  As tools may be completely proprietary, even if based on XLIFF
containers, I see no reason to complicate the attribute qualifiers.
This may become the topic of a subcommittee...

On point 3 - bear in mind that localisation/language tools that aspire to
be network-based will find base64 encoded content to be monumentally
large to transfer.  Europe, remember, is still predominantly 56K and we
all remember the hassle involved in FedEx'ing CDs to China - business
reality supersedes specification.
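
For a sense of scale: standard Base64 emits 4 output characters for every
3 input bytes, so encoded content is about a third larger than the
original before any line breaks are added. A quick check with a made-up
payload:

```python
import base64

payload = bytes(range(256)) * 12      # 3072 bytes of arbitrary binary data
encoded = base64.b64encode(payload)   # no line breaks are inserted

print(len(payload), len(encoded))     # 3072 4096
```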

Cheers
Steve.



S  t  e  p  h  e  n     H  o  l  m  e  s
Localisation Development Manager 
International Product Development

Voice:  +353 (1) 241 5732
Fax:     +353 (1) 241 5749

Novell, Inc., THE leading provider of Net business solutions
http://www.novell.com 
>>> John Reid <JREID@novell.com> 04/11/02 19:02 PM >>>
Hi All,

My comments follow Mark's, between <jr>...</jr> tags. 

>>> Mark Levins <mark_levins@ie.ibm.com> 4/5/02 5:59:53 AM >>>

1. <note> as a child of <count>
Currently the <count> element is very ambiguous; a note as a child element
could be used to indicate what was being counted, what was considered a
word etc.

<jr>The <count-group> and <count> elements can be very problematic. A
<note> element within the <count> element may help in the customized
support required by these elements but that is a human readable approach
and probably would need to be defined even more to be truly useful. A
stronger definition of the count element may do more for us.
<count> has the 'unit' attribute which has recommended values of word,
page, trans-unit, bin-unit, and item. The latter three are defined
according to elements within the spec but the former two must be defined
by the tool creating the count. I suggest that we include the tool as an
attribute to the count-group. This would be the same attribute used in
<file>, <phase>, and <alt-trans>. Further refinement of the 'unit'
attribute may also be necessary.</jr>


2. The <count-group>, <prop-group> and <context-group> elements can be
used within a <group> without any other relevant child elements
The 1.0 specification allows that a <group> element can contain (for
example) a <count-group> without containing anything to count. I think the
<group> element should be changed to contain at least one of <group>,
<trans-unit> or <bin-unit>.

<jr>Shouldn't this requirement be placed on the <body> also?</jr>
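
Until the DTD enforces such a rule, a tool could check it itself. A
minimal sketch, in which the empty_groups helper is hypothetical and the
id attributes serve only as labels for this example:

```python
import xml.etree.ElementTree as ET

# The children proposed above as the ones that give a <group> something to count.
REQUIRED_CHILDREN = {"group", "trans-unit", "bin-unit"}


def empty_groups(root):
    """Return the <group> elements that have none of the required children."""
    return [
        g for g in root.iter("group")
        if not any(child.tag in REQUIRED_CHILDREN for child in g)
    ]


BODY = """
<body>
<group id="ok"><count-group name="example"/>
<trans-unit id="1"><source>Hi</source></trans-unit></group>
<group id="empty"><count-group name="example"/></group>
</body>
"""

bad = empty_groups(ET.fromstring(BODY))
print([g.get("id") for g in bad])
```

The same test could be applied to <body> by checking its direct children
against the same set.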


3. Binary elements & <internal-file>
This is kind of a big one. At the moment the specification does not define
the form of the content of the <internal-file> element (although there is
an optional 'form' attribute). The problem I see with this is that the
specification allows users to place binary data directly as content - this
binary content may contain the reserved XML characters < > etc. which will
cause parsers to choke.
The CDATA section approach is also not good enough to provide a solution.
My suggestion is that the content of the <internal-file> be restricted to
Base64 or at least stated so.
Also, the description in the spec for the <internal-file> element reads
"The <internal-file> element will contain the data for the skeleton file."
which is technically wrong; it may also contain data for a <bin-source>
or <bin-target> element.
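
One way to see the CDATA limitation: a CDATA section still has to contain
only legal XML characters, and it ends at the first "]]>". A rough sketch
with a made-up payload; the value "base64" for the 'form' attribute is an
assumption, not something the 1.0 spec defines.

```python
import base64
import xml.etree.ElementTree as ET

raw = b"\x01\x02<skeleton>]]>\xff"   # made-up binary payload


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Raw bytes in a CDATA section: control characters such as 0x01 are not
# legal XML 1.0 characters, and "]]>" would close the section early.
as_cdata = (
    "<internal-file><![CDATA[" + raw.decode("latin-1") + "]]></internal-file>"
)

# The same bytes as Base64 text use only characters that are safe in XML.
as_base64 = (
    '<internal-file form="base64">'
    + base64.b64encode(raw).decode("ascii")
    + "</internal-file>"
)

print(parses(as_cdata), parses(as_base64))
decoded = base64.b64decode(ET.fromstring(as_base64).text)
print(decoded == raw)
```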

<jr>How does CDATA fail this purpose? I wouldn't want to restrict this
to just Base64, thus requiring a conversion for both the producer and
any subsequent processor that may be able to handle the original format
without a problem. Additionally, wouldn't we need an attribute such as
'original-format' if we forced your conversion?</jr>


4. mime-type attribute of <bin-source>
How come this attribute is omitted from the <bin-source> element? Note
that it is an attribute of <bin-target>.

<jr>We generally put attributes for <source> and <bin-source> in the
parent, <trans-unit> and <bin-unit>, respectively. The 'mime-type'
attribute of the target allows a different mime-type for the target in
cases where it differs from that specified on the <bin-unit>.
Otherwise, the mime-type of the target is unnecessary.</jr>
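
The fallback described here can be sketched as below. The element
contents are left empty for brevity, the id values are just labels, and
target_mime_type is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def target_mime_type(bin_unit):
    """Return the target's mime-type, falling back to the <bin-unit>'s."""
    target = bin_unit.find("bin-target")
    if target is not None and target.get("mime-type"):
        return target.get("mime-type")
    return bin_unit.get("mime-type")


same = ET.fromstring(
    '<bin-unit id="1" mime-type="image/gif"><bin-source/><bin-target/></bin-unit>'
)
differs = ET.fromstring(
    '<bin-unit id="2" mime-type="image/gif"><bin-source/>'
    '<bin-target mime-type="image/png"/></bin-unit>'
)

print(target_mime_type(same), target_mime_type(differs))
```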

Cheers,
john

----------------------------------------------------------------
To subscribe or unsubscribe from this elist use the subscription
manager: <http://lists.oasis-open.org/ob/adm.pl>

