
Subject: RE: [xliff] Re: XLIFF 1.1 Specification Working Draft 14 - RC5

From: "Yves Savourel" <ysavourel@translate.com>
To: xliff@lists.oasis-open.org
Date: Thu, 8 May 2003 10:05:09 -0600

Hi all,

I missed the part of the 1.0 specification that already describes rid as the link between <bx/> and <ex/>, because I was looking at <bpt>/<ept> and assumed <bx/> and <ex/> would be the same.

So I guess we should go with option (1) or (2), and we would need to update the text for <bpt>/<ept> as well.

I have still never run into a real case where overlapping codes were allowed, but I guess we should make provision for them. There seems to be a big discrepancy between how often the problem occurs and the burden the solution places on all the other cases.

cheers,

-yves

-----Original Message-----
From: Doug Domeny [mailto:ddomeny@ektron.com]
Sent: Thu, May 08, 2003 8:02 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] Re: XLIFF 1.1 Specification Working Draft 14 - RC5

All,

The original problem with the statement "the paired codes, <bx/>+<ex/> and <bpt>+<ept>, should be related via the rid attribute" is that it is vague. It does not describe HOW they are related, only that the 'rid' attribute is used in some way. From John's example it is clear that they are related by setting the two 'rid' attributes to the same value. The alternative is to set the 'rid' value to match the 'id' value of the corresponding tag. I defer to those who have been on the TC longer than I have to determine the original intent of 'id' and 'rid'. The most important thing is to make it very clear how the paired tags are related. Including an example, such as John's, is essential to understanding.

The 'rid' attribute SHOULD HAVE BEEN required for <bx>, <ex>, <bpt>, and <ept> in XLIFF 1.0 because it is needed to refer to the matching begin/end tag. Making it required now, however, would not be fully compatible with XLIFF 1.0, because some existing documents may lack the 'rid' attribute (although I do not know how their matching pairs would then be identified). Therefore, the XLIFF 1.1 schema cannot enforce 'rid' without causing compatibility problems with older XLIFF 1.0 documents. I do not feel it is necessary to enforce the 'rid' attribute. If the specification is unambiguous, people will apply the 'id' and 'rid' attributes properly.

The remaining question is "Can the paired tags be matched by 'id' without using 'rid'?". In my opinion, the answer should be "No". Providing two ways to do the same thing causes the user to make an arbitrary decision. However, I realize that some may already use 'id' instead of 'rid' because 'rid' is optional. Perhaps someone with more experience can offer more information.

In summary, here are the three choices on how to relate paired tags:

(1) 'rid' matches 'id' ... <bx id="b1" rid="e1"/> ... <ex id="e1" rid="b1"/>

(2) 'rid' matches 'rid' ... <bx rid="1"/> ... <ex rid="1"/>

(3) 'id' matches 'id' ... <bx id="1"/> ... <ex id="1"/>
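To make the difference between the three choices concrete, here is a small Python sketch using only the standard xml.etree.ElementTree module. It is not taken from any XLIFF tool; the function names and the convention labels "rid-id", "rid-rid" and "id-id" are made up for illustration. It pairs begin and end codes in a <source> fragment under each convention:

```python
import xml.etree.ElementTree as ET

# Which end element closes which begin element.
END_FOR = {"bx": "ex", "bpt": "ept"}

def links(begin, end, convention):
    """True if `begin` and `end` form a pair under the given convention."""
    if convention == "rid-id":      # choice (1): begin's rid names the end's id
        key_b, key_e = begin.get("rid"), end.get("id")
    elif convention == "rid-rid":   # choice (2): both carry the same rid
        key_b, key_e = begin.get("rid"), end.get("rid")
    elif convention == "id-id":     # choice (3): both carry the same id
        key_b, key_e = begin.get("id"), end.get("id")
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    # A missing attribute never matches anything.
    return key_b is not None and key_b == key_e

def pair_codes(source_xml, convention):
    """Return (begin, end) element pairs found in a <source> fragment."""
    root = ET.fromstring(source_xml)
    begins = [el for el in root.iter() if el.tag in END_FOR]
    return [(b, e)
            for b in begins
            for e in root.iter(END_FOR[b.tag])
            if links(b, e, convention)]
```

For choice (1) the sketch only follows the forward link (the begin's rid); the reverse link on the end element is not checked. Run on John's <trans-unit>, where every code has its own id, "rid-rid" finds both pairs while "id-id" finds none.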

Comments:

This is the XLIFF 1.0 description for the 'id' and 'rid' attributes:

"The id attribute is used in many elements, as a unique reference to the original corresponding code data or format for the given element."

"The rid attribute is used to link different elements that are related."

The XLIFF 1.0 description for <bx/> adds:

"These paired elements are related via their rid attributes."

It seems that the 'id' attribute was never intended to be used to match pairs, but rather the 'rid' attribute was included for that purpose. Therefore, choice (3) would be disqualified (although this was the verbal consensus at the last conference call). John's concerns are addressed by either choice (1) or choice (2). Choice (1) is better from a strict XML point of view because the 'id' attribute identifies an element and the 'rid' refers to a different element. In a DTD or schema, the 'id' attribute would be of type ID and 'rid' would be IDREF. The DTD would then enforce that 'rid' actually refers to an existing element. However, the XLIFF 1.0 DTD does not define 'id' or 'rid' in this way, and, in fact, 'id' does not even need to be unique. Choice (2) is less complex and easier to implement than choice (1), given that 'id' may not be unique.

Given this reasoning, I recommend that the specification describe the use of 'rid' in <bx>, <ex>, <bpt>, and <ept> as shown in choice (2), which matches John's recommendation and Mark's agreement. Although it would be nice to make 'rid' required, it is not essential. More important would be a way to enforce that the 'rid' values match correctly, that is, that each <bx rid> matches exactly one <ex rid>. I do not know if this is possible with W3C XML Schema.
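Whatever W3C XML Schema can express (its identity constraints xs:unique, xs:key and xs:keyref might cover part of the rule, but that is not worked out here), the one-to-one rule itself is easy to check in a pass after parsing. A minimal Python sketch, assuming choice (2) and one <source> or <target> fragment at a time; check_rid_pairs is a hypothetical helper, and it does not check that each begin comes before its end:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Which end element closes which begin element.
END_FOR = {"bx": "ex", "bpt": "ept"}

def check_rid_pairs(source_xml):
    """Return a list of problems; an empty list means every rid pairs 1:1.

    Assumes choice (2): a begin code and its end code carry the same rid.
    """
    root = ET.fromstring(source_xml)
    problems = []
    for begin_tag, end_tag in END_FOR.items():
        begin_rids = [el.get("rid") for el in root.iter(begin_tag)]
        end_rids = [el.get("rid") for el in root.iter(end_tag)]
        # rid is optional in the DTD, so report elements that lack it.
        if None in begin_rids or None in end_rids:
            problems.append(f"<{begin_tag}>/<{end_tag}> without rid")
        b_count = Counter(r for r in begin_rids if r is not None)
        e_count = Counter(r for r in end_rids if r is not None)
        # Each rid value must occur on exactly one begin and one end.
        for rid in sorted(set(b_count) | set(e_count)):
            if b_count[rid] != 1 or e_count[rid] != 1:
                problems.append(
                    f'rid="{rid}": {b_count[rid]} <{begin_tag}>, '
                    f'{e_count[rid]} <{end_tag}>'
                )
    return problems
```

For example, a fragment with two <bx rid="1"/> and one <ex rid="1"/> yields a single problem, 'rid="1": 2 <bx>, 1 <ex>'.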

Regards,

Doug Domeny
Software Analyst

Ektron, Inc.
+1 603 594-0249 x212
http://www.ektron.com

-----Original Message-----
From: David Pooley [mailto:DPooley@sdlintl.com]
Sent: Thursday, May 08, 2003 6:29 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] Re: XLIFF 1.1 Specification Working Draft 14 - RC5

I'm not comfortable with this for the reasons outlined below.

As I understand it, <bx/> is a shortened way of writing <bpt></bpt> and <ex/> is a shortened way of writing <ept></ept>. Quite why we need these extra tags (saving six characters per occurrence) is a little bit beyond me to start with. However, <bpt> and <ept> are taken from the TMX format, where an attribute, "i", links them together. This is a required attribute in these tags.

In the specification, <bx/> and <ex/> are described as "begin paired placeholder" and "end paired placeholder" respectively. To me, this means that each <bx/> must have a corresponding <ex/> and vice versa. If this is the case, then there must be a mandatory attribute that links them together. However, under John's proposal, we are suggesting that the <bx/> and <ex/> tags should be linked via a non-mandatory attribute. We are also suggesting linking them with an attribute that has a vague description and could be used for a variety of purposes (including as a lookup device in the skeleton file), some of which are references outside of the <trans-unit>.

Either rid becomes mandatory for <bx/>, <ex/>, <bpt> and <ept> (and its wording is changed to reflect its new status), or we agree to use the id attribute. Using John's example, it's still possible to keep rid optional and use it to reference into the skeleton:

Original

{This is translatable text which is {/weight=+w1}bold, {/weight=+w3} italic and bold{/weight=-w1}, and italic.{/weight=-w3}}

Skeleton
{#TU id=18#
#CODE rid=19#{/weight=+w1}
#CODE rid=20#{/weight=+w3}
#CODE rid=21#{/weight=-w1}
#CODE rid=22#{/weight=-w3}
}

XLIFF

<trans-unit id="18">
<source>This is translatable text which is <bx id="1" rid="19"/>bold, <bx id="2" rid="20"/>italic and bold<ex id="1" rid="21"/>, and italic<ex id="2" rid="22"/>.</source>
</trans-unit>

David Pooley

Software Architect

SDL International

-----Original Message-----
From: Mark Levins [mailto:mark_levins@ie.ibm.com]
Sent: 08 May 2003 10:21
To: JREID@novell.com
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] Re: XLIFF 1.1 Specification Working Draft 14 - RC5

Hi John,

I fully agree with your mail. It describes the usage we originally came up with, which in essence means that 'id' attributes are for the general purpose of linking information to the skeleton data, while 'rid' attributes are used to refer to related information within an XLIFF document.

Regards,

Mark Levins

IBM Software Group,
Dublin Software Laboratory,
Airways Industrial Estate,
Cloghran,
Dublin 17,
Ireland.
Phone: +353 1 704 6676
IBM Tie Line 166676

From: "John Reid" <JREID@novell.com>
Sent: 07/05/2003 23:36
To: <ddomeny@ektron.com>, <ysavourel@translate.com>
Subject: RE: [xliff] Re: XLIFF 1.1 Specification Working Draft 14 - RC5

Hi Doug, Yves, et al,

Doug Domeny wrote:
>Replace "These paired elements are related via their rid attributes" (occurs
>5 times) with:
>
>These paired elements are matched by setting their id attributes to the same
>value. For example, <bx id="34"/> ... <ex id="34"/>
I think we are imposing a method on the XLIFF filters and a format on the skeletons with this change. Suppose we have a file format with the following text:

{This is translatable text which is {/weight=+w1}bold, {/weight=+w3} italic and bold{/weight=-w1}, and italic.{/weight=-w3}}

Here the codes are allowed to overlap. I've used bold and italic for simplicity, but real cases could involve more complex codes. The filter assigns each code a separate id because, without a lookup table of some sort, the filter doesn't know that /weight=+w1 means 'bold'. It does recognize that weight=+w1 and weight=-w1 are paired. Also, the developer of the filter did not want to regenerate codes that could just as easily be stored. Thus, the skeleton for this text may look as follows:

{#TU id=18#
#CODE id=19#{/weight=+w1}
#CODE id=20#{/weight=+w3}
#CODE id=21#{/weight=-w1}
#CODE id=22#{/weight=-w3}
}
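The skeleton notation above is John's own illustration, not a defined format. Purely as a sketch, assuming exactly that one-code-per-line layout, a Python reader that maps each stored key to its native code could look like this (parse_skeleton and CODE_LINE are hypothetical names):

```python
import re

# One "#CODE id=N#<native code>" line per stored code. This is the
# illustrative notation from the thread, not a standard format; David's
# variant writes "rid=" instead of "id=", so both spellings are accepted.
CODE_LINE = re.compile(r"#CODE (?:id|rid)=(\w+)#(.*)")

def parse_skeleton(skeleton_text):
    """Map each stored code's key (e.g. "19") to its native code text."""
    codes = {}
    for line in skeleton_text.splitlines():
        match = CODE_LINE.fullmatch(line.strip())
        if match:  # lines such as "{#TU id=18#" and "}" are skipped
            codes[match.group(1)] = match.group(2)
    return codes
```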

The <trans-unit> appears as follows:

<trans-unit id="18">
<source>This is translatable text which is <bx id="19" rid="1"/>bold, <bx id="20" rid="2"/>italic and bold<ex id="21" rid="1"/>, and italic<ex id="22" rid="2"/>.
</source>
</trans-unit>

Thus, the id attribute relates the codes in the skeleton to the inline elements in the XLIFF. The rid attribute relates the paired codes to each other.
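A minimal Python sketch of the merge step this layout allows, assuming the stored codes are already in a dictionary keyed by id and that the inline codes are empty <bx/>/<ex/> placeholders (merge_source is a hypothetical name, not part of any XLIFF tool). Only id is consulted; rid plays no part in putting the codes back:

```python
import xml.etree.ElementTree as ET

def merge_source(source_xml, skeleton_codes, key="id"):
    """Rebuild native text from a <source> fragment and stored codes.

    Each inline <bx/>/<ex/> is replaced by skeleton_codes[element.get(key)].
    key="id" follows John's layout; key="rid" would follow David's.
    A code without a stored entry raises KeyError.
    """
    root = ET.fromstring(source_xml)
    parts = [root.text or ""]
    for child in root:  # bx/ex are empty, so only their tails carry text
        parts.append(skeleton_codes[child.get(key)])
        parts.append(child.tail or "")
    return "".join(parts)
```

With the four codes from John's skeleton and his <source>, this returns "This is translatable text which is {/weight=+w1}bold, {/weight=+w3}italic and bold{/weight=-w1}, and italic{/weight=-w3}.", with each end code restored from storage rather than regenerated.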

If paired codes had to share the same id, the skeleton could not store the end codes under their own ids, and the postprocessor would have to generate them.

This is why I think that the paired codes, <bx/>+<ex/> and <bpt>+<ept>, should be related via the rid attribute. This gives the greatest freedom to the filter writers.

--john