xliff-inline message

Subject: RE: [xliff-inline] Teleconference - Sep-14-2010 - 13:30 UTC -Summary
From: <bryan.s.schnabel@tektronix.com>
To: <ysavourel@translate.com>, <xliff-inline@lists.oasis-open.org>
Date: Tue, 14 Sep 2010 11:19:13 -0700
Hi Yves,

Forgive me for being a lurker instead of an active SC meeting participant. I am at fault for not attending the teleconference to discuss this in person. So please feel free to treat my comment as low priority.

I guess I agree with the group's assessment that 17 is better suited to be a guiding principle than a requirement (a better worded requirement would be "preserve the ability to represent well-formed inline wrapper tags in XML source as well-formed inline wrapper tags in XLIFF"). But I do not understand the language the group settled on for the guiding principle.

I understand the first paragraph: "When processing the content with XML parsers, all the nodes of type TEXT should contain real text. This allows the separation between textual content and codes to be physical even in XML tree representation, rather than requiring interpretation of the markup." However, this is different from the spirit of proposed requirement 17 (1.17 refers to the element node of (for example) <b>; 2.1 refers to the text node). In fact I begin to worry about this paragraph when the language implies that there is some kind of burden imposed by "requiring interpretation of markup." The gist of 17 is not that something like <b> needs to be interpreted as bold, or emphasis, but rather that the inline wrapper tag is preserved as an inline wrapper tag. No interpretation needed as far as I can see.

I begin to lose track with the second paragraph: "For example, the imaginary representation below stores the native codes [startBold] and [endBold] as part of the content." It seems quite a bit abstracted from the example in 17: "This text is in <b>bold</b>" (in fact it seems to go out of its way to not be an XML example). 

After all, the goal of 17 is to preserve the ability for inline elements to continue to be inline elements throughout the extraction and recomposition of the XML --> XLIFF --> XML lifecycle. In short, the guiding principle would be that an XSLT processor could process the inline element as an inline element for the trip back from XLIFF to XML (back to my drumbeat of "<g>, <x>, and <mrk> are XML-friendly; <bpt> and <ept> are not").
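
A minimal sketch of that trip back, written in Python rather than XSLT. It assumes the source <b> was extracted as a <g> wrapper; the ctype value, the mapping table, and the <p> wrapper name are assumptions for illustration, not something the SC has agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF-like fragment: the source <b> was extracted as an inline wrapper <g>.
XLIFF_SOURCE = '<source>This text is in <g id="1" ctype="bold">bold</g>.</source>'

# Assumed mapping from the ctype value to the native element name.
CTYPE_TO_NATIVE = {"bold": "b"}


def merge_back(source_xml: str, wrapper: str = "p") -> str:
    """Rename each <g> wrapper to its native element; text nodes are left untouched."""
    root = ET.fromstring(source_xml)
    root.tag = wrapper
    # Materialize the list first so renaming does not interfere with the traversal.
    for g in list(root.iter("g")):
        g.tag = CTYPE_TO_NATIVE[g.get("ctype")]
        g.attrib.clear()
    return ET.tostring(root, encoding="unicode")


print(merge_back(XLIFF_SOURCE))
```

Because the wrapper stays an element from start to finish, the merge is a plain rename on the tree; nothing has to be matched up by id the way a <bpt>/<ept> pair would.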

I agree that the code sample that follows in 2.1 should indeed be avoided. But it does not really address the concern cited in 17.

The next paragraph I find to be true, but it strays widely from the goal proposed in 17. I'm not sure why the code example shows two start tags w/o end tags. But that difference aside, it seems ironic to me that the example of what should be avoided in 1.17:

This text is in <code native="[startBold]"/>bold<code native="[endBold]"/>.

is (with the exception of the use of start tags instead of empty tags) exactly what is stated in 2.1 as "what we want to try to achieve":

This text is in <code native="[startBold]">bold<code native="[endBold]">.

So my conclusion (my opinion) is that guiding principle 2.1 is not really related to 1.17. And its stated goal is contradictory to the stated goal of 1.17. My fear is that we will go away from having inline wrapper tags in XLIFF. That will make processing XLIFF with XML tools problematic. And that would make me think the name of our standard would be curious.

(wow - maybe it's good that I did not attend the meeting - my long-winded reply would have used up the whole hour ;-)

Thanks,

Bryan



-----Original Message-----
From: Yves Savourel [mailto:ysavourel@translate.com] 
Sent: Tuesday, September 14, 2010 7:59 AM
To: xliff-inline@lists.oasis-open.org
Subject: [xliff-inline] Teleconference - Sep-14-2010 - 13:30 UTC - Summary

XLIFF Inline Markup Subcommittee Teleconference Summary

=== 1) Administration

Attending: Andrew, Milan, Arle, Yves, Lucia, Dimitra
Regrets: Asgeir



=== 2) Discussion: Requirements

Requirement working page:
http://wiki.oasis-open.org/xliff/OneContentModel/Requirements

We had several action items:

-- ACTION ITEM: Yves to find a real example for requirement #4
--> Sorry, still not done at this time (forgot).
Andrew: The bookmark element in ODF is a good illustration.
Yves: thanks, will use that then.


-- ACTION ITEM: Arle to bring any additional requirements from OSCAR.
Arle: Nothing at the moment.

Arle: Have to step out for a moment.


-- Discussion on requirement #13:
http://lists.oasis-open.org/archives/xliff-inline/201008/msg00016.html

Andrew: Should it be just text? Include codes as well.

Consensus on:

"Must be able to represent separately different flows of text and codes when, in the original format, they are mixed together.
Example 1: In DITA a footnote is stored at the location where it is referred to:
<p>Palouse horses<fn>A Palouse horse is the same as an Appaloosa.</fn> have spotted coats.</p>
This p element contains two separate flows: "Palouse horses have spotted coats" and "A Palouse horse is the same as an Appaloosa."
Example 2: The value of the HTML ALT attribute is stored in the IMG tag and can be within a paragraph:
<p>Click here: <img alt='OK' src='ok.png'/>.</p>"
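
A minimal sketch of Example 1, assuming the nested flow is a direct child element of the paragraph. The function name split_flows is hypothetical, and other inline elements are reduced to their text here for brevity.

```python
import xml.etree.ElementTree as ET

DITA_P = (
    "<p>Palouse horses<fn>A Palouse horse is the same as an Appaloosa.</fn>"
    " have spotted coats.</p>"
)


def split_flows(xml: str, nested_tag: str = "fn"):
    """Return (main_flow, nested_flows) for one paragraph.

    Only handles nested elements that are direct children of the paragraph.
    """
    root = ET.fromstring(xml)
    main = [root.text or ""]
    nested = []
    for child in root:
        if child.tag == nested_tag:
            # The footnote is its own flow; keep it out of the main text.
            nested.append("".join(child.itertext()))
        else:
            # Any other inline element contributes its text to the main flow.
            main.append("".join(child.itertext()))
        # Text following the child always belongs to the main flow.
        main.append(child.tail or "")
    return "".join(main), nested


main_flow, nested_flows = split_flows(DITA_P)
print(main_flow)
print(nested_flows)
```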


-- Discussion on requirement #14:

Milan: should be optional.
Andrew: is the direction important? 
Milan: could be with different context, so specifying exact relationships may be difficult.

Consensus on:

Should be able to represent the mutual relationships between a nested flow of text and its parent
The format should be able to represent both flows and have some information about their relationships, so the two texts can be put in context when needed.
For example, the relation between the value of an HTML ALT attribute and the paragraph element where it appears should be somehow preserved:
<p>Click here: <img alt='OK' src='ok.png'/>.</p>


-- Discussion on requirement #15:

Milan: for a single char or a string of invalid chars?
Andrew: experience is one by one.

ACTION ITEM: Yves to check which term the XML specification uses, "illegal" or "invalid", and use it everywhere.

Consensus on:

Should be able to represent illegal XML characters in the content
Some characters are illegal in XML, but they may appear in extracted text and we should have a common way to represent them so they can be preserved and merged back if necessary, without causing the XML tools to fail. 
For example in the following Java property string "Text with \u001a" the character U+001A is illegal in XML but needs to have a representation in XLIFF.
Note: An example of how some XML formats handle this case is the TS format from Qt-Linguist, which uses a <byte> element to represent such characters.
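
A minimal sketch of one way to preserve such characters and merge them back. The character class follows the Char production of XML 1.0; the <cp hex="..."/> placeholder and both function names are hypothetical and are not a representation the group has chosen.

```python
import re
from xml.sax.saxutils import escape, unescape

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Hypothetical placeholder element used only for this illustration.
PLACEHOLDER = re.compile(r'<cp hex="([0-9A-F]+)"/>')


def protect_illegal_chars(text: str) -> str:
    """Escape the text for XML, then replace each illegal character with a placeholder."""
    escaped = escape(text)  # handles &, < and > so literal markup in the text stays text
    return ILLEGAL_XML_CHARS.sub(
        lambda m: '<cp hex="%04X"/>' % ord(m.group()), escaped
    )


def restore_illegal_chars(markup: str) -> str:
    """Inverse of protect_illegal_chars, for merging back into the original format."""
    restored = PLACEHOLDER.sub(lambda m: chr(int(m.group(1), 16)), markup)
    return unescape(restored)


protected = protect_illegal_chars("Text with \u001a")
print(protected)
```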


-- Discussion on requirement #16:

Milan: Looks like a warning for the translator, no?
Yves: For both tool and possibly translator I think.

Consensus on:

Inline codes should have a way to store information about the effect on the segmentation
As some inline codes may have an effect on the segmentation of a given content, it is useful if segmentation-specific hints could be stored along with an inline code. 
For example: In HTML a <BR> element indicates a forced line break, while a <B>...</B> element should not affect the segmentation.
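
A minimal sketch of how a tool might use such a hint. The InlineCode class and its breaks_segment flag are hypothetical names for whatever the format ends up providing.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class InlineCode:
    native: str           # original markup, e.g. "<BR>"
    breaks_segment: bool  # hypothetical segmentation hint stored with the code


Part = Union[str, InlineCode]


def split_on_hints(parts: List[Part]) -> List[List[Part]]:
    """Start a new segment after every inline code flagged as breaking."""
    segments: List[List[Part]] = [[]]
    for part in parts:
        segments[-1].append(part)
        if isinstance(part, InlineCode) and part.breaks_segment:
            segments.append([])
    # Drop the empty trailing segment left when the content ends with a break.
    return [s for s in segments if s]


BR = InlineCode("<BR>", breaks_segment=True)
B_OPEN = InlineCode("<B>", breaks_segment=False)
B_CLOSE = InlineCode("</B>", breaks_segment=False)

content = ["First line", BR, "Second ", B_OPEN, "bold", B_CLOSE, " line"]
for segment in split_on_hints(content):
    print(segment)
```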


-- Discussion on requirement #17:

Andrew: Not a requirement, more like a guiding principle.
Yves: agree, it may not be possible to implement.
We should group "guiding principles" in a separate list.

Consensus on:

[guiding principle] If possible, all text nodes of the content should be real text, not codes
When processing the content with XML parsers, all the nodes of type TEXT should contain real text.
This allows the separation between textual content and codes to be physical even in XML tree representation, rather than requiring interpretation of the markup. 
For example, the imaginary representation below stores the native codes [startBold] and [endBold] as part of the content. This is what we want to try to avoid. 
This text is in <code>[startBold]</code>bold<code>[endBold]</code>.
In contrast, the imaginary representation below stores the native codes [startBold] and [endBold] outside the content. Therefore the sum of all TEXT nodes represents only true text. This is what we want to try to achieve.
This text is in <code native="[startBold]">bold<code native="[endBold]">.
Note that this requirement may or may not be possible to achieve, depending on various factors.
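
A minimal sketch of the difference as an XML parser sees it. The <seg> wrapper is hypothetical, and the second representation is written with empty elements so that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Native codes stored as text nodes inside <code> (to avoid).
AVOID = "<seg>This text is in <code>[startBold]</code>bold<code>[endBold]</code>.</seg>"

# Native codes stored in attributes (to achieve).
ACHIEVE = (
    '<seg>This text is in <code native="[startBold]"/>bold'
    '<code native="[endBold]"/>.</seg>'
)


def sum_of_text_nodes(xml: str) -> str:
    """Concatenate every TEXT node in document order."""
    return "".join(ET.fromstring(xml).itertext())


print(sum_of_text_nodes(AVOID))
print(sum_of_text_nodes(ACHIEVE))
```

In the first case the codes leak into the concatenated text; in the second only the translatable text remains.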


Yves: Only one item left!
Let's work on it by email and next meeting we may be able to start discussing the implementation.


=== 4) Other Business

None.

-meeting adjourned


