RE: [xliff] Data associated with <segment>

Hi David,

Directionality of a script does not influence the storage of the data (most of the time). It is just a display property. So there is no need to look at script direction when creating PRs that don’t directly involve the direction. But when merging two segments with different directionality we would need to have PRs.

Regards,

Fredrik Estreen

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
Sent: den 20 november 2012 22:44
To: Yves Savourel
Cc: xliff@lists.oasis-open.org
Subject: Re: [xliff] Data associated with <segment>

Fredrik, Yves, I like the agreement that you reached.

Now could we make <mrk> and <sm> extensible (again) as I keep proposing?

Implementers should be advised to put on segment only such metadata that becomes irrelevant on re-segmentation and then the re-segmentation PRs can become real simple.

If we are talking about inheriting, say, an ID from the left-hand or right-hand side segment on merge, shouldn't this depend on the directionality of the parent unit?

Cheers

Dr. David Filip

=======================

LRC | CNGL | LT-Web | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

mailto: david.filip@ul.ie

On Tue, Nov 20, 2012 at 3:21 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

I completely agree with you Fredrik that putting anything at the segment level is risky and the implementers should be aware of the consequences.

Maybe a warning note in the extension point section could help.

-yves

From: Estreen, Fredrik [mailto:Fredrik.Estreen@lionbridge.com]
Sent: Tuesday, November 20, 2012 7:59 AM
To: Yves Savourel; xliff@lists.oasis-open.org

Subject: RE: [xliff] Data associated with <segment>

Hi Yves,

Perhaps the suggestion to not allow extensions on <segment> takes it too far. But I think that designers of modules and extensions should really be aware of the immensely higher risk of losing their data, or having it made out of date, if it is placed on these elements. In many cases the risk will mean that it is no longer practical to use that extension point. The removal is a quick way to resolve the issues of creating PRs, but defining good PRs might be better. I still fear that it will be hard to enforce those PRs in the field, but the counter argument to that is that anyone who wishes to break them can do so regardless.

I completely agree that using <mrk> or attributes on the other inline elements should be the preferred way to add information to a subset of the text in a <unit>.

Regards,

Fredrik Estreen

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: den 20 november 2012 15:00
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] Data associated with <segment>

Hi Fredrik, all,

I would be against promoting the idea that tools could use units as the way to represent segmentation. That would open the door to the same mess we have in 1.2 where, despite having a standard way to represent segmentation, some tools would use another way.

Since we have modules at the segment level, we have to define PRs for the tools not supporting them. Therefore, to me, there is no reason to forbid extensions there. Segment-level properties are by nature related to the segmentation: if you change it, it makes sense that those properties don’t apply anymore.

If the properties relate to the content of the segment, then a best practice may be to promote the use of <mrk> attached to an element that lives at the unit level: those data are safe during re-segmentation.

cheers,

-yves

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Estreen, Fredrik
Sent: Tuesday, November 20, 2012 5:40 AM
To: xliff@lists.oasis-open.org
Subject: [xliff] Data associated with <segment>

Hi All,

Looking at the current draft and some proposed modules, more and more data is attached to or placed in the <segment> and <ignorable> elements. I think this is a bad design in the face of re-segmentation. Any data placed as a descendant of those nodes MUST have processing requirements regarding how a tool should handle it if it performs re-segmentation. This obviously extends to attributes on <source> and <target> as well.

Re-segmentation is an arbitrary sequence of joining and splitting of these nodes. I personally feel that it will be close to impossible to specify processing requirements for anything that is not fully defined in the core specification for these nodes. Modules that do not allow customization could have module-specific PRs defined in the core, but that seems to overload the core with module details.

To limit the scope let’s consider <metaHolder> in <segment> and a generic tool without knowledge of the data stored in the <metaHolder>.

If we split the segment I see the following possible processing requirements:

a. Remove the meta holder

b. Keep the meta holder on the left hand <segment>

c. Keep the meta holder on the right hand <segment>

d. Copy the meta holder so it exists on both <segment>s

e. Forbid re-segmentation
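
A minimal sketch of the five split options above, in Python. The segment is modelled as a plain dict with hypothetical `text` and `meta` keys; this is an illustration only, not XLIFF syntax and not a proposed PR.

```python
from copy import deepcopy
from enum import Enum


class SplitRule(Enum):
    REMOVE = "a"      # drop the meta holder
    KEEP_LEFT = "b"   # meta holder stays on the left hand segment
    KEEP_RIGHT = "c"  # meta holder stays on the right hand segment
    COPY_BOTH = "d"   # meta holder is copied to both segments
    FORBID = "e"      # a segment carrying a meta holder must not be split


def split_segment(segment, offset, rule):
    """Split a hypothetical segment {'text': str, 'meta': dict or None}."""
    meta = segment["meta"]
    if rule is SplitRule.FORBID and meta is not None:
        raise ValueError("segment carries metadata; splitting is forbidden")
    left = {"text": segment["text"][:offset], "meta": None}
    right = {"text": segment["text"][offset:], "meta": None}
    if rule in (SplitRule.KEEP_LEFT, SplitRule.COPY_BOTH):
        left["meta"] = deepcopy(meta)
    if rule in (SplitRule.KEEP_RIGHT, SplitRule.COPY_BOTH):
        right["meta"] = deepcopy(meta)
    return left, right
```

Whichever rule is picked, a generic tool applies it without knowing what the metadata means, which is why each option is only right for some kinds of data.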

If we join two adjacent segments both having <metaHolder>:

a. Remove both meta holders

b. Keep the meta holder from the left hand <segment>

c. Keep the meta holder from the right hand <segment>

d. Merge the two meta holders and recursively resolve key conflicts

    i. Keep the left hand side value in the resultant metaHolder

    ii. Keep the right hand side value in the resultant metaHolder

    iii. Duplicate the keys and keep both values

    iv. Concatenate the values from both sides into one key/value

e. Forbid re-segmentation
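
To make the merge option and its conflict rules concrete, here is a rough Python sketch. It assumes each meta holder can be reduced to a map from key to a list of values; the function name, the rule names and the space used for concatenation are all made up for the illustration.

```python
def merge_meta(left, right, on_conflict):
    """Merge two hypothetical key -> [values] maps from adjacent segments."""
    merged = {key: list(values) for key, values in left.items()}
    for key, values in right.items():
        if key not in merged:
            # No conflict: the key exists only on the right hand side.
            merged[key] = list(values)
        elif on_conflict == "keep_left":
            pass  # the left hand side value is already in place
        elif on_conflict == "keep_right":
            merged[key] = list(values)
        elif on_conflict == "duplicate":
            # Keep both values under the same key.
            merged[key] = merged[key] + list(values)
        elif on_conflict == "concatenate":
            # Collapse both sides into a single value.
            merged[key] = [" ".join(merged[key] + list(values))]
        else:
            raise ValueError(f"unknown conflict rule: {on_conflict}")
    return merged
```

With `{"id": ["s1"]}` on the left and `{"id": ["s2"]}` on the right, the four rules give `["s1"]`, `["s2"]`, `["s1", "s2"]` and `["s1 s2"]` respectively. None of them is obviously correct for an ID.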

For the core we need to choose ONE rule for split and ONE rule for join. It is unlikely that the chosen rule will be good for anything but a subset of the use cases. Or we could add behavioral attributes and more complex PRs to <metaHolder>.

If we extend this to custom schema extensions, the merge cases likely become unavailable, as an XML tree merge of unknown content would be likely to cause schema violations. To allow selection of the PRs to apply, we would need to add a mechanism to the core that extensions can use to select how their data should be treated.

The end result is that any application relying on extension data here, or on modules with third party customization, would have to cope with the data being lost or inaccurate. Or we severely limit the ability of processors to re-segment the contents of <unit>. Or, finally, we end up with a much more complex core. None of these seems good to me.

My proposal is to not allow third party customizable data (either as full extensions or customizable modules) on these elements, and to try to limit the use of these elements for holding module elements and attributes.

If there is a need during XLIFF extraction to have metadata or module information on what the initial extractor considers a segment, it could put what it considers THE segment into a <unit> and attach the metadata at that level. If subsequent tools perform re-segmentation, the data on the <unit> level would still apply unchanged to the sum of the segments in the <unit>.
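
As a small illustration of why data at the <unit> level survives, the Python sketch below splits a segment inside a unit and leaves the unit itself alone. The markup is heavily simplified and is not valid XLIFF: the `origin` attribute and the function name are hypothetical stand-ins for whatever the extractor would attach.

```python
import xml.etree.ElementTree as ET


def split_segment_in_unit(unit, index, offset):
    """Split the source text of the index-th <segment> child of unit."""
    old = unit.findall("segment")[index]
    text = old.find("source").text or ""
    old.find("source").text = text[:offset]
    new = ET.Element("segment")
    ET.SubElement(new, "source").text = text[offset:]
    # Insert the new segment right after the one that was split.
    unit.insert(list(unit).index(old) + 1, new)


# Simplified, non-normative markup; 'origin' is a made-up attribute.
unit = ET.fromstring(
    '<unit id="u1" origin="hypothetical-extractor-data">'
    "<segment><source>First sentence. Second sentence.</source></segment>"
    "</unit>"
)
split_segment_in_unit(unit, 0, 16)
```

After the split the unit holds two segments, and the attribute on <unit> is untouched, since the re-segmentation never had to look at it.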

Regards,

Fredrik Estreen
