
Subject: RE: [xliff] Re-segmentation


Hi Ryan,

You probably mean a segmentRef (or maybe segRef) attribute?
(nid was the old name for the attribute referencing the original data of an inline code, that attribute is now dataRef)

-ys

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Friday, June 21, 2013 10:16 PM
To: Ryan King; Schnabel, Bryan S; Estreen, Fredrik; Yves Savourel; 'XLIFF Main List'
Subject: RE: [xliff] Re-segmentation

I almost forgot, one additional need (please read the mail below for full understanding) would be to add a nid to <ctr:changeTrack>
to reference the appropriate segment. So this:

<unit id="1">
  <segment id="s1">
    <source ctr:checksum="5E894D8C" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Hello World. Good-bye World.</source>
    <target ctr:checksum="5E894D8C" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Hello World. Good-bye World.</target>
  </segment>
</unit>
<changeTrack>
  <revisions appliesTo="source" nid="#s1">
    <revision checksum="59DE4807" author="system" datetime="2013-05-01T10:00:00+8:00">
      <item property="content">Hello. Good-bye.</item>
    </revision>
  </revisions>
</changeTrack>

Could get re-segmented to this:

<unit id="1">
  <segment id="s1">
    <source ctr:checksum="8C960132" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Hello World.</source>
    <target ctr:checksum="8C960132" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Hello World.</target>
  </segment>
  <segment id="s2">
    <source ctr:checksum="9A4EC1FF" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Good-bye World.</source>
    <target ctr:checksum="9A4EC1FF" ctr:author="system" ctr:datetime="2013-06-15T10:00:00+8:00">Good-bye World.</target>
  </segment>
</unit>
<changeTrack>
  <revisions appliesTo="source">
    <revision nid="#s1" checksum="5E894D8C" author="system" datetime="2013-06-15T10:00:00+8:00">
      <item property="content">Hello World. Good-bye World.</item>
    </revision>
    <revision nid="#s1" checksum="59DE4807" author="system" datetime="2013-05-01T10:00:00+8:00">
      <item property="content">Hello. Good-bye.</item>
    </revision>
  </revisions>
</changeTrack>

Now if I translate my target in the second segment to:

<target ctr:checksum="0B3DC22D" ctr:author="ryan@live.com" ctr:datetime="2013-06-21T10:00:00+8:00">Tschau Welt.</target>

My change tracking would look like this:

<changeTrack>
  <revisions appliesTo="source">
    <revision nid="#s1" checksum="5E894D8C" author="system" datetime="2013-06-21T10:00:00+8:00">
      <item property="content">Hello World. Good-bye World.</item>
    </revision>
    <revision nid="#s1" checksum="59DE4807" author="system" datetime="2013-05-15T10:00:00+8:00">
      <item property="content">Hello. Good-bye.</item>
    </revision>
  </revisions>
  <revisions appliesTo="target">
    <revision nid="#s2" checksum="9A4EC1FF" author="system" datetime="2013-06-15T10:00:00+8:00">
      <item property="content">Good-bye World.</item>
    </revision>
  </revisions>
</changeTrack>

Thanks,
Ryan

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Friday, June 21, 2013 11:32 AM
To: Schnabel, Bryan S; Estreen, Fredrik; Yves Savourel; 'XLIFF Main List'
Subject: RE: [xliff] Re-segmentation

Kevin mentioned to me that in the call on Tuesday, there were questions on whether we could just remove <val:validation> and
<ctr:changeTrack> from <segment> and if there was a use case that prevented that. Kevin and I have had some discussion and concluded
on our side that we can remove them from <segment> as long as we have some processing rules defined.

<val:validation>
Validators must recombine segments before applying validation rules to the <unit>. (Otherwise, I might have an individual segment
that will fail the rule.)
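
A minimal sketch of that rule in Python, assuming un-namespaced <unit>, <segment> and <ignorable> elements. The helper names and the "must end with a period" rule are hypothetical and only illustrate why validating one segment at a time can fail where the recombined unit passes.

```python
import xml.etree.ElementTree as ET


def recombined_text(unit, part="source"):
    """Concatenate the text of every <segment> and <ignorable> child, in order."""
    pieces = []
    for child in unit:
        if child.tag in ("segment", "ignorable"):
            node = child.find(part)
            if node is not None:
                # itertext() also collects text nested inside inline markup.
                pieces.append("".join(node.itertext()))
    return "".join(pieces)


def must_end_with_period(text):
    # Hypothetical validation rule, used only for illustration.
    return text.rstrip().endswith(".")


UNIT = """<unit id="1">
  <segment id="s1"><source>Joe read the book,</source></segment>
  <ignorable><source> </source></ignorable>
  <segment id="s2"><source>but his friend saw the movie.</source></segment>
</unit>"""

unit = ET.fromstring(UNIT)

# Applying the rule per segment: the first segment fails on its own.
per_segment = [
    must_end_with_period("".join(seg.find("source").itertext()))
    for seg in unit.findall("segment")
]

# Applying the rule to the recombined unit, as the processing rule requires.
full = recombined_text(unit)
unit_ok = must_end_with_period(full)
```

Here the per-segment check reports a failure for the first segment, while the check on the recombined text passes.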

<ctr:changeTrack>
Modifiers should copy author and datetime attributes from the original segment to each new segment created through re-segmentation.
Checksums for each new segment should also be recalculated. Once segments have been modified by translation, if recombined, author
and datetime attributes from the most recently modified segment should be copied to the recombined segment and the checksum
recalculated.
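
A rough Python sketch of these two rules follows. It is not a definitive implementation: the Segment class and function names are hypothetical, and CRC-32 is only an assumed stand-in, since nothing here says which algorithm produces the checksum values. The datetimes use a two-digit offset (+08:00) so that the standard library can parse them.

```python
import datetime as dt
import zlib
from dataclasses import dataclass


def checksum(text: str) -> str:
    # Assumption: CRC-32 rendered as 8 uppercase hex digits. The real
    # ctr:checksum algorithm is not specified in this discussion.
    return format(zlib.crc32(text.encode("utf-8")), "08X")


@dataclass
class Segment:
    text: str
    author: str
    datetime: str  # ISO 8601 with a two-digit offset, e.g. +08:00

    @property
    def checksum(self) -> str:
        # Always derived from the current text, so it is "recalculated"
        # whenever a segment is created by splitting or joining.
        return checksum(self.text)


def split_segment(seg, parts):
    """Re-segment: each new segment copies author and datetime from the original."""
    return [Segment(p, seg.author, seg.datetime) for p in parts]


def join_segments(segs, separator=" "):
    """Recombine: author and datetime come from the most recently modified segment."""
    latest = max(segs, key=lambda s: dt.datetime.fromisoformat(s.datetime))
    text = separator.join(s.text for s in segs)
    return Segment(text, latest.author, latest.datetime)


original = Segment("Hello World. Good-bye World.", "system", "2013-06-15T10:00:00+08:00")
s1, s2 = split_segment(original, ["Hello World.", "Good-bye World."])

# The second segment is then translated by a different author at a later time.
s2 = Segment("Tschau Welt.", "ryan@live.com", "2013-06-21T10:00:00+08:00")
merged = join_segments([s1, s2])
```

After the split, both new segments still carry author "system" and the original datetime; after the join, the recombined segment carries the translator's author and datetime.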

With the processing rules above, I don't think there would be a reason to use <mrk>. So, in conclusion, I think we can go ahead and
remove these modules from <segment>.

Ryan


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Schnabel, Bryan S
Sent: Monday, June 17, 2013 9:32 AM
To: Estreen, Fredrik; Yves Savourel; 'XLIFF Main List'
Subject: RE: [xliff] Re-segmentation

Fredrik,

Thanks for catching this. I'll let the three of you contemplate the best way forward to overcome, or accept this limitation.

I'll comment on another aspect. While I hope somebody comes up with a new idea to counteract this, we could always say that for
this (hopefully) corner case, we offer an override clause. Maybe we say something like "when translating the new segment makes
positioning the <mrk> elements around appropriate subsets in target unfriendly to automation, the agent may skip the re-segmentation,
or throw away the offending module."

Like I said, a better scenario is that somebody solves the use case.

Thanks,

Bryan

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Estreen, Fredrik
Sent: Monday, June 17, 2013 1:10 AM
To: Yves Savourel; 'XLIFF Main List'
Subject: RE: [xliff] Re-segmentation

Hi Yves, Ryan,

After getting some more time to think about this, I'm no longer convinced that using <mrk> to mark up sections of text will work well
for many use cases where we also need to annotate <target> content. My fear is that it will be very hard to propagate markup from
source to target in automatic processing. Likewise, it will be time-consuming for the translator to do manually, driving up the cost
of translating such material.

Consider this example where we go from sub-sentence segmentation to sentence segmentation:

<unit>
  <segment>
    <source><mrk id="1">Joe read the book,</mrk></source>
  </segment>
  <ignorable>
    <source> </source>
  </ignorable>
  <segment>
    <source><mrk id="2">but his friend saw the movie.</mrk></source>
  </segment>
</unit>

Is transformed into:

<unit>
  <segment>
    <source><mrk id="1">Joe read the book,</mrk> <mrk id="2">but his friend saw the movie.</mrk></source>
  </segment>
</unit>

When translating the new segment we need to somehow position the <mrk> elements around appropriate subsets in target. For some
things it might not matter too much where we put them; for others it will be critical. For example, a validation rule would need to be
around the target portion actually corresponding to the marked-up source to be meaningful at all. For change tracking it might not
be functionally important (all translatable text is tracked anyway). But the value of the tracking information could be reduced if
it does not track the same semantic part of source and target. In the above example the comma might have gone missing in some
languages, or a lower-quality TM match might complicate finding the right midpoint to use.

Here is a Swedish translation without the comma but with the <mrk>s correctly placed. The comma should be present in Swedish
according to pure grammatical rules, but there is a shift away from that to a looser set of rules around general readability for
comma usage. So we assume a translator left it out. It is simple for a human to place the <mrk> correctly, but it takes extra time. For a
TM matching system it would be impossible without semantic knowledge about the source and target languages, <mrk>s already in the TM,
or additional sub-segment matches.

<unit>
  <segment>
    <source><mrk id="1">Joe read the book,</mrk> <mrk id="2">but his friend saw the movie.</mrk></source>
    <target><mrk id="1">Joe läste boken</mrk> <mrk id="2">men hans kompis såg filmen.</mrk></target>
  </segment>
</unit>

If we allow markup that needs to be linked between <source> and <target> at the segment level, moving the markup from <segment> to
<mrk> makes it technically possible to re-segment. But it would still be somewhere between hard and impossible in practice for
machine processes to get it right. Perhaps that is not a big issue and we would in those instances just rely on manual placement
after the automatic process, but this seems to go against the current trend of doing more automated processing.

Regards,
Fredrik Estreen

> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
> On Behalf Of Yves Savourel
> Sent: den 13 juni 2013 07:15
> To: 'XLIFF Main List'
> Subject: RE: [xliff] Re-segmentation
>
> Hi Ryan, all,
>
> I'm trying to see any drawbacks to the proposal.
> As a transport/exchange format I don't see why this would not work.
>
> Thinking about import/export from/to a tool: I suppose some tools will 
> have to break down the unique marker into several if their internal 
> annotation model supports only one annotation per marker, so that may 
> make the code a bit more tricky (and for output too).
> But that is not a big issue.
>
> As long as such a representation is not a must but just a possible 
> notation, that should be ok.
>
> So we would have to add an extra predefined type of annotation for mrk:
> 'ref' or 'references'.
>
> The only issue I see is the redundancy with the normal ref attribute of mrk.
> When you have a single reference to place, what do you use?
>
> <mrk id='1' type='ctr:changeTrack' ref='#c1'> Or <mrk id='1' type='ref'
> ctr:changeTrackID="c1" >
>
> I would also use a name like ctr:ref rather than ctr:changeTrackID as 
> the attribute value is a reference to the ID of the block of info rather than an ID.
>
> Also: should the block of information have a reference to the marker?
> In the current proposal you have to be on the mrk to know where to get the info.
> But it's more complicated to know where the marker is from the block 
> of info (you can't use the ID mechanism since ctr:changeTrackID cannot 
> be both a reference and an ID; you would have duplicated ID values). 
> You can obviously always get to the mrk using XPath rather than the
> id() function, so maybe that is not an issue.
>
> Just thinking aloud...
> -ys
>
>
> -----Original Message-----
> From: Ryan King [mailto:ryanki@microsoft.com]
> Sent: Wednesday, June 12, 2013 4:32 PM
> To: Yves Savourel; XLIFF Main List
> Subject: RE: [xliff] Re-segmentation
>
> After our panel discussion today at the symposium and trying to 
> visualize this, I think we may be over-complicating the structure 
> using annotations to point to modules that contain segment-level 
> metadata. For example, here is what we have defined today in the
> spec:
>
> <unit>
>   <segment id="1">
>     <source>Hello World. Hello World 2.</source>
>     <target>Hello World. Hello World 2.</target>
>     <ctr:changeTrack>...</ctr:changeTrack>
>     <mda:metadata>...</mda:metadata>
>     <val:validation>...</val:validation>
>   </segment>
> </unit>
>
> And the same thing using annotations after re-segmenting in the way I 
> think we've been discussing it, where maybe the second segment needs 
> validation, but the first doesn't, but they both need metadata and 
> they both need change tracking.
>
> <unit>
>   <segment id="1">
>     <source><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2"
> type="metadata" ref="#m1"><mrk id="3" type="validation"
> ref="#v1">Hello World.</mrk></mrk></mrk></source>
>     <target><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2"
> type="metadata" ref="#m1"><mrk id="3" type="validation"
> ref="#v1">Hello World.</mrk></mrk></mrk></target>
>   </segment>
>   <segment id="2">
>     <source><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2"
> type="metadata" ref="#m2">Hello World 2.</mrk></mrk></source>
>     <target><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2"
> type="metadata" ref="#m2">Hello World 2.</mrk></mrk></target>
>   </segment>
>   <ctr:changeTrack id="c1">...</ctr:changeTrack>
>   <mda:metadata id="m1">...</mda:metadata>
>   <val:validation id="v1">...</val:validation>
>   <ctr:changeTrack id="c2">...</ctr:changeTrack>
>   <mda:metadata id="m2">...</mda:metadata>
>   <val:validation id="v3">...</val:validation>
> </unit>
>
> Right away, as Yves pointed out, that is a lot of <mrk> elements (and 
> there would potentially be more with matches, etc.) surrounding the 
> actual source and target text. Also, it is ambiguous, because it looks 
> like I have <mrk> elements embedded in other <mrk> elements and this 
> is technically not the case. Maybe it would make more sense to have 
> each module, or extension, with segment-level metadata, define an 
> attribute that could be used in a custom annotation for referencing.
> For example, something like a custom "reference" annotation:
>
> <unit>
>   <segment id="1">
>     <source><mrk id="1" type="reference" ctr:changeTrackID="c1"
> mda:metadataID="m1" val:validationID="v1" translate="yes">Hello 
> World</mrk></source>
>     <target><mrk id="1" type="reference" ctr:changeTrackID="c1"
> mda:metadataID="m1" val:validationID="v1" translate="yes">Hello 
> World</mrk></target>
>   </segment>
>   <segment id="2">
>     <source><mrk id="2" type="reference" ctr:changeTrackID="c2"
> mda:metadataID="m2" translate="yes">Hello World 2</mrk></source>
>     <target><mrk id="2" type="reference" ctr:changeTrackID="c2"
> mda:metadataID="m2" translate="yes">Hello World 2</mrk></target>
>
>   </segment>
>   <ctr:changeTrack id="c1">...</ctr:changeTrack>
>   <mda:metadata id="m1">...</mda:metadata>
>   <val:validation id="v1">...</val:validation>
>   <ctr:changeTrack id="c2">...</ctr:changeTrack>
>   <mda:metadata id="m2">...</mda:metadata>
>   <val:validation id="v3">...</val:validation>
> </unit>
>
> What do you think?
>
> Ryan
>
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
> On Behalf Of Yves Savourel
> Sent: Wednesday, June 12, 2013 5:48 AM
> To: XLIFF Main List
> Subject: [xliff] Re-segmentation
>
> Hi all,
>
> Thinking more about the different solutions for re-segmentation in 
> 2.0, especially about solution #4:
>
> - We would have to define PRs for the <segment> attributes like 
> translate, approved, state, etc.
> Note that translate would logically become a <mrk translate='yes|no'>.
> Does that mean we should always have this info as an <mrk>?
>
> - We would have to add an id in all top elements like <matches>, 
> <changeTrack> and allow multiple of them at the <unit> level.
>
> - The part that concerns me most is the paradigm shift for developers.
> Traditionally many tools are segment-based, and with solution #4 they 
> would have to change how much of the metadata for the segments is 
> stored, and decide what to do with the parts that don't correspond 
> to a segment anymore (overlapping <mrk>s and sub-segment <mrk>s).
>
> - We may end up with a <segment> containing a lot of <mrk>s at both ends.
> It may take some effort to deal with those. They may have some side 
> effects on functions like TM matching, etc.
>
> I'm still relatively sure that #4 is probably the better 
> representation in the long term, but it is a very big change. So the 
> more feedback before we go that way, the better. And we really need 
> examples and a working implementation for this.
>
> Cheers,
> -yves
>
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that 
> generates this mail.  Follow this link to all your TCs in OASIS
> at:
> https://www.oasis-
> open.org/apps/org/workgroup/portal/my_workgroups.php