OASIS Mailing List Archives

xliff message


Subject: RE: [xliff] Re-segmentation


Hi Ryan, all,

I'm trying to see any drawbacks to the proposal.
As a transport/exchange format I don't see why this would not work.

Thinking about import/export from/to a tool: I suppose some tools will have to break down the unique marker into several if their
internal annotation model supports only one annotation per marker, so that may make the code a bit more tricky (and for output too).
But that is not a big issue.
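
To illustrate the import side, here is a minimal Python sketch of breaking one multi-reference marker into one annotation per module reference. The namespace URIs are placeholders invented for this sketch (not the real module namespaces), and split_marker is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: assumptions for this sketch, not taken from the spec.
NS = {"ctr": "urn:example:ctr", "mda": "urn:example:mda", "val": "urn:example:val"}
PREFIX_BY_URI = {uri: prefix for prefix, uri in NS.items()}

SOURCE = (
    '<source xmlns:ctr="urn:example:ctr" xmlns:mda="urn:example:mda" '
    'xmlns:val="urn:example:val">'
    '<mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" '
    'val:validationID="v1" translate="yes">Hello World</mrk></source>'
)

def split_marker(mrk):
    """Return one annotation record per module reference found on the marker."""
    annotations = []
    for name, value in mrk.attrib.items():
        if not name.startswith("{"):
            continue  # id, type and translate carry no module reference
        uri, local = name[1:].split("}", 1)
        prefix = PREFIX_BY_URI.get(uri)
        if prefix is not None:
            annotations.append(
                {"module": prefix, "attribute": local, "ref": value, "text": mrk.text}
            )
    return annotations

mrk = ET.fromstring(SOURCE).find("mrk")
single = split_marker(mrk)
for annotation in single:
    print(annotation)
```

On export the tool would do the reverse and merge annotations that cover the same span back into one marker.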

As long as such a representation is not a must but just a possible notation, that should be OK.

So we would have to add an extra predefined type of annotation for mrk: 'ref' or 'references'.

The only issue I see is the redundancy with the normal ref attribute of mrk. When you have a single reference to place, what do you
use?

<mrk id='1' type='ctr:changeTrack' ref='#c1'>
Or <mrk id='1' type='ref' ctr:changeTrackID="c1" >

I would also use a name like ctr:ref rather than ctr:changeTrackID as the attribute value is a reference to the ID of the block of
info rather than an ID.

Also: should the block of information have a reference to the marker? In the current proposal you have to be on the mrk to know
where to get the info. But it's more complicated to find the marker from the block of info: you can't use the ID mechanism,
since ctr:changeTrackID cannot be both a reference and an ID (you would have duplicated ID values).
You can obviously always get to the mrk using XPath rather than the id() function, so maybe that is not an issue.
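
A rough Python sketch of that reverse lookup, going from a block of info back to its marker(s) by searching on the attribute value instead of relying on the ID mechanism. The namespace URI is a placeholder and markers_for is a hypothetical helper; the attribute name follows the current proposal.

```python
import xml.etree.ElementTree as ET

CTR = "urn:example:ctr"  # placeholder namespace URI, an assumption for this sketch

UNIT = """<unit xmlns:ctr="urn:example:ctr">
  <segment id="1">
    <source><mrk id="1" type="reference" ctr:changeTrackID="c1">Hello World</mrk></source>
  </segment>
  <segment id="2">
    <source><mrk id="2" type="reference" ctr:changeTrackID="c2">Hello World 2</mrk></source>
  </segment>
  <ctr:changeTrack id="c1"/>
  <ctr:changeTrack id="c2"/>
</unit>"""

def markers_for(unit, block_id):
    """Return every mrk whose ctr:changeTrackID points at the given block id."""
    attribute = "{%s}changeTrackID" % CTR
    return [m for m in unit.iter("mrk") if m.get(attribute) == block_id]

unit = ET.fromstring(UNIT)
found = markers_for(unit, "c2")
print([m.get("id") for m in found])
```

This is a scan of the unit rather than an indexed lookup, which is the trade-off of not having an ID to resolve.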

Just thinking aloud...
-ys


-----Original Message-----
From: Ryan King [mailto:ryanki@microsoft.com] 
Sent: Wednesday, June 12, 2013 4:32 PM
To: Yves Savourel; XLIFF Main List
Subject: RE: [xliff] Re-segmentation

After our panel discussion today at the symposium and trying to visualize this, I think we may be over-complicating the structure
using annotations to point to modules that contain segment-level metadata. For example, here is what we have defined today in the
spec:

<unit>
  <segment id="1">
    <source>Hello World. Hello World 2.</source>
    <target>Hello World. Hello World 2.</target>
    <ctr:changeTrack>...</ctr:changeTrack>
    <mda:metadata>...</mda:metadata>
    <val:validation>...</val:validation>
  </segment>
</unit>

And the same thing using annotations after re-segmenting in the way I think we've been discussing it, where maybe the second segment
needs validation, but the first doesn't, but they both need metadata and they both need change tracking.

<unit>
  <segment id="1">
    <source><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></source>
    <target><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></target>
  </segment>
  <segment id="2">
    <source><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></source>
    <target><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></target>
  </segment>
  <ctr:changeTrack id="c1">...</ctr:changeTrack>
  <mda:metadata id="m1">...</mda:metadata>
  <val:validation id="v1">...</val:validation>
  <ctr:changeTrack id="c2">...</ctr:changeTrack>
  <mda:metadata id="m2">...</mda:metadata>
  <val:validation id="v3">...</val:validation>
</unit>
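
As a way to sanity-check this kind of structure, here is a minimal Python sketch that compares the ref values on the markers with the ids of the unit-level blocks. The sample mirrors the source side of the example above; the namespace URIs are placeholders and check_refs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace URIs below are placeholders, assumed only for this sketch.
UNIT = """<unit xmlns:ctr="urn:example:ctr" xmlns:mda="urn:example:mda"
      xmlns:val="urn:example:val">
  <segment id="1">
    <source><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata"
      ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></source>
  </segment>
  <segment id="2">
    <source><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata"
      ref="#m2">Hello World 2.</mrk></mrk></source>
  </segment>
  <ctr:changeTrack id="c1"/>
  <mda:metadata id="m1"/>
  <val:validation id="v1"/>
  <ctr:changeTrack id="c2"/>
  <mda:metadata id="m2"/>
  <val:validation id="v3"/>
</unit>"""

def check_refs(unit):
    """Return (dangling, unused): refs with no block, and blocks with no ref."""
    block_ids = {
        child.get("id")
        for child in unit
        if child.tag != "segment" and child.get("id")
    }
    referenced = {
        m.get("ref")[1:]
        for m in unit.iter("mrk")
        if m.get("ref", "").startswith("#")
    }
    return referenced - block_ids, block_ids - referenced

dangling, unused = check_refs(ET.fromstring(UNIT))
print("dangling:", sorted(dangling))
print("unused:", sorted(unused))
```

In this sample every marker reference resolves, and the v3 block is the only one that no marker points to.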

Right away, as Yves pointed out, that is a lot of <mrk> elements (and there would potentially be more with matches, etc.)
surrounding the actual source and target text. Also, it is ambiguous, because it looks like I have <mrk> elements embedded in other
<mrk> elements and this is technically not the case. Maybe it would make more sense to have each module, or extension, with
segment-level metadata, define an attribute that could be used in a custom annotation for referencing. For example, something like a
custom "reference" annotation:

<unit>
  <segment id="1">
    <source><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></source>
    <target><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></target>
  </segment>
  <segment id="2">
    <source><mrk id="2" type="reference" ctr:changeTrackID="c2" mda:metadataID="m2" translate="yes">Hello World 2</mrk></source>
    <target><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" translate="yes">Hello World</mrk></target>
  </segment>
  <ctr:changeTrack id="c1">...</ctr:changeTrack>
  <mda:metadata id="m1">...</mda:metadata>
  <val:validation id="v1">...</val:validation>
  <ctr:changeTrack id="c2">...</ctr:changeTrack>
  <mda:metadata id="m2">...</mda:metadata>
  <val:validation id="v3">...</val:validation>
</unit>

What do you think?

Ryan

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, June 12, 2013 5:48 AM
To: XLIFF Main List
Subject: [xliff] Re-segmentation

Hi all,

Thinking more about the different solutions for re-segmentation in 2.0, especially about solution #4:

- We would have to define PRs for the <segment> attributes like translate, approved, state, etc.
Note that translate would logically become a <mrk translate='yes|no'>. Does that mean we should always have this info as an <mrk>?

- We would have to add an id in all top elements like <matches>, <changeTrack> and allow multiple of them at the <unit> level.

- The part that concerns me most is the paradigm shift for developers. Traditionally many tools are segment-based, and with solution
#4 they would have to change how much of the metadata for the segments is stored, and decide what to do with the parts that don't
correspond to a segment anymore (overlapping <mrk>s and sub-segment <mrk>s).

- We may end up with <segment> elements containing a lot of <mrk>s at both ends. It may take some effort to deal with those. They may
have some side effects on functions like TM matching, etc.

I'm still relatively sure that #4 is probably the better representation in the long term, but it is a very big change. So the more
feedback before we go that way the better. And we really need examples and working implementations for this.

Cheers,
-yves

