Subject: RE: [xliff] Re-segmentation
Hi Ryan, all,

I'm trying to see any drawbacks to the proposal. As a transport/exchange format I don't see why this would not work.

Thinking about import/export from/to a tool: I suppose some tools will have to break down the unique marker into several if their internal annotation model supports only one annotation per marker, so that may make the code a bit more tricky (and for output too). But that is not a big issue. As long as such a representation is not a must but just a possible notation, that should be OK.

So we would have to add an extra predefined type of annotation for mrk: 'ref' or 'references'.

The only issue I see is the redundancy with the normal ref attribute of mrk. When you have a single reference to place, what do you use?

    <mrk id='1' type='ctr:changeTrack' ref='#c1'>

Or

    <mrk id='1' type='ref' ctr:changeTrackID="c1">

I would also use a name like ctr:ref rather than ctr:changeTrackID, as the attribute value is a reference to the ID of the block of info rather than an ID.

Also: should the block of information have a reference to the marker? In the current proposal you have to be on the mrk to know where to get the info. But it's more complicated to know where the marker is from the block of info: you can't use the ID mechanism, since ctr:changeTrackID cannot be both a reference and an ID (you would have duplicated ID values). You can obviously always get to the mrk using XPath rather than the id() function, so maybe that is not an issue.

Just thinking aloud...
-ys

-----Original Message-----
From: Ryan King [mailto:ryanki@microsoft.com]
Sent: Wednesday, June 12, 2013 4:32 PM
To: Yves Savourel; XLIFF Main List
Subject: RE: [xliff] Re-segmentation

After our panel discussion today at the symposium and trying to visualize this, I think we may be over-complicating the structure by using annotations to point to modules that contain segment-level metadata.

For example, here is what we have defined today in the spec:

    <unit>
      <segment id="1">
        <source>Hello World. Hello World 2.</source>
        <target>Hello World. Hello World 2.</target>
        <ctr:changeTrack>...</ctr:changeTrack>
        <mda:metadata>...</mda:metadata>
        <val:validation>...</val:validation>
      </segment>
    </unit>

And the same thing using annotations after re-segmenting in the way I think we've been discussing it, where maybe the second segment needs validation, but the first doesn't, but they both need metadata and they both need change tracking.

    <unit>
      <segment id="1">
        <source><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></source>
        <target><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></target>
      </segment>
      <segment id="2">
        <source><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></source>
        <target><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></target>
      </segment>
      <ctr:changeTrack id="c1">...</ctr:changeTrack>
      <mda:metadata id="m1">...</mda:metadata>
      <val:validation id="v1">...</val:validation>
      <ctr:changeTrack id="c2">...</ctr:changeTrack>
      <mda:metadata id="m2">...</mda:metadata>
      <val:validation id="v3">...</val:validation>
    </unit>

Right away, as Yves pointed out, that is a lot of <mrk> elements (and there would potentially be more with matches, etc.) surrounding the actual source and target text. Also, it is ambiguous, because it looks like I have <mrk> elements embedded in other <mrk> elements and this is technically not the case.

Maybe it would make more sense to have each module, or extension, with segment-level metadata, define an attribute that could be used in a custom annotation for referencing.
For example, something like a custom "reference" annotation:

    <unit>
      <segment id="1">
        <source><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></source>
        <target><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></target>
      </segment>
      <segment id="2">
        <source><mrk id="2" type="reference" ctr:changeTrackID="c2" mda:metadataID="m2" translate="yes">Hello World 2</mrk></source>
        <target><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" translate="yes">Hello World</mrk></target>
      </segment>
      <ctr:changeTrack id="c1">...</ctr:changeTrack>
      <mda:metadata id="m1">...</mda:metadata>
      <val:validation id="v1">...</val:validation>
      <ctr:changeTrack id="c2">...</ctr:changeTrack>
      <mda:metadata id="m2">...</mda:metadata>
      <val:validation id="v3">...</val:validation>
    </unit>

What do you think?

Ryan

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, June 12, 2013 5:48 AM
To: XLIFF Main List
Subject: [xliff] Re-segmentation

Hi all,

Thinking more about the different solutions for re-segmentation in 2.0, especially about solution #4:

- We would have to define PRs for the <segment> attributes like translate, approved, state, etc. Note that translate would logically become a <mrk translate='yes|no'>. Does that mean we should always have this info as an <mrk>?

- We would have to add an id in all top elements like <matches>, <changeTrack> and allow multiple of them at the <unit> level.

- The part that concerns me most is the paradigm shift for developers. Traditionally many tools are segment-based, and with solution #4 they would have to change how much of the metadata for the segments is stored, and decide what to do with the parts that don't correspond to a segment anymore (overlapping <mrk>s and sub-segment <mrk>).
- We may end up with <segment> containing a lot of <mrk> at both ends. It may take some effort to deal with those. They may have some side effects on functions like TM matching, etc.

I'm still relatively sure that #4 is probably the better representation in the long term, but it is a very big change. So the more feedback before we go that way the better. And we really need examples and working implementations for this.

Cheers,
-yves
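
The lookup question raised in the thread (going from a <mrk> to its block of info, and from the block back to the marker) can be pictured with a small sketch. It is only an illustration, not anything defined by the spec: it assumes Python's xml.etree.ElementTree, uses placeholder namespace URIs for the ctr and mda prefixes, and borrows the attribute names from the "reference" annotation example above. The helper names block_for and markers_for are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: assumptions for this sketch, not from the thread.
CTR = "urn:example:ctr"
MDA = "urn:example:mda"

# A cut-down unit in the style of the "reference" annotation example.
UNIT = f"""
<unit xmlns:ctr="{CTR}" xmlns:mda="{MDA}">
  <segment id="1">
    <source><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1">Hello World</mrk></source>
  </segment>
  <segment id="2">
    <source><mrk id="2" type="reference" ctr:changeTrackID="c2" mda:metadataID="m2">Hello World 2</mrk></source>
  </segment>
  <ctr:changeTrack id="c1">...</ctr:changeTrack>
  <mda:metadata id="m1">...</mda:metadata>
  <ctr:changeTrack id="c2">...</ctr:changeTrack>
  <mda:metadata id="m2">...</mda:metadata>
</unit>
"""

unit = ET.fromstring(UNIT)


def block_for(mrk, ns, element, attr):
    """Marker -> block: read the module's reference attribute on the
    marker and find the unit-level block carrying that id."""
    ref = mrk.get(f"{{{ns}}}{attr}")
    if ref is None:
        return None
    return unit.find(f"{{{ns}}}{element}[@id='{ref}']")


def markers_for(block, ns, attr):
    """Block -> markers: no back-reference is stored on the block, so
    search the unit for every <mrk> whose attribute names this block."""
    block_id = block.get("id")
    return unit.findall(f".//mrk[@{{{ns}}}{attr}='{block_id}']")


if __name__ == "__main__":
    first = unit.find(".//mrk[@id='1']")
    track = block_for(first, CTR, "changeTrack", "changeTrackID")
    print(track.get("id"))
    print([m.get("id") for m in markers_for(track, CTR, "changeTrackID")])
```

The block-to-marker direction is a search over the unit rather than an ID lookup, which is the trade-off noted above: it works with a path query, but the block itself carries no pointer to its marker.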