Re: [xliff-comment] Re: [xliff] Version Control Commit by DavidFilip

Hi Yves,

thanks for this, it is really helpful. I too am worried about the CTRM becoming too complicated. As I said in the last meeting, this is intended as a strawman that puts forward an array of possible solutions; I am happy to restrict them along the lines you suggested.

I introduced simpleItem in the latest draft; it was inspired by the use cases Chase described in the XLIFF OMOS call last week.

I think the biggest issue the design needs to solve is that CTRM has no control over the XLIFF Core content. Storing small changes with a small footprint is therefore always in danger of becoming useless when the core is modified by an Agent unaware of CTRM.

So I heard two conflicting requirements:

1) Let me store small changes with a small footprint; I don't want to store the whole segment if I made a tiny text change or similar

2) Don't let things get stored in different ways in CTRM

I am reacting to some specifics of your feedback inline below.

On Monday, I will create another version of the CTRM 2.1 proposal based on this feedback and the reflections I expressed here and inline below.

Cheers and thanks

Dr. David Filip

===========

OASIS XLIFF OMOS TC Chair

OASIS XLIFF TC Secretary, Editor, Liaison Officer

Spokes Research Fellow

ADAPT Centre

KDEG, Trinity College Dublin

Mobile: +420-777-218-122

On Sun, Oct 2, 2016 at 3:54 AM, Yves <yves@opentag.com> wrote:

Hi David, all,

It seems to me the CTR module is becoming rather complicated.
Here are a few things I've noted after a quick look at the latest draft:

=== Issues ===

--- Do we really want to allow agents to track the content of individual inline codes?
That can be done by tracking the parent's content itself (in a much simpler way).

dF: As I said above, I am happy not to track individual inlines, but tracking structural parents will bring a lot of unwanted redundancy.

--- You can have revisions that apply to different elements but be inconsistent:

For example, you can have:

<revisions appliesTo="target">
...
</revisions>
<revisions appliesTo="segment">
...
</revisions>

Both track the same content (since segment is a superset of target), but there is no way to safely make sense of their history (e.g. datetime is optional, so one may not know the order of the changes; the currentVersion of each could be contradictory; etc.). It would be impossible to really use this across different tools, which makes the existence of a common module pointless.
The same issue arises with revisions on specific inline codes along with revisions on the source/target content.
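
The overlap described above can be detected mechanically. The following is only a minimal sketch, not part of any draft: namespaces are omitted, the `ref` values and the `CONTAINS` map are assumptions made for illustration (the map encodes just the "segment is a superset of target" relation mentioned here), and `overlapping_histories` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration: a segment contains its source and target.
CONTAINS = {"segment": {"source", "target"}}

# Hypothetical fragment; namespaces dropped, ref values invented for the sketch.
CHANGE_TRACK_XML = """
<changeTrack>
  <revisions appliesTo="target" ref="s1"/>
  <revisions appliesTo="segment" ref="s1"/>
  <revisions appliesTo="target" ref="s2"/>
</changeTrack>
"""

def overlapping_histories(change_track):
    """Return (ref, outer, inner) triples where two histories track nested content."""
    scopes_by_ref = {}
    for revisions in change_track.findall("revisions"):
        ref = revisions.get("ref")
        scopes_by_ref.setdefault(ref, set()).add(revisions.get("appliesTo"))

    overlaps = []
    for ref, scopes in scopes_by_ref.items():
        for outer, inner_scopes in CONTAINS.items():
            if outer in scopes:
                # Any contained scope tracked alongside its container is ambiguous.
                for inner in sorted(scopes & inner_scopes):
                    overlaps.append((ref, outer, inner))
    return overlaps

print(overlapping_histories(ET.fromstring(CHANGE_TRACK_XML)))
```

A tool could use such a check to reject or warn about units where two histories compete for the same content, instead of guessing which one is authoritative.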

One of the things we wanted to achieve with XLIFF 2.x is avoid having different ways to do the same thing. CTR2.1 has many ways to do the same thing.

--- Currently a <revision> can have more than 1 <item> with the same property value. Which means you can have N different changes for the same data at the same time.

Good catch, happy to put a uniqueness requirement (Constraint) on that.
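
As a rough illustration of what such a Constraint would ask of a validator, here is a minimal sketch. It assumes a namespace-free fragment with invented content, and `duplicate_item_properties` is a hypothetical helper, not anything defined by the draft.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: namespaces omitted, content invented for illustration.
REVISION_XML = """
<revision author="a" datetime="2016-10-02T03:54:00+00:00">
  <item property="content">first change</item>
  <item property="content">conflicting change</item>
  <item property="state">translated</item>
</revision>
"""

def duplicate_item_properties(revision):
    """Return the property values used by more than one <item> in a <revision>."""
    counts = Counter(item.get("property", "") for item in revision.findall("item"))
    return sorted(prop for prop, count in counts.items() if count > 1)

duplicates = duplicate_item_properties(ET.fromstring(REVISION_XML))
if duplicates:
    print("Constraint violated for property values:", duplicates)
```

An empty result would mean every `<item>` in the `<revision>` tracks a distinct property, which is what the proposed uniqueness requirement demands.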

--- I'm unsure how attributes work in the case of tracking a segment/ignorable content.

For example, you may have a revision of the state attribute of the segment s1, but have also an item tracking the segments for the unit where that segment s1 is in; and that segment may have a different state. When a tool looks for the history of the state values for s1 what does it do? Look just at the revisions for appliesTo='target' + property='state' or also take into account the state attribute in the item for appliesTo='segment' + property='content'?

--- Having <originalData> inside <item> seems a bad idea: the content of <item> should be the same content as <source>/<target>/<segment>, etc. <originalData> should probably be at the <revision> level.

OK to have them at the <revision> level.

I thought that the highest possible level of item was the same as unit, but you're right that <item> is complicated enough without original data.

=== Thoughts ===

Some ideas:

- Make datetime a required attribute. A history without date/time is a lot less useful, and datetime is easy to set for any tool.

+1 to that

- Get rid of currentVersion: If "the most current version of a revision" means "the latest", then a required datetime takes care of this, without having to maintain an extra attribute (and a bunch of PRs). Or maybe I'm missing the point of this attribute.

Fine with me, little value in those extra PRs and the REQUIRED datetime attribute is more reliable
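
To illustrate why a REQUIRED datetime makes currentVersion redundant, here is a minimal sketch of finding the most recent revision by timestamp alone. It assumes namespace-free markup and ISO 8601 values with explicit UTC offsets; the sample data and the `latest_revision` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical history; the revisions are deliberately not in chronological order.
REVISIONS_XML = """
<revisions appliesTo="target">
  <revision author="a" datetime="2016-10-01T09:00:00+00:00">
    <item property="content">older text</item>
  </revision>
  <revision author="b" datetime="2016-10-02T03:54:00+02:00">
    <item property="content">newer text</item>
  </revision>
  <revision author="c" datetime="2016-09-30T18:30:00+00:00">
    <item property="content">oldest text</item>
  </revision>
</revisions>
"""

def latest_revision(revisions):
    """Return the <revision> with the most recent datetime attribute."""
    def timestamp(revision):
        value = revision.get("datetime")
        if value is None:
            # With an optional datetime, ordering is impossible for this entry.
            raise ValueError("revision without datetime cannot be ordered")
        return datetime.fromisoformat(value)

    return max(revisions.findall("revision"), key=timestamp)

current = latest_revision(ET.fromstring(REVISIONS_XML))
print(current.get("author"), current.findtext("item"))
```

Because the comparison is done on offset-aware timestamps, document order and time zone differences do not affect which revision is considered current.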

- Let's not allow to track individual inline codes. I don't think anyone has made that requirement. It would also make things complicated for interoperability since you would have different ways to track them (individually or in content).

Again, fine with me. I just want to highlight that the current draft version is a restriction compared with the 2.0 ctr, where you can track ANY XLIFF-defined element.

I am happy to take the individual codes out of the enumeration of trackable elements.

Again, I just want to make everyone aware that this will require larger portions of text to be stored as revisions, even in the case of minor changes.

- Add a constraint saying the property values of <item> must be unique within a given <revision>.

- It seems we are having too many ways to track the same thing. And/or we try to do too much.

The only obstacle, as far as I know, that prevents us from using inline content in <item> is that we say <em/> must have its corresponding <sm/>.
But maybe we've been focusing too much (once again) on the XLIFF markup.

The more general issue with content in CTR entries is that we take it out of context (from a markup viewpoint). But the actual data after parsing is what we really need to store. So, if we have this:

<target>...data<em startRef='m1'/> text</target>

We can store it like this:

<item property='content'><mrk id='m1' translate='no'>...data</mrk> text</item>

I don't think this is a good idea: the stuff between the start of <target> and the unhandled isolated <em/> is actually text too, not untranslatable data. Remember that we store data in <originalData> and we don't allow mixing it with inline content.

The other aspect of CTR is that I don't think we can expect all the constraints the normal unit content has to apply in the CTR elements. We will have duplicate ID values, etc.

I think you can have unit-like identity constraints on item, and probably within each <revision> element. As discussed before, the module uniqueness scope is separate from the core <unit> uniqueness scope. Not sure which of the above you mean?

- As far as the different types of <item> content: I'm not sure we need all the possibilities the draft currently has. Do we have requirements for all of them?

I think that there are only two clear requirements:

Tracking simple target revisions

and note changes

I am sure there would be value in tracking of segmentation changes

The Ocelot requirements seem to be informed by the XLIFF 1.2 usage of <alt-trans> for <trans-unit><target> changes

In 1.2 the <alt-trans> has two allowed data types, the full <trans-unit> and <target> only.

Since <unit> is our logical unit in 2.0, and its data model is more complicated than <trans-unit> because it handles segmentation too, I think we don't really have an option.

I tried to cater for simple needs with the simplified <simpleItem> that would basically only allow for one type of the 4 types of <item> content.

Cheers,
-yves
