Subject: RE: [xliff] Comments on Fragment Identification

From: "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>
To: Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: Wed, 4 Dec 2013 14:00:36 +0000

Hi Yves, David, All,

Here is my take on the fragment identification issue, following the informal discussions after the last TC call.

We generally want IRIs:
* that are short
* that are descriptive enough to identify what they refer to (hopefully also by humans)
* that limit what parts of a document need to be parsed / checked / remembered when following them
* that depend on ID scopes that are suitable for stream processing when creating new elements
* that can refer to all core constructs where that makes sense

I think that using a type/scope prefix plus an ID is probably the best solution. Personally, I want to avoid involving any other elements when manipulating inline elements, as that is already one of the more complex tasks performed during translation. Some elements, namely <file>, <group> and <unit>, are only created when the XLIFF document or <file> is first generated, and for these it is simple to use ID scopes that span large areas. Other elements, such as <note> and many module elements, may be added during processing; for these, smaller scopes make processing much easier, because you only need to look at a smaller and hopefully already known subset of nodes when you create a new one. To make relative URIs within the document more compact, we should adopt a context-relative referencing scheme.

IRI format:
scope separator - '/'
prefix separator - '~'
prefix - NMTOKEN
id - NMTOKEN
selector - prefix~id
path - [/]?selector[/selector]*
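
The grammar above is small enough to parse in a few lines. The following is a minimal Python sketch, not part of the proposal: the names `parse_fragment` and `Selector` are hypothetical, and NMTOKEN is approximated by an ASCII-only regular expression (real XML NMTOKENs allow more characters).

```python
import re
from dataclasses import dataclass

# Simplified NMTOKEN: ASCII letters, digits, '.', '-', '_' and ':'.
# Real XML NMTOKENs also allow many non-ASCII name characters.
NMTOKEN = r"[A-Za-z0-9._:\-]+"
SELECTOR_RE = re.compile(rf"({NMTOKEN})~({NMTOKEN})")


@dataclass(frozen=True)
class Selector:
    prefix: str  # e.g. 'f', 'g', 'u', 'n', 'od', 'is', 'it'
    id: str


def parse_fragment(fragment: str) -> tuple[bool, list[Selector]]:
    """Parse 'path - [/]?selector[/selector]*' into (is_absolute, selectors)."""
    absolute = fragment.startswith("/")
    body = fragment[1:] if absolute else fragment
    selectors = []
    for part in body.split("/"):
        # Each path step must be exactly prefix~id; empty steps are rejected.
        m = SELECTOR_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"invalid selector: {part!r}")
        selectors.append(Selector(m.group(1), m.group(2)))
    return absolute, selectors
```

Because neither '~' nor '/' is an NMTOKEN character, both separators can be split on unambiguously, while IDs such as "foo.xml" stay intact.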

Scopes:
<file>, prefix 'f', unique within document
<group>, prefix 'g', unique within <file>
<unit>, prefix 'u', unique within <file>, separate from <group> to keep references shorter.
<note>, prefix 'n', unique within its parent <file>, <group> or <unit>, i.e. one scope per parent container
<originalData>, prefix 'od', unique within its parent <unit>
Inline tags in source, prefix 'is', unique within its enclosing <unit>
Inline tags in target, prefix 'it', unique within its enclosing <unit>
Inline tags, prefix 'I', not unique: may match in both source and target. Not sure if we really want this, but it feels like it could be useful.

Context relative lookup:
To keep internal references short, any path scope that is not specified is implicitly set to the innermost enclosing scope. For example, a reference to a note from an inline <mrk> would implicitly refer to a note in the enclosing <file> and <unit> (and, if present, the <group> enclosing that <unit>). So the IRI would in this case be just 'n~12', for example. If the IRI fragment starts with a '/', the scope becomes the document root.
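
A minimal sketch of the context-relative rule, assuming the context is supplied as the absolute path of the innermost enclosing container, written as the file followed by at most one <group> or <unit> (for example '/f~foo.xml/u~p40'). The function names and the level numbering are illustrative assumptions, not part of the proposal.

```python
# Illustrative level numbering for the proposal's three scope levels:
# <file> = 0, <group>/<unit> = 1, any other prefix (a "leaf") = 2.
LEVELS = {"f": 0, "g": 1, "u": 1}


def level(selector: str) -> int:
    prefix = selector.split("~", 1)[0]
    return LEVELS.get(prefix, 2)


def resolve(fragment: str, context: str) -> str:
    """Turn a fragment into an absolute path, given the absolute path of the
    innermost enclosing container (e.g. '/f~foo.xml/u~p40')."""
    if fragment.startswith("/"):
        return fragment  # leading '/': scope is the document root
    rel = fragment.split("/")
    ctx = context.lstrip("/").split("/")
    # Unspecified scopes above the first given selector come from the context.
    kept = [s for s in ctx if level(s) < level(rel[0])]
    return "/" + "/".join(kept + rel)
```

With this rule, 'n~12' written inside unit p40 expands to '/f~foo.xml/u~p40/n~12', while 'g~7/n~10' from the same place only inherits the file and expands to '/f~foo.xml/g~7/n~10'.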

Examples:
An absolute reference to note "5" in file "foo.xml" and group "div12": "/f~foo.xml/g~div12/n~5".
A relative reference from an inline element to unit 5 in the same file: "u~5"
A reference from within a unit to note 10 in group 7: "g~7/n~10"
A reference to an inline source <ph> tag with id 1 from the same unit: "is~1"
A reference to unit p40 in file foo.xml from outside the document: "/f~foo.xml/u~p40"
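
To illustrate that following such a path only touches the scopes named in it, here is a hedged Python sketch that resolves absolute paths against a tiny document using `xml.etree.ElementTree`. It assumes the XLIFF 2.0 namespace and element layout (<note> inside a <notes> child, <data> inside <originalData>); the `find` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = "{urn:oasis:names:tc:xliff:document:2.0}"

DOC = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">
  <file id="foo.xml">
    <group id="div12">
      <notes><note id="5">Group note</note></notes>
      <unit id="p40">
        <originalData><data id="d1">&lt;br/&gt;</data></originalData>
        <segment><source>Hello<ph id="1" dataRef="d1"/></source></segment>
      </unit>
    </group>
  </file>
</xliff>"""


def find(root: ET.Element, path: str) -> Optional[ET.Element]:
    """Follow an absolute path such as '/f~foo.xml/g~div12/n~5' from <xliff>."""
    node = root
    for sel in path.strip("/").split("/"):
        prefix, ident = sel.split("~", 1)
        if prefix == "f":
            candidates = node.findall(f"{NS}file")
        elif prefix in ("g", "u"):
            # Group and unit IDs are unique within the <file>: search descendants.
            tag = "group" if prefix == "g" else "unit"
            candidates = node.iter(f"{NS}{tag}")
        elif prefix == "n":
            # Notes are scoped to their own container: direct <notes> child only.
            candidates = node.findall(f"{NS}notes/{NS}note")
        elif prefix == "od":
            candidates = node.findall(f"{NS}originalData/{NS}data")
        elif prefix in ("is", "it"):
            # Inline tags inside the unit's <source> (is) or <target> (it) elements.
            tag = "source" if prefix == "is" else "target"
            candidates = (e for s in node.iter(f"{NS}{tag}") for e in s.iter() if e is not s)
        else:
            raise ValueError(f"unsupported prefix: {prefix!r}")
        node = next((c for c in candidates if c.get("id") == ident), None)
        if node is None:
            return None
    return node


root = ET.fromstring(DOC)
```

In this sketch '/f~foo.xml/u~p40/n~5' finds nothing: note 5 belongs to the group, and a note scope is not inherited by the containers nested inside it.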

The proposed scheme would allow referring to any interesting core element using at most three levels of scope: <file>; then <group> or <unit>; then the "leaf" element. I'm not wedded to the exact syntax; there are pros and cons regarding which characters to use, depending on what is allowed in URIs and XML Schema types. There might very well be better options.

With this scheme, adding a note only requires looking at the parent container that will hold the note, not at all of its ancestors and descendants.
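
The point about adding a note can be illustrated with a hypothetical helper. This sketch assumes the XLIFF 2.0 <notes>/<note> layout and a made-up ID policy (smallest unused positive integer); it reads only the notes of the target container, never its ancestors or descendants.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"


def add_note(container: ET.Element, text: str) -> str:
    """Add a <note> to container, picking an id unique among that container's notes only."""
    notes = container.find(f"{NS}notes")
    if notes is None:
        # Placed as the first child; schema ordering against module elements is not checked.
        notes = ET.Element(f"{NS}notes")
        container.insert(0, notes)
    # Only the sibling notes are inspected: no ancestors, no descendants.
    used = {n.get("id") for n in notes.findall(f"{NS}note")}
    new_id = 1
    while str(new_id) in used:
        new_id += 1  # hypothetical policy: smallest unused positive integer
    note = ET.SubElement(notes, f"{NS}note", {"id": str(new_id)})
    note.text = text
    return note.get("id")
```

Two different units can each receive a note "1" without either one being consulted when the other is edited, which is what keeps this operation cheap for stream-oriented tools.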

The proposed scheme would not be an obstacle to merging multiple XLIFF documents into a bigger one, although that is not defined in the standard.

One question I can see is why not use XPath directly instead of a similar scheme of our own. I think the proposed scheme is fairly simple to implement and avoids having to evaluate potentially complex XPath expressions. If we were to go with XPath, it would not make sense to define our own restricted subset.

Another, slightly unrelated, question is why we do not allow <originalData> on <file> and <group>. In many cases that could reduce the amount of repeated data. I think it was discussed before, but I don't remember why we decided not to allow it.

Regards,
Fredrik Estreen

> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: 3 December 2013 04:49
> To: xliff@lists.oasis-open.org
> Subject: RE: [xliff] Comments on Fragment Identification
> 
> Hi David, all,
> 
> In my opinion, your updated proposal still has the same fundamental issue:
> it achieves shorter fragment identification by sacrificing ID scopes.
> 
> The more data types an ID scope includes, the more difficult it will be for
> applications to implement it. For example, there is absolutely no reason for a
> CAT tool to have to look up all the IDs used in inline codes and annotations to
> pick the IDs of the original data elements, or to look up unit IDs to pick an ID
> for a group. They live in different domains.
> 
> Yet, with your proposal, we force applications to use unnatural ID scopes just
> because we are using an IRI fragment notation that requires all elements
> under <unit> to share the same ID scope.
> 
> This type of XLIFF-induced restriction should be imposed only if there is no
> alternative. And in this case there is.
> 
> Cheers,
> -yves