
Subject: RE: [xliff] Fragment Identification

From: "David.O'Carroll" <David.OCarroll@ul.ie>
To: "Yves Savourel" <ysavourel@enlaso.com>, <xliff@lists.oasis-open.org>
Date: Mon, 16 Dec 2013 14:34:35 -0000

Hi Yves, all,

> Thanks for the thoughts on the different options.

No problem.

> -   Any suggestions for modules/extensions?

Each module should define its own two- to five-character prefix (as David suggested) for use in references. The prefixes should be registered with the TC to avoid conflicts. I don't see a simple way of doing this dynamically. As Yves suggested previously, you could declare the prefix on the file element and use it on all elements within that file. That would work, but it is very messy and difficult to parse.
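
As a rough illustration of what a registered-prefix check could look like, here is a minimal sketch. It assumes only the two rules stated above (two to five characters, no duplicates); the function name, the dict-based registry and the sample prefix "mda" are hypothetical and not part of any proposal.

```python
# Hypothetical sketch of a module prefix registry. Only the length rule
# (2 to 5 characters) and the no-conflict rule come from the discussion;
# everything else is illustrative.

def register_prefix(registry, prefix, module):
    """Record `prefix` for `module`, rejecting bad lengths and duplicates."""
    if not 2 <= len(prefix) <= 5:
        raise ValueError(f"prefix {prefix!r} must be 2 to 5 characters long")
    if prefix in registry:
        raise ValueError(f"prefix {prefix!r} is already registered to {registry[prefix]!r}")
    registry[prefix] = module
    return registry


registry = {}
register_prefix(registry, "mda", "metadata")  # sample values, assumed
```

Because module prefixes are at least two characters long, they cannot collide with single-letter core prefixes such as 'f' or 'u'.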

> -   Just a reminder so that we don't lose track of it: The difference between David's proposal and the others is not just syntactic:
> we would also lose the separation of id scope between units and groups, which in my opinion is a bad thing.

That is true. Having groups and units share a scope leads to shorter references (ids on groups become irrelevant, since the units are guaranteed to be unique within the given file). On the other hand, you must ensure that each unit id is unique across all groups in that file. What are your objections to this?

> -   Identifiers of <file>: we need to decide once for all if joining XLIFF documents is OK or not (it's OK (and done) in 1.2). If it
> is also OK in 2.0 (so far nothing says it is not) then we need to define how it can be done while keeping the <file> identifier
> unique.

I agree, it needs to be decided. UUIDs for file ids will definitely allow the merging of XLIFF files with a simple implementation.

Regards,
Dave

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of David.O'Carroll
Sent: Monday, December 16, 2013 6:14 AM
To: xliff@lists.oasis-open.org
Subject: [xliff] Fragment Identification

Hi all,

I have been looking into the fragment identification proposals made by David, Yves and Fredrik (original subject was "Comments on
Fragment Identification"). As I see it, there are two ways to go: we can either adopt David's solution, where references are local
to the current unit or absolute from the file level, or use prefixes as Yves and Fredrik suggested.

For prefixes I would use the following scheme:

(From Fredrik's proposal)
IRI format:
scope separator - '/'
prefix separator - '='     (as Yves suggested)
prefix - NMTOKEN
id - NMTOKEN
selector - prefix=id
path - #[/]?selector[/selector]*
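
The grammar above can be sketched as a small parser. This is only an illustration under stated assumptions: NMTOKEN is approximated with ASCII characters (the real XML production is wider), the prefix is treated as optional so that un-prefixed source inline ids such as "#1" are accepted, and the name parse_fragment is hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NMTOKEN; the XML production
# also allows many non-ASCII name characters.
NMTOKEN = r"[A-Za-z0-9._:-]+"
# Assumption: the prefix is optional so that "#1" (source inline id) parses.
SELECTOR = rf"(?:{NMTOKEN}=)?{NMTOKEN}"
PATH_RE = re.compile(rf"^#(/)?({SELECTOR}(?:/{SELECTOR})*)$")


def parse_fragment(fragment):
    """Return (is_absolute, [(prefix, id), ...]) or raise ValueError."""
    match = PATH_RE.match(fragment)
    if match is None:
        raise ValueError(f"not a valid fragment: {fragment!r}")
    absolute = match.group(1) is not None
    selectors = []
    for part in match.group(2).split("/"):
        prefix, sep, ident = part.partition("=")
        if not sep:
            # No '=' present: the whole selector is an un-prefixed id.
            prefix, ident = "", part
        selectors.append((prefix, ident))
    return absolute, selectors


print(parse_fragment("#/f=foo.xml/g=div12/n=5"))
```

A leading '/' after '#' marks the reference as absolute; everything else is split on the scope separator and then on the prefix separator.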

(Again from Fredrik's proposal)
Scopes:
<file>, prefix 'f', unique within document
<group>, prefix 'g', unique within <file>
<unit>, prefix 'u', unique within <file>
<note>, prefix 'n', unique within parent <file>, <group> or <unit>, i.e. one scope per parent container
Inline tags in target, prefix 't', unique within its enclosing <unit>
Inline tags in source, no prefix, unique within its enclosing <unit>       (as Yves suggested)

(Fredrik's examples modified to match above changes)
Examples:
An absolute reference to note "5" in file "foo.xml" and group "div12":
"#/f=foo.xml/g=div12/n=5".
A relative reference from an inline element to unit 5 in the same file: "#u=5"
A reference from within a unit to note 10 in group 7: "#g=7/n=10"
A reference to an inline source <ph> tag with id 1 from the same unit: "#1"
A reference to unit p40 in file foo.xml from outside the document:
"#/f=foo.xml/u=p40"

Below are the same examples using David's implementation:
An absolute reference to note "5" in file "foo.xml" and group "div12":
"#foo.xml~div12~5".
A relative reference from an inline element to unit 5 in the same file:
Relative paths are not allowed in David's scheme (unless local to current unit)
A reference from within a unit to note 10 in group 7: "#foo.xml~7~10"
A reference to an inline source <ph> tag with id 1 from the same unit: "#1" (local references to source look the same as above)
A reference to unit p40 in file foo.xml from outside the document:
"#foo.xml~p40"

The consequences of each proposal with respect to the quality/functional requirements identified in Fredrik's email:

We generally want IRIs:
* that are short
- For local referencing there is no difference between the two proposals (except for the prefix on target references)
- The prefix-based proposal can produce relative paths which are shorter than David's absolute references, but it is not something
we would like to encourage (there should not be dependencies between units/files)
- Due to the lack of prefixes David's proposal produces the shortest absolute references but is less readable as a result.
* that are descriptive enough to identify what they refer to (hopefully also by humans)
- As far as I can see, both proposals are expressive enough to uniquely identify any element within an XLIFF document, but David's
proposal is less human readable (see above)
* that limit what parts of a document need to be parsed / checked / remembered when following them
- As far as I can see, both proposals require the full XLIFF document to be held in memory while being parsed. For both schemes there
is no way to know whether an inline element holds a reference to another file in the XLIFF document.
* that depend on ID scopes that are suitable for stream processing when creating new elements
- I don't see any difference between the two proposals with respect to stream processing
* that are able to refer to all core constructs that make sense
- Again, both proposals look expressive enough to uniquely identify any core constructs
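
To make the length and readability trade-off concrete, here is a minimal sketch that builds both forms of a reference from the same list of (prefix, id) steps. It assumes the '/' and '=' separators of the prefix scheme and the '~' separator shown in the examples of David's scheme; the function names are hypothetical.

```python
# Hypothetical helpers: build a reference in each scheme from the same steps.

def prefixed_ref(steps, absolute=True):
    """Prefix scheme: '#', optional leading '/', then prefix=id selectors."""
    body = "/".join(f"{prefix}={ident}" if prefix else ident
                    for prefix, ident in steps)
    return "#" + ("/" if absolute else "") + body


def tilde_ref(steps):
    """Tilde scheme as in the examples: ids only, joined by '~'."""
    return "#" + "~".join(ident for _, ident in steps)


steps = [("f", "foo.xml"), ("g", "div12"), ("n", "5")]
print(prefixed_ref(steps))  # "#/f=foo.xml/g=div12/n=5"
print(tilde_ref(steps))     # "#foo.xml~div12~5"
```

The tilde form is shorter because it drops the prefixes, which is also why a reader cannot tell from the string alone whether "div12" names a group or a unit.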

I would suggest some changes to David's proposal. For the scope separator I would suggest "/" instead of "~" as it seems more
intuitive (used in XPath). For prefixes I would change from \{prefix} to {prefix}= as it seems to make more sense (as Yves said:
"#u=123" says clearly "the unit with an id equals to 123").

Using UUIDs to add more file elements to an existing XLIFF document is a processing requirement and seems out of scope for the spec.
Having said that, it seems like a small change that enables quite a powerful operation (e.g. build a corpus of XLIFF files in a
single XLIFF document). Changes required would include defining a new attribute for file to hold its unique requirements. There is
also an issue with generating UUIDs in different programming languages, as not all languages support UUID generation, so third-party
tools would be required in some cases.
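
For languages that do ship UUID support, the merge step could be as small as the sketch below. It uses Python's standard uuid module; the function name and the (document, file id) keys are hypothetical, and nothing here says where the value would be stored in the XLIFF markup.

```python
import uuid

# Hypothetical sketch: give every <file> coming from different documents a
# fresh random UUID so that ids stay unique after merging.

def assign_file_uuids(file_keys):
    """Map each (document name, original file id) pair to a new UUID string."""
    return {key: str(uuid.uuid4()) for key in file_keys}


# Two documents that both use the file id "f1" (sample data, assumed).
merged = assign_file_uuids([("a.xlf", "f1"), ("b.xlf", "f1")])
for key, new_id in merged.items():
    print(key, "->", new_id)
```

The original file ids clash, but the generated UUIDs do not, so the two <file> elements could sit in one document.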

I may have overlooked some things while writing this up so if there is anything I missed feedback would be greatly appreciated.

Regards,
Dave
