RE: [xliff] Fragment Identification

Hi Dave,

Thanks for the thoughts on the different options.

A few notes:

- Any suggestions for modules/extensions?

- Just a reminder so that we don’t lose track of it: The difference between David’s proposal and the others is not just syntactic: we would also lose the separation of id scope between units and groups, which in my opinion is a bad thing.

- Identifiers of <file>: we need to decide once and for all whether joining XLIFF documents is OK or not (it is OK, and done, in 1.2). If it is also OK in 2.0 (so far nothing says it is not), then we need to define how it can be done while keeping the <file> identifier unique.

Cheers,

-yves

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of David.O'Carroll
Sent: Monday, December 16, 2013 6:14 AM
To: xliff@lists.oasis-open.org
Subject: [xliff] Fragment Identification

Hi all,

I have been looking into the fragment identification proposals made by David, Yves and Fredrik (original subject was "Comments on Fragment Identification"). As I see it, there are two ways to go. We can either adopt David's solution, where references are either local to the current unit or absolute from the file level, or use prefixes as Yves and Fredrik suggested.

For prefixes I would use the following scheme:

(From Fredrik's proposal)
IRI format:
scope separator - '/'
prefix separator - '=' (as Yves suggested)
prefix - NMTOKEN
id - NMTOKEN
selector - prefix=id
path - #[/]?selector[/selector]*
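To make the grammar concrete, here is a minimal Python sketch of a parser for it. It treats the "prefix=" part as optional so that unprefixed source inline references such as "#1" parse too, and it approximates NMTOKEN with an ASCII-only pattern. The name parse_fragment is hypothetical, not part of any proposal.

```python
import re
from typing import List, Optional, Tuple

# ASCII-only approximation of an XML NMTOKEN (the real production
# also admits many non-ASCII name characters).
NMTOKEN = r"[A-Za-z0-9._:\-]+"
# selector = [prefix=]id ; the prefix is optional so "#1" also parses.
SELECTOR = re.compile(rf"(?:({NMTOKEN})=)?({NMTOKEN})")

Selector = Tuple[Optional[str], str]  # (prefix or None, id)


def parse_fragment(fragment: str) -> Tuple[bool, List[Selector]]:
    """Parse '#[/]?selector[/selector]*' into (is_absolute, selectors)."""
    if not fragment.startswith("#"):
        raise ValueError("a fragment identifier must start with '#'")
    path = fragment[1:]
    absolute = path.startswith("/")
    if absolute:
        path = path[1:]
    selectors: List[Selector] = []
    for part in path.split("/"):
        match = SELECTOR.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed selector: {part!r}")
        selectors.append((match.group(1), match.group(2)))
    return absolute, selectors


print(parse_fragment("#/f=foo.xml/g=div12/n=5"))
# → (True, [('f', 'foo.xml'), ('g', 'div12'), ('n', '5')])
```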

(Again from Fredrik's proposal)
Scopes:
<file>, prefix 'f', unique within document
<group>, prefix 'g', unique within <file>
<unit>, prefix 'u', unique within <file>
<note>, prefix 'n', unique within its parent <file>, <group> or <unit>, i.e. one scope per parent container
Inline tags in target, prefix 't', unique within its enclosing <unit>
Inline tags in source, no prefix, unique within its enclosing <unit> (as Yves suggested)
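A rough Python sketch of how a processor might check a parsed selector path against this scope list. The prefix table comes straight from the list; the ordering rules (a file selector only as the first step of an absolute path, notes and inline tags only as the last step) are my assumptions, not something either proposal states. check_path is a hypothetical name.

```python
from typing import List, Optional, Tuple

Selector = Tuple[Optional[str], str]  # (prefix or None, id)

# Prefix -> addressed element, taken from the scope list
# (None stands for an unprefixed source inline tag).
SCOPES = {
    "f": "<file>",
    "g": "<group>",
    "u": "<unit>",
    "n": "<note>",
    "t": "target inline tag",
    None: "source inline tag",
}
# Assumption: notes and inline tags contain nothing addressable,
# so they can only be the last step of a path.
LEAVES = {"n", "t", None}


def check_path(absolute: bool, selectors: List[Selector]) -> List[str]:
    """Return the element kind named by each selector, or raise ValueError."""
    # Assumption: an absolute path always starts at the <file> level.
    if absolute and (not selectors or selectors[0][0] != "f"):
        raise ValueError("an absolute path must start with an 'f' selector")
    kinds = []
    last = len(selectors) - 1
    for i, (prefix, _ident) in enumerate(selectors):
        if prefix not in SCOPES:
            raise ValueError(f"unknown prefix: {prefix!r}")
        if prefix == "f" and not (absolute and i == 0):
            raise ValueError("'f' may only start an absolute path")
        if prefix in LEAVES and i != last:
            raise ValueError(f"a {SCOPES[prefix]} cannot contain other elements")
        kinds.append(SCOPES[prefix])
    return kinds
```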

(Fredrik's examples modified to match above changes)
Examples:
An absolute reference to note "5" in file "foo.xml" and group "div12":
"#/f=foo.xml/g=div12/n=5".
A relative reference from an inline element to unit 5 in the same file: "#u=5"
A reference from within a unit to note 10 in group 7: "#g=7/n=10"
A reference to an inline source <ph> tag with id 1 from the same unit: "#1"
A reference to unit p40 in file foo.xml from outside the document:
"#/f=foo.xml/u=p40"
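Relative references such as "#u=5" or "#1" only make sense once they are resolved against the location of the element containing them. One possible reading of the scope list, sketched in Python as an assumption rather than as part of either proposal: keep the referring element's absolute path down to the container that defines the target's id scope (the <file> for groups and units, the enclosing <unit> for inline tags, the innermost container for notes) and append the relative selectors. resolve and SCOPE_OF are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

Selector = Tuple[Optional[str], str]  # (prefix or None, id)

# Assumption, read off the scope list: which container's selector bounds the
# id scope of a relative path's first step.
SCOPE_OF: Dict[Optional[str], Set[str]] = {
    "g": {"f"},            # groups: unique within <file>
    "u": {"f"},            # units: unique within <file>
    "n": {"f", "g", "u"},  # notes: unique within the innermost container
    "t": {"u"},            # target inline tags: unique within <unit>
    None: {"u"},           # source inline tags: unique within <unit>
}


def resolve(context: List[Selector], relative: List[Selector]) -> str:
    """Resolve a relative selector path against the absolute path of the
    element that contains the reference, returning an absolute fragment."""
    anchors = SCOPE_OF[relative[0][0]]
    # Keep the context down to the innermost selector that defines the scope.
    cut = max((i for i, (p, _) in enumerate(context) if p in anchors), default=-1)
    if cut < 0:
        raise ValueError("the context lacks the container that defines this scope")
    path = context[: cut + 1] + relative
    return "#/" + "/".join(f"{p}={ident}" if p else ident for p, ident in path)
```

For example, with a context of [("f", "foo.xml"), ("u", "p40")], "#u=5" resolves to "#/f=foo.xml/u=5" and "#1" to "#/f=foo.xml/u=p40/1".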

Below are the same examples using David's implementation:
An absolute reference to note "5" in file "foo.xml" and group "div12":
"#foo.xml~div12~5".
A relative reference from an inline element to unit 5 in the same file:
Relative paths are not allowed in David's scheme (unless local to the current unit)
A reference from within a unit to note 10 in group 7: "#foo.xml~7~10"
A reference to an inline source <ph> tag with id 1 from the same unit: "#1" (local references to source inline tags look the same in both schemes)
A reference to unit p40 in file foo.xml from outside the document:
"#foo.xml~p40"
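Without prefixes, the kind of element each segment names is not visible in the reference itself; a processor has to look the ids up in the document. The Python sketch below assumes a hypothetical index, built while reading the document, that maps each id path to an element kind (resolve_david is an illustrative name, not part of David's proposal).

```python
from typing import Dict, List, Tuple

# Hypothetical index built while reading the document: id path -> element kind.
Index = Dict[Tuple[str, ...], str]


def resolve_david(fragment: str, index: Index) -> List[str]:
    """Return the element kind of each segment of '#file~id[~id]*'."""
    segments = fragment.lstrip("#").split("~")
    kinds = []
    for n in range(1, len(segments) + 1):
        key = tuple(segments[:n])
        if key not in index:
            raise KeyError(f"no element with id path {key}")
        # The kind comes from the document, not from the reference itself.
        kinds.append(index[key])
    return kinds


index: Index = {
    ("foo.xml",): "file",
    ("foo.xml", "div12"): "group",
    ("foo.xml", "div12", "5"): "note",
    ("foo.xml", "p40"): "unit",
}
print(resolve_david("#foo.xml~div12~5", index))
# → ['file', 'group', 'note']
```

Because a key such as ("foo.xml", "7") can map to only one kind, a group and a unit in the same file could not both use id 7. This is the loss of id-scope separation between units and groups that Yves points out.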

The consequences of each proposal with respect to the quality/functional requirements identified in Fredrik's email:

We generally want IRIs:
* that are short
- For local referencing there is no difference between the two proposals (except for the prefix on target references)
- The prefix-based proposal can produce relative paths which are shorter than David's absolute references, but it is not something we would like to encourage (there should not be dependencies between units/files)
- Due to the lack of prefixes, David's proposal produces the shortest absolute references, but they are less readable as a result.
* that are descriptive enough to identify what they refer to (hopefully also by humans)
- As far as I can see, both proposals are expressive enough to uniquely identify any element within an XLIFF document, but David's proposal is less human-readable (see above)
* that limit what parts of a document need to be parsed / checked / remembered when following them
- As far as I can see, both proposals require the full XLIFF document to be held in memory while it is parsed: with either scheme there is no way to know in advance whether an inline element references another <file> in the document.
* that depend on ID scopes that are suitable for stream processing when creating new elements
- I don't see any difference between the two proposals with respect to stream processing
* that are able to refer to all core constructs that make sense
- Again, both proposals look expressive enough to uniquely identify any core construct

I would suggest some changes to David's proposal. For the scope separator I would suggest "/" instead of "~" as it seems more intuitive (it is used in XPath). For prefixes I would change from \{prefix} to {prefix}= as it seems to make more sense (as Yves said: "#u=123" says clearly "the unit with an id equal to 123").
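With these two changes, building a reference is just a matter of joining prefix=id selectors with "/". A minimal Python sketch follows; the build_fragment name and the ASCII-only NMTOKEN approximation are my assumptions. Since "/", "=" and "~" are not XML name characters, none of them can appear inside an NMTOKEN id, so either separator choice stays unambiguous to parse.

```python
import re
from typing import List, Optional, Tuple

# ASCII-only approximation of the XML NMTOKEN production; '/', '=' and '~'
# are not XML name characters, so none of them can occur inside an id.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def build_fragment(selectors: List[Tuple[Optional[str], str]],
                   absolute: bool = True) -> str:
    """Join (prefix, id) pairs into '#[/]prefix=id/...' (prefix None = bare id)."""
    parts = []
    for prefix, ident in selectors:
        for token in (prefix, ident):
            if token is not None and not NMTOKEN.fullmatch(token):
                raise ValueError(f"not an NMTOKEN: {token!r}")
        parts.append(f"{prefix}={ident}" if prefix else ident)
    return "#" + ("/" if absolute else "") + "/".join(parts)


print(build_fragment([("u", "123")], absolute=False))  # → #u=123
```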

Using UUIDs to add more file elements to an existing XLIFF document is a processing requirement and seems out of scope for the spec. Having said that, it seems like a small change that enables quite a powerful operation (e.g. building a corpus of XLIFF files in a single XLIFF document). The changes required would include defining a new attribute for <file> to hold its unique identifier. There is also an issue with generating UUIDs: not all programming languages support UUID generation natively, so third-party libraries would be needed in some cases.
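For what it's worth, Python's standard library does include UUID generation (uuid.uuid4()). Below is a minimal sketch of the merge case, assuming a hypothetical helper that gives an incoming <file> a fresh UUID only when its id collides with one already in the target document; a UUID string (hex digits and hyphens) is itself a valid NMTOKEN. Absolute references inside a renamed <file> would still need rewriting, which this sketch does not attempt.

```python
import uuid
from typing import List


def merge_file_ids(existing: List[str], incoming: List[str]) -> List[str]:
    """Return ids for incoming <file> elements, replacing any id that clashes
    with one already in use by a freshly generated UUID (hypothetical helper)."""
    taken = set(existing)
    result = []
    for file_id in incoming:
        if file_id in taken:
            # uuid4() is random; its string form (hex digits and '-') is a valid NMTOKEN.
            file_id = str(uuid.uuid4())
        taken.add(file_id)
        result.append(file_id)
    return result
```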

I may have overlooked some things while writing this up, so if I missed anything, feedback would be greatly appreciated.

Regards,
Dave
