Re: [xliff] Comments on Fragment Identification

Thanks, Yves,

I will call the proposed solutions "simple" and "complex" from now on.

I have been working on an improved version of the "simple" solution based on your great and constructive feedback.

I believe I have addressed most of the drawbacks that you pointed out, or at least all that I considered drawbacks :-)

The baseline is still the limited number of internal id scopes, but I introduced a target prefix and allowed for referencing outside the unit or file as needed.

See this:

[I realize that this is ugly and not readable so I will need to print it on SVN, although I did not want to..]

Fragment Identification

Because XLIFF Documents do not follow the usual behavior of XML documents when it comes to element identifiers, this specification defines how Agents must interpret the fragment identifiers in URIs and IRIs pointing to XLIFF Documents.

Identifying fragments within <target> elements

Since XLIFF Documents will often contain id values that are duplicated by design between source and target content, this fragment identification mechanism needs to specify a fragment identification prefix for referencing fragments enclosed by a <target> element.

The target prefix is: /t.

Fragments in XLIFF Modules and Extensions

XLIFF Module fragment identification prefixes are specified in the respective modules.

Extensions that need to specify identifiable fragments must specify their own fragment identification prefixes analogously to XLIFF Module prefixes.

Constraints

Module and extension fragment identification prefixes must start with the / character. The remaining part of the prefix must be an NMTOKEN at least 2 characters and at most 5 characters long.

Extension prefixes must not compete for values with fragment identification prefix values specified or reserved within this specification.

Modules and Extensions that need to be referenced from XLIFF Core or Modules must use an id attribute specified within their own namespace or the xlf:id attribute, and the allowed id values must be suitable for use within URIs or IRIs.
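
As an illustration only, the prefix constraints above could be checked roughly like this. It is a minimal sketch: the function name and the reserved set are hypothetical, and the NMTOKEN test is a simplified ASCII-only approximation (the real XML production also allows many non-ASCII name characters).

```python
import re

# Simplified, ASCII-only stand-in for XML NMTOKEN characters (assumption).
_NMTOKEN_2_TO_5 = re.compile(r"[A-Za-z0-9._:\-]{2,5}")

# Prefixes reserved by the draft itself; only the target prefix is known here.
RESERVED_PREFIXES = {"/t"}


def is_valid_extension_prefix(prefix: str) -> bool:
    """Return True if `prefix` satisfies the draft's prefix constraints."""
    # Must start with the / character.
    if not prefix.startswith("/"):
        return False
    # Must not compete with prefixes reserved by the specification.
    if prefix in RESERVED_PREFIXES:
        return False
    # The remainder must be an NMTOKEN of 2 to 5 characters.
    return _NMTOKEN_2_TO_5.fullmatch(prefix[1:]) is not None
```

For example, a hypothetical prefix such as /res would pass, while /t, res, and /toolong would be rejected.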

External Identification

When identifying an XLIFF fragment from outside the referenced XLIFF Document, the IRI must be composed from the following components in the given order:

IRI of the referenced document with the xlf extension followed by the character #.

If the fragment to be identified is within an XLIFF Module's or extension's element, the respective fragment identifying prefix followed by the ~ character followed by an id value unique within the relevant module or extension scope.

If the fragment to be identified is at a lower level, the NMTOKEN string that is the value of the id attribute of the <file> element enclosing the fragment.

If the fragment to be identified is at a lower level, the character ~ followed by the NMTOKEN string that is the value of the id attribute of the lowermost <unit> or <group> element enclosing the fragment.

If the fragment to be identified is at the lowest level and enclosed within a <target> element, the prefix /t followed by the character ~ followed by the NMTOKEN string that is the value of the id attribute of the element to be identified.

If the fragment to be identified is at the lowest level but not enclosed within a <target> element, the character ~ followed by the NMTOKEN string that is the value of the id attribute of the element to be identified.
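
To make the composition order above concrete, here is a rough sketch for core fragments only. The function and parameter names are hypothetical, and module or extension prefixes are left out because the draft does not spell out how they combine with the file and unit parts.

```python
from typing import Optional


def build_external_iri(
    doc_iri: str,
    file_id: str,
    unit_id: Optional[str] = None,
    inline_id: Optional[str] = None,
    in_target: bool = False,
) -> str:
    """Compose an external IRI for a core fragment, in the order listed above."""
    if inline_id is not None and unit_id is None:
        raise ValueError("an inline id needs an enclosing unit or group id")
    # Document IRI, the # character, then the id of the enclosing <file>.
    iri = f"{doc_iri}#{file_id}"
    # Lower level: ~ followed by the id of the lowermost <unit> or <group>.
    if unit_id is not None:
        iri += f"~{unit_id}"
    # Lowest level: /t~id inside <target>, plain ~id otherwise.
    if inline_id is not None:
        iri += ("/t~" if in_target else "~") + inline_id
    return iri
```

With made-up ids, build_external_iri("doc.xlf", "f1", "u1", "m1", in_target=True) gives doc.xlf#f1~u1/t~m1, while the same call without in_target gives doc.xlf#f1~u1~m1.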

Internal Identification

Referencing without context is always within the lowermost of the enclosing <unit>, <file>, or <xliff> element.

Constraints

When referencing an internal fragment of the same XLIFF Document, the fragment identifying string must be one of the following:

The NMTOKEN string that is a value of one of the id attributes within the lowermost of the enclosing <unit> or <file>.

A module prefix followed by the ~ character followed by an id value unique within the relevant module scope.

A string composed as per steps 2. through 8. in the section External Identification.
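
A consumer would need to tell the three internal forms above apart before resolving them. The following is only a sketch of one way to do that: the function name and the returned labels are hypothetical, and NMTOKEN is again approximated with ASCII characters.

```python
import re

# Simplified, ASCII-only stand-in for one XML NMTOKEN character (assumption).
_NMCHAR = r"[A-Za-z0-9._:\-]"

_LOCAL_ID = re.compile(rf"{_NMCHAR}+")
_MODULE_ID = re.compile(rf"/{_NMCHAR}{{2,5}}~.+")


def classify_internal_fragment(fragment: str) -> str:
    """Guess which of the three internal forms a fragment string uses."""
    # A bare NMTOKEN: an id within the lowermost enclosing <unit> or <file>.
    if _LOCAL_ID.fullmatch(fragment):
        return "local-id"
    # A module prefix (/ plus 2 to 5 characters), then ~ and an id.
    if _MODULE_ID.fullmatch(fragment):
        return "module-id"
    # Anything else containing ~ is treated as a composed path.
    if "~" in fragment:
        return "path"
    raise ValueError(f"unrecognized fragment identifier: {fragment!r}")
```

Note that the bare form stays context-dependent: the same string, such as 1, points at different elements depending on which <unit> or <file> the reference sits in.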

Cheers

Dr. David Filip

=======================

LRC | CNGL | LT-Web | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

http://www.cngl.ie/profile/?i=452

mailto: david.filip@ul.ie

On Mon, Dec 2, 2013 at 8:37 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

Hi David, all,

Thanks for the comments, here are a few others:

> 1) having several scopes and many prefixes

> 2) having a few scopes and no need for prefixes in core

1 tries to reflect the reality of the data.
2 foists restrictions on the data in order to make things fit a specific notation.

> you say that splitting the id note scope is a show
> stopper, but it is what allows for only two internal
> id scopes and makes the referencing mechanism manageable.

Not sure what you mean by "manageable". In both cases you have to write specialized code to deal with the fragment identifier.
Besides, what looks "manageable" in XLIFF may not be so easy on the implementations using the IDs.

In my opinion your notation brings several issues that make it less attractive:

- no handling of source/target difference for inline Ids.

- two fragment identifiers can be identical but mean different things depending on whether they are in a full URI or not.

- grouping the ID scopes too much to end up with only three scopes, just to make the notation work in XLIFF. Remember that XLIFF is
just an exchange format. Here you are changing what the data could be in order to try to make it fit the XLIFF representation.

- and a few other things mentioned in my previous email.

> ... we can go for another separator,

> I would propose ~ rather than /

> I know that they should not have issues with / but
> really we are not working with directories or folders

The character / is commonly used to separate levels in many contexts, not just directories, for example: tree locations, XPath
expressions, etc. Also I used ~ for the source/target case.

> What I intended to say was that things like this #1 can
> only reference within a given <unit> or <file>.

Yes, and I did understand correctly.
That means you cannot refer from inside a <unit> to outside, or vice-versa.
That's a major limitation in my opinion.

> And also we do not want to encourage referencing
> across units or files, so that should be OK.

Why? You may want to have data living at the file level that need to be pointed to from within a <unit>. The Resource Data module
does exactly that. You may want to do this type of referencing from an <mrk> for example, or from a future module (like ITS).

> Finally, shouldn't we use IRIs rather than URIs?
> I hope there is not much impact anyway, except
> that other than Latin script characters will be
> allowed as values..

I think IRI should be fine.

Cheers,
-yves
