Re: [xliff] Comments on Fragment Identification

Thanks, Yves,

I will make a summary response without going into inline details.

Your comments below and your original proposal show that there are two options:

1) having several scopes and many prefixes

You say that splitting the note id scope is a show-stopper, but it is what allows for only two internal id scopes and makes the referencing mechanism manageable.

2) having a few scopes and no need for prefixes in core

Obviously, we still need prefixes for modules and extensions

And I agree that we should say how extension prefixes can be formed

Also, if # as a separator causes issues, we can go for another separator; I would propose ~ rather than /

Would java or other environments have an issue with ~ as our separator?

I know that they should not have issues with / but really we are not working with directories or folders
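
For reference, here is a minimal sketch of how the ~ question could be checked with java.net.URI. It reuses the test URL from your example; the class and method names are made up for illustration. Since ~ is an unreserved character in RFC 2396, which java.net.URI follows, I would expect it to be accepted in the fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class TildeSeparatorCheck {
    // Returns the fragment of the given URI string,
    // or null if java.net.URI rejects the string.
    static String fragmentOrNull(String uri) {
        try {
            return new URI(uri).getFragment();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical reference using '~' between a file id and a unit id.
        System.out.println(fragmentOrNull("http://www.test.net/file.xlf#file1~unit1"));
    }
}
```

This only shows that one parser accepts the character; other environments would need the same kind of check.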

I do not insist that internal references are only possible within the lowermost of unit or file.

What I intended to say was that a reference like #1 can only point within a given <unit> or <file>. In other words, the lack of context means that you are referencing locally (you say something similar in your proposal) within one of those scopes, which should cater for the vast majority of use cases.

Neither you nor I proposed a mechanism analogous to going a level higher, like ../ in a file system

And also we do not want to encourage referencing across units or files, so that should be OK.

I mean that absolute external references are fine, and that should cater for cases like pointing to an MT service, a Wikipedia entry, or a TB server resource.

Finally, shouldn't we use IRIs rather than URIs? I hope there is not much impact anyway, except that characters from scripts other than Latin will be allowed as values. Can ABNF express the characters needed for IRIs?

Cheers

Dr. David Filip

=======================

LRC | CNGL | LT-Web | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

http://www.cngl.ie/profile/?i=452

mailto: david.filip@ul.ie

On Sun, Dec 1, 2013 at 12:44 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

Hi all,

Two things in this email:

1. A few comments on the new section David created
2. Another solution

1) ===== Comments on the new proposed section "3 Fragment Identification":

-- (minor) Maybe the title could be more specific: "URI Fragment identifiers"

-- Maybe we could start the section with something other than:

[[
XLIFF Module fragment identification prefixes are specified in the respective modules.
]]

Maybe something like:

[[
Because XLIFF documents do not follow the usual behavior of XML documents when it comes to element identifiers, this specification
defines how applications must interpret the fragment identifiers in URIs pointing to XLIFF documents.
]]

-- We need a MIME type for XLIFF.
I believe David started the request, but I'm not sure. David?

-- I'm not sure I understood everything correctly in this section because there are no examples to illustrate the definitions.

-- For internal references, if I understand correctly the statement "Only referencing within the lowermost of the enclosing <unit>
or <file> is allowed":
This means the proposal allows only for very limited internal references, for example one cannot point from a mrk-ref to an element
outside the unit where that mrk is located.
If it's true, to me that's a show-stopper: it reduces drastically what you can do with annotation for example.

--- I'm not sure what we define for the internal reference:

One case starts with '#', the other starts with a module prefix (which all seem to start with '/').

So, as far as I can tell, we would have: ref="##id" and ref="#/ref#id" (since I assume "the fragment identifying string" means the
part after the # in a URI). Is that correct?

If my assumption is not correct, then surely the only other possible interpretation is that the syntax is ref="#id" and
ref="/ref#id1". But that can't be right: the second case would be interpreted as a fragment identifier equal to "id1".

-- I've noticed that the proposal says: "IRI of the referenced document with the xlf extension". We should not limit the document to
xlf extensions. That's the recommended one, but one can use anything.

-- I don't think using # as a separator is wise.
It is already used to separate the fragment identifier from the rest of the URI.

It also seems to cause a problem: if I do the following in Java:

assertEquals("file1#unit1", new URI("http://www.test.net/file.xlf#file1#unit1").getFragment());

I get the following exception: java.net.URISyntaxException: Illegal character in fragment at index 34:
http://www.test.net/file.xlf#file1#unit1

So # as separator inside the fragment looks really bad to me.

I think / would work better as it's a traditional separator for parts/path.
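
As a rough sketch of that comparison (the class and helper names are invented for illustration; only java.net.URI is real), the same test URL can be parsed once with # and once with / inside the fragment:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SlashSeparatorCheck {
    // True if java.net.URI accepts the string, false if it throws.
    static boolean parses(String uri) {
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Fragment part of a URI string that is assumed to be parsable.
    static String fragmentOf(String uri) {
        return URI.create(uri).getFragment();
    }

    public static void main(String[] args) {
        // '#' used as separator inside the fragment: rejected by the parser.
        System.out.println(parses("http://www.test.net/file.xlf#file1#unit1"));
        // '/' used as separator inside the fragment: accepted.
        System.out.println(parses("http://www.test.net/file.xlf#file1/unit1"));
        System.out.println(fragmentOf("http://www.test.net/file.xlf#file1/unit1"));
    }
}
```

With java.net.URI, I would expect the first call to report a rejection and the / variant to come back unchanged as the fragment.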

-- It seems that for modules/extensions the ID can be set in an id attribute or in a name attribute.
a) Why allow two attributes?
b) Also, I have not seen any new PR that requires all modules/extensions to use id (or name) attributes for their ID values.
c) And I have not seen any new PR that requires the id values of extensions to use a character set compatible with a URI fragment (e.g.
NMTOKEN).

-- The file id attribute is to be unique per document. But that doesn't cover the use case of bundling several <file> elements into a single
document after extraction.
When a clash occurs: can the tool modify those file IDs? (No PR prevents it.)
Or should we use UUID values for file id?

-- By "If the fragment to be identified is within an XLIFF Module's data," I assume you mean "... within a module element" (not sure
what "module's data" is)

-- There seems to be no definition of the rules to build a module/extension prefix. The text says that module prefixes are defined
in each module specification, but we need more than that: identification must work also for custom extensions since many may become
modules.
Once again: modules and extensions should be treated equally from the core viewpoint.

-- The proposal has no provision for distinguishing source from target for the inline elements.
We can point to an inline element with an id='abc', but we don't know if it's the one in source or target.

-- The distinction between internal and external is very strange.

It means you can have this: "myFile.xlf#id1" and "#id1", and the two "id1" point to different places.

It also means you have different valid identifiers depending on their internal/external status. For example "myFile.xlf#f1#g1#i1" is
valid but "#f1#g1#i1" is not (if I understand correctly).

That makes things quite confusing, and I've never seen any fragment identifier make such a distinction.
I think it's important to keep the same syntax and semantics for all the fragments, whether or not they are part of a full or
relative URI.
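
To illustrate the usual behavior, here is a small sketch using java.net.URI resolution. The base URL is an assumption made up for the example (the host is borrowed from the test above), and the class and method names are hypothetical. The point is that a generic URI processor resolves a lone "#id1" used inside myFile.xlf to the same URI as "myFile.xlf#id1":

```java
import java.net.URI;

final class SameDocumentReference {
    // Resolves a (possibly relative) reference against the URI of the
    // document in which the reference appears.
    static URI resolve(String base, String ref) {
        return URI.create(base).resolve(ref);
    }

    public static void main(String[] args) {
        // Assumed location of the XLIFF document containing the references.
        String base = "http://www.test.net/myFile.xlf";
        // Fragment-only reference: same document, new fragment.
        System.out.println(resolve(base, "#id1"));
        // Relative reference that names the file explicitly.
        System.out.println(resolve(base, "myFile.xlf#id1"));
    }
}
```

Both calls should yield http://www.test.net/myFile.xlf#id1, so generic tools would see the two forms as the same target.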

-- The 3 levels of IDs force unnatural scopes for IDs:

For example:
- The <note> elements have different scope if they are in or out of a unit;
- <data>'s id is in the same scope as inline codes/markers;
- We force <group> and <unit> to share the same Id space.

All this is very restrictive and will cause a lot of overhead in implementations where the object model of the extracted document
may be very different, so that accessing existing IDs to create new objects can work very differently than in XLIFF.

We have to remember that XLIFF is not a processing format, just an exchange one.

2) ===== Other solution

I started to propose a different solution a while back in this email:
https://lists.oasis-open.org/archives/xliff/201311/msg00131.html

I don't like it very much, but it seems better than the proposal currently in the draft.

I'm not sure about the source/target flag and would like to hear back on that.

I'm not sure how to deal with modules/extensions differently than what's outlined in the email.

I'm still not sure what the right solution is for the <file> ids: should they be UUIDs or not?

I think we should express whatever fragment identifier syntax we choose in a clear ABNF-like notation rather than in statements.

I think we should try to offer a regular expression to validate whatever we come up with.

Below is a try at a more formal definition.
Note that it doesn't have provision for modules or source/target at this point.

fragId = withFile / withGroupOrUnitOrNote / inlineOrDataPart

withFile = filePart 1*("/" withGroupOrUnitOrNote)

filePart = "f=" fileId

fileId = value of the id attribute of one of the <file> elements in the document

withGroupOrUnitOrNote = notePart / groupPart / withUnit

notePart = "n=" noteId

noteId = value of the id attribute of one of the <note> elements in the parentFile

parentFile = the <file> element identified by filePart when available, otherwise the <file> element where the fragment identifier is
used

groupPart = "g=" groupId

groupId = value of the id attribute of one of the <group> elements in the parentFile

withUnit = unitPart 1*("/" inlineOrDataPart)

unitPart = "u=" unitId

unitId = value of the id attribute of one of the <unit> elements in the parentFile

inlineOrDataPart = inlineId / dataPart

inlineId = value of the id attribute of a <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> element in the parentUnit

parentUnit = the <unit> element identified by unitPart when available, otherwise the <unit> element where the fragment identifier is
used.

dataPart = "d=" dataId

dataId = value of the id attribute of one of the <data> elements in the parentUnit

There are examples of the fragments in the initial email:
https://lists.oasis-open.org/archives/xliff/201311/msg00131.html
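
As a first attempt at the validating regular expression mentioned above, here is a sketch in Java that transcribes the grammar rule by rule. Two things are assumptions, not part of the proposal: id values are limited to an ASCII NMTOKEN-like character set, and the class and constant names are made up. It checks syntax only; it does not verify that the ids exist in a document.

```java
import java.util.regex.Pattern;

final class FragIdValidator {
    // Assumption: ids use an ASCII subset of NMTOKEN characters.
    private static final String ID = "[A-Za-z0-9_.:-]+";
    // inlineOrDataPart = inlineId / dataPart
    private static final String INLINE_OR_DATA = "(?:d=" + ID + "|" + ID + ")";
    // withUnit = unitPart 1*("/" inlineOrDataPart)
    private static final String WITH_UNIT = "u=" + ID + "(?:/" + INLINE_OR_DATA + ")+";
    // withGroupOrUnitOrNote = notePart / groupPart / withUnit
    private static final String WITH_GROUP_UNIT_OR_NOTE =
            "(?:n=" + ID + "|g=" + ID + "|" + WITH_UNIT + ")";
    // withFile = filePart 1*("/" withGroupOrUnitOrNote)
    private static final String WITH_FILE =
            "f=" + ID + "(?:/" + WITH_GROUP_UNIT_OR_NOTE + ")+";
    // fragId = withFile / withGroupOrUnitOrNote / inlineOrDataPart
    static final Pattern FRAG_ID = Pattern.compile(
            "(?:" + WITH_FILE + "|" + WITH_GROUP_UNIT_OR_NOTE + "|" + INLINE_OR_DATA + ")");

    // True if the whole fragment string matches the grammar.
    static boolean isValid(String fragment) {
        return FRAG_ID.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("f=f1/u=u1/d=d1")); // file, unit, data
        System.out.println(isValid("u=u1/1"));         // unit, inline id
        System.out.println(isValid("f=f1"));           // file alone
    }
}
```

Note that, read literally, 1*( ... ) requires at least one sub-part, so "f=f1" or "u=u1" on its own is rejected by this transcription. If pointing to a whole <file> or <unit> should be allowed, the repetition would need to be *( ... ) in the grammar and * instead of + in the expression.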

cheers,
-yves
