
Subject: RE: [xliff] Comments on Fragment Identification

From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff@lists.oasis-open.org>
Date: Sun, 8 Dec 2013 07:22:42 -0700

Hi, Fredrik, David, all,

Thanks for the detailed email and proposal.
I think we are converging toward a solution.


--- <file>

> <file>, prefix 'f', unique within document

No direct issue here from my viewpoint.

But I think we have a few unresolved related questions:

- How does this work for tools that regroup several documents into a single one during processing?
This is a relatively common feature. Should they modify the file id to ensure uniqueness? (But then how do you get back to the
original ids?) Should the id value be a UUID?

- And also, what is the relationship between original and id in <file>?



--- <group> and <unit>

> <group>, prefix 'g', unique within <file>
> <unit>, prefix 'u', unique within <file>, separate from 
> <group> to keep references shorter.

I agree. In my opinion, grouping both in the same scope is a major drawback of David's proposal.
Keeping them separate also avoids merging the ID scopes of two types of objects that are very different and likely mapped
separately in implementations.



--- <note>

> <note>, prefix 'n', unique within parent <file>,<group> 
> or <unit>. Ie one scope per parent container

I'm still digesting this one. So the following (simplified) would be ok:

<file>
 <group>
   <unit>
    ...
    <note id='n1'/>
   </unit>
   ...
   <note id='n1'/>
 </group>
 ...
 <note id='n1'/>
</file>

I think that would work.
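
To double-check the scoping rule, here is a minimal Python sketch (not part of any proposal) that flags a note id only when
it repeats under the same parent. It assumes, as in the simplified example above, that <note> elements are direct children of
their container; the names duplicate_note_ids and SAMPLE are purely illustrative.

```python
import xml.etree.ElementTree as ET


def duplicate_note_ids(xml_text):
    """Return (parent tag, note id) pairs where a note id repeats within one parent."""
    root = ET.fromstring(xml_text)
    duplicates = []
    for parent in root.iter():  # includes the root element itself
        seen = set()  # one id scope per parent container
        for child in parent:
            if child.tag != "note":
                continue
            note_id = child.get("id")
            if note_id in seen:
                duplicates.append((parent.tag, note_id))
            seen.add(note_id)
    return duplicates


# Assumption: same shape as the simplified example, with the "..." removed.
SAMPLE = """<file>
 <group>
  <unit>
   <note id='n1'/>
  </unit>
  <note id='n1'/>
 </group>
 <note id='n1'/>
</file>"""

print(duplicate_note_ids(SAMPLE))
```

With one scope per parent, the three 'n1' notes do not clash; two notes with the same id inside the same <unit> would be
reported.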



--- original data

> <originalData>, prefix 'od', unique within its parent <unit>

You probably meant to write <data> (<originalData> has no id).
So I would use 'd' as the prefix (shorter).



--- inline elements

> Inline tags in source, prefix 'is', unique within its 
> enclosing <unit>
> Inline tags in target, prefix 'it', unique within its 
> enclosing <unit>
> Inline tags, prefix 'i', not unique may match in both 
> source and target. Not sure if we really want this, feels 
> like it could be useful.

So, at this point it seems we have a solid consensus that inline elements (<segment>, <ignorable>, <ph>, <pc>, <sc>/<ec>, <mrk>,
<sm>/<em>) use the same ID scope.

I'm not sure the "'i' for source/target" idea fits well with a URI: after all, a URI's main goal is to identify a unique location
in the document. It would be useful if an application needed to point to both elements at the same time.

I would try to simplify:

- Use the 't' prefix for target inline codes and target inline annotations.
- Use no prefix for source inline codes, source inline annotations, <segment> and <ignorable>.
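
As a rough illustration of that simplification, here is a small Python sketch that builds the reference for an inline element.
The exact shape of the result (leading '/', '=' between prefix and id, a bare id for the un-prefixed case) is my assumption for
the sake of the example and not something we have agreed on; inline_fragment is a hypothetical helper.

```python
def inline_fragment(file_id, unit_id, inline_id, in_target=False):
    """Build a fragment identifier for an inline element.

    Assumptions: '/' separates scopes, '=' separates prefix and id, and a
    final step without a prefix refers to the source side (or to
    <segment>/<ignorable>), while 't' marks the target side.
    """
    last_step = f"t={inline_id}" if in_target else inline_id
    return f"#/f={file_id}/u={unit_id}/{last_step}"


print(inline_fragment("f1", "1", "m1"))
print(inline_fragment("f1", "1", "m1", in_target=True))
```

The same id 'm1' then yields two distinct references, one for the source element and one for its target counterpart.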



--- Extensions/Modules

I include modules here because from the referencing viewpoint they have to follow the same rules as extensions (or vice-versa).

I don't think Fredrik had a proposal for those. David proposed a module/extension-specific prefix. Originally, I proposed that
modules/extensions use a single prefix and UUIDs.

The main reason I was proposing UUIDs was that I couldn't think of a way to ensure module/extension prefixes would be unique: two
extensions may decide to use the same one, or one already used by a module, or a new module may pick one already used by someone's
extension, etc.

Now I think we may go David's way for this, but possibly with a different rule. Instead of defining a prefix per module/extension,
we could say that:
a) a namespace prefix must be declared for the given module/extension in the <file> (ensuring there is a prefix associated with
that module/extension).
b) the prefix to use in the fragment identifier is the same as the namespace prefix declared in a).

So you would have something like this:

<file id='f1' xmlns:tbx="iso:std:iso:30042:ed-1:v1:en">
 ...
 <tbx:termEntry xml:id="tidle-tbx-taws-ebt-1">
  ...
 <unit id='1'>
  <segment>
   <source>Some <mrk id='m1' type='term' ref='#/f=f1/tbx=tidle-tbx-taws-ebt-1'>term</mrk></source>
   ...
</file>

I don't think it's a perfect solution as: 1) namespace prefixes can be overridden locally, 2) a tool may decide to use the same
namespace prefix as a URI prefix used for a core element, and 3) the prefix may change from file to file.
But it's still a safer way to ensure the prefix used in the URI is linked to the proper module/extension and doesn't clash with
another one.
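
To make rules a) and b) concrete, here is a minimal Python sketch of how a tool could map the prefix of the last step of a
reference back to the module/extension namespace. It is only an illustration under assumptions: the helper names, the
CORE_PREFIXES set (taken from the prefixes discussed in this thread) and the completed sample file are mine, not part of any
proposal.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: core prefixes discussed in this thread; the final set is not settled.
CORE_PREFIXES = {"f", "g", "u", "n", "d", "t"}

# Assumption: the example above, completed so that it is well-formed.
SAMPLE_FILE = """<file id='f1' xmlns:tbx="iso:std:iso:30042:ed-1:v1:en">
 <tbx:termEntry xml:id="tidle-tbx-taws-ebt-1"/>
 <unit id='1'>
  <segment>
   <source>Some <mrk id='m1' type='term' ref='#/f=f1/tbx=tidle-tbx-taws-ebt-1'>term</mrk></source>
  </segment>
 </unit>
</file>"""


def declared_prefixes(xml_text):
    """Collect namespace prefix -> namespace URI declarations found in the document."""
    prefixes = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri  # a later (local) re-declaration wins: caveat 1
    return prefixes


def resolve_module_step(fragment, xml_text):
    """Return (namespace URI, id) for the last step of a fragment identifier."""
    last_step = fragment.rsplit("/", 1)[-1]
    prefix, _, ident = last_step.partition("=")
    if prefix in CORE_PREFIXES:
        # Caveat 2: a namespace prefix equal to a core prefix is ambiguous.
        raise ValueError(f"'{prefix}' is a core prefix, not a module/extension one")
    return declared_prefixes(xml_text)[prefix], ident


print(resolve_module_step("#/f=f1/tbx=tidle-tbx-taws-ebt-1", SAMPLE_FILE))
```

Note that the sketch simply keeps the last declaration it sees for a given prefix, so it does not solve the local-override
problem; it only shows where that problem would surface.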

Another solution for this could be to introduce a new element where we declare the prefixes and the module/extension namespace URI.
Kind of a parallel namespace mechanism. But it feels wrong to duplicate the normal namespace mechanism.



--- Syntax

'/' as scope separator looks fine.

'~' as prefix separator looks strange to me. In my opinion '=' is a lot more natural ("#u=123" clearly says "the unit with an id
equal to 123").
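
For what it's worth, a fragment using '/' and '=' is also trivial to split. The Python sketch below is only an illustration:
parse_fragment is a hypothetical helper, and treating a step without '=' as "no prefix" is my assumption.

```python
def parse_fragment(fragment):
    """Split a fragment identifier into (prefix, id) steps.

    Assumptions: '/' separates scopes, '=' separates prefix and id, and a
    step without '=' is an un-prefixed id (returned with an empty prefix).
    """
    if not fragment.startswith("#"):
        raise ValueError("a fragment identifier must start with '#'")
    steps = []
    for step in fragment[1:].split("/"):
        if not step:
            continue  # tolerate a leading '/', as in '#/f=f1/...'
        prefix, separator, ident = step.partition("=")
        if not separator:
            prefix, ident = "", step
        steps.append((prefix, ident))
    return steps


print(parse_fragment("#u=123"))
print(parse_fragment("#/f=f1/tbx=tidle-tbx-taws-ebt-1"))
```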


Cheers,
-yves
