xliff message

Subject: RE: [xliff] Implementing extensions
From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff@lists.oasis-open.org>
Date: Wed, 21 Mar 2012 11:32:38 -0600

Hi Rodolfo,

Thanks for taking the time to answer.
Please see my notes below.



>> I think it was meant to say "Preserve metadata 
>> without using namespaces"
>
> Wrong assumption. The proposal is to create an optional 
> module, that would use namespaces.

Let me rephrase: I think it was meant to say "Preserve metadata without using custom namespaces".

Whether <metaHolder>/<meta> is in its own module namespace or not makes no difference when comparing it to using custom namespaces for extensions.



>> As we started to discuss during last call, I think we
>> need to compare that proposal with another way: which is to 
>> use namespaces like in v1.2.
>
> The proposal uses namespaces. Please check the examples I 
> sent to the mailing list.

Rephrasing: "which is to use custom namespaces like in v1.2."


 
>> -- ns.pro) Namespaces is a standard XML mechanism to mix different 
>> vocabularies. All the XML technology stack understand namespaces
>> and can work with them. It's a built-in mechanism, why invent 
>> something different?
>
> We are using namespaces for this.

You don't use custom namespaces. With <metaHolder> (in its namespace), you cannot use the namespace mechanism to differentiate between distinct extensions: all extensions are in the same element. The only way to distinguish them is the key attribute. It'll work (provided tools don't step on each other's feet), but it's not as flexible and powerful as using custom namespaces.
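
To make the difference concrete, here is a rough sketch using Python's standard xml.etree.ElementTree. The namespace URIs, element names and keys are all made up for illustration, and the <metaHolder> layout is only my reading of the proposal.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragments: every URI, element name and key below is invented.
CUSTOM_NS_DOC = """
<unit xmlns:a="http://example.com/toolA" xmlns:b="http://example.com/toolB">
  <a:confidence>12</a:confidence>
  <b:review status="done"/>
</unit>
"""

META_DOC = """
<unit xmlns:mtd="http://example.com/metaHolder">
  <mtd:metaHolder>
    <mtd:meta key="confidence">12</mtd:meta>
    <mtd:meta key="reviewStatus">done</mtd:meta>
  </mtd:metaHolder>
</unit>
"""

def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if it has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def group_by_namespace(xml_text):
    """Group the local names of all elements by their namespace URI."""
    groups = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        local = elem.tag.split("}", 1)[-1]
        groups[namespace_of(elem.tag)].append(local)
    return dict(groups)
```

With CUSTOM_NS_DOC each tool's data ends up in its own group, keyed by its namespace URI. With META_DOC everything lands in the single metaHolder namespace, and only the key attribute tells the two extensions apart.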



>> -- ns.pro) The extensions can be well documented and 
>> validated using schemas.
>
> That's not what happens in real life. Custom schemas has been
> kept secret for a long while. They are not secret today but 
> they are not documented at all.

Then the question is: Must custom schemas for extensions be public?

After all, the tool vendors doing this are part of our community too. We should cater to their requirements as well.

And why do we need those schemas?
Do we have any reason other than validation?



> All we can validate today is well formedness. We cannot 
> validate if the required elements and attributes 
> are present in the right place.

It seems this is the crux of the problem. Let's break it down into several parts:

a) Can the XLIFF namespaces be validated even if there are other namespaces present that do not have a schema?

I think the answer is yes. The extension points can be defined with processContents="lax". In that case only the namespaces for which you do have a schema get validated, and no error is thrown for the parts you do not validate.

> Tools that use custom extensions reject files 
> that are valid from XML point of view and XLIFF 
> point of view.

I know tools that use custom namespaces and they don't reject valid XML or XLIFF files. It seems the problem you describe is specific to some tools, not to how the various namespaces in an XLIFF document can or cannot be validated. But maybe I'm missing something.


b) Do we really care to validate the custom namespaces?

If we can validate the XLIFF namespaces, isn't that good enough?
Tools that understand the extensions can add the validation of the custom namespaces if they want.

Note also that using <metaHolder>/<meta> would not allow tools to really validate the intended content of those elements. You could only validate against the XLIFF schema. But the content of <meta> could still be wrong for its intended use. For example, if a tool uses a metadata extension myConfidenceValue to store some numeric value:

<meta key='myConfidenceValue'>12</meta>

And at some point in the process some tool replaces this by:

<meta key='myConfidenceValue'>XYZ</meta>

The XLIFF schema, being all-purpose, will see the second snippet as valid, while it is not going to work from the viewpoint of the specific extension.

So I argue that the <meta> proposal doesn't really allow true validation of the content anyway, and therefore truly validating the extension is probably not that important except for the tools that use the extension.
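
As an illustration only, this is the kind of extension-specific check a tool would still have to do on its own, whatever the general schema says. It is a sketch in Python; the function name is hypothetical and the "must be numeric" rule is just the assumption from the myConfidenceValue example.

```python
import xml.etree.ElementTree as ET

def confidence_is_valid(meta_xml):
    """Extension-specific check: a myConfidenceValue entry must hold a number.

    A general-purpose schema that only says "<meta> contains text" cannot
    express this rule, so the tool owning the extension has to enforce it.
    """
    elem = ET.fromstring(meta_xml)
    if elem.get("key") != "myConfidenceValue":
        return True  # not our extension, nothing to check
    try:
        float(elem.text or "")
    except ValueError:
        return False
    return True
```

Both snippets above parse as well-formed XML and have the same shape; only a check like this one tells them apart.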



> Custom XML schemas should be known and available 
> for validating XLIFF files that use custom extensions.
> Without them, it is not possible to check if all 
> required elements and attributes are present.

This goes back to the question: Do we care if we cannot validate custom namespaces?



> One possible solution is to mandate that custom 
> schemas be embedded in the XLIFF file, as Microsoft 
> does with ResX files. 

That wouldn't solve the issue of the tools that do not want to publish their schemas. I cannot think of any reason why one would want to keep an extension private, but we can't rule out that some may have valid reasons for this.

I think we really need to re-visit the assumption that custom extensions MUST be public.

I think they SHOULD (!= from MUST), but if there is a technical way to not validate private extensions without impeding the validation of the XLIFF namespaces and public extensions, maybe it's fine.
The non-validation of a private custom namespace is the price they pay for not making the schema public.


>> -- bag.con) With the <meta> method when we want to make a
>> custom extension an official module we would have to 
>> re-define it in its namespace, and possibly change how it 
>> is coded (since <meta> has restrictions the namespace does 
>> not have). Tools would have to be adapted instead of just 
>> switching namespace.
>
> Remember that <meta> would live in its own namespace.

Sure, but that doesn't change anything. We still cannot just switch namespaces. We have to map the <meta> set to a new module namespace.

Using a custom namespace for the extension, we can just switch to a different namespace URI to make it an official XLIFF module.
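
Roughly, the migration for a namespace-based extension is a mechanical rename. A sketch in Python, where both URIs and the helper name are invented for the example:

```python
import xml.etree.ElementTree as ET

CUSTOM_URI = "http://example.com/my-extension"      # hypothetical
MODULE_URI = "urn:example:xliff:module:my-feature"  # hypothetical

def switch_namespace(root, old_uri, new_uri):
    """Move every element and attribute in old_uri to new_uri, in place."""
    old_prefix = "{" + old_uri + "}"
    new_prefix = "{" + new_uri + "}"
    for elem in root.iter():
        if elem.tag.startswith(old_prefix):
            elem.tag = new_prefix + elem.tag[len(old_prefix):]
        # Copy the attribute names first, since the mapping is modified below.
        for name in list(elem.attrib):
            if name.startswith(old_prefix):
                elem.attrib[new_prefix + name[len(old_prefix):]] = elem.attrib.pop(name)
    return root
```

The structure of the data is untouched; only the URI changes. With <meta>, the key/value pairs would first have to be mapped to whatever elements and attributes the new module defines.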

 
>> -- ns.pro) Using namespaces allows simple metadata to 
>> be represented very simply: just with an attribute for example.
> 
> Once again, <meta> would live in its own namespace.

That doesn't change anything. We still cannot just use an attribute like we could if the extension were based on a custom namespace.

 
>> ...with some idref. It's more verbose and more complicated
>> to associate the data with the element to which it pertains.
>> For example:
>> 
>> <segment>
>>  <source>The price is <ph id='1'/></source>
>>  <metaHolder>
>>   <meta key='myData' idref='1'>value</meta>
>>  </metaHolder>
>> </segment>
>
> There is no need to use id/idref. In fact, the examples I 
> sent to the list don't use  that mechanism.

I see an example here:
http://lists.oasis-open.org/archives/xliff/201203/msg00027.html

and another here:
http://lists.oasis-open.org/archives/xliff/201203/msg00030.html

But both are about associating metadata with HTML attributes of the parent element enclosing the content of <unit>, not about associating metadata with an inline element.

In any case, regardless of id/idref, using something like <mtd:meta> is more complex than just using an extension attribute directly in the inline element.
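
For comparison, here is a sketch of what a tool has to do to read the same piece of data in both cases. It is Python with the standard library; the my:data attribute and its namespace URI are invented, and the <metaHolder> form follows the <segment> example quoted above (without namespaces, to keep it short).

```python
import xml.etree.ElementTree as ET

EXT = "http://example.com/my-extension"  # hypothetical extension namespace

WITH_ATTRIBUTE = (
    '<segment xmlns:my="http://example.com/my-extension">'
    '<source>The price is <ph id="1" my:data="value"/></source>'
    '</segment>'
)

WITH_META = (
    '<segment>'
    '<source>The price is <ph id="1"/></source>'
    '<metaHolder><meta key="myData" idref="1">value</meta></metaHolder>'
    '</segment>'
)

def data_from_attribute(segment_xml):
    """The data sits on the inline element itself: one attribute lookup."""
    ph = ET.fromstring(segment_xml).find("source/ph")
    return ph.get("{%s}data" % EXT)

def data_from_meta(segment_xml):
    """The data sits elsewhere: find the <meta> whose key and idref match."""
    segment = ET.fromstring(segment_xml)
    ph = segment.find("source/ph")
    for meta in segment.iter("meta"):
        if meta.get("key") == "myData" and meta.get("idref") == ph.get("id"):
            return meta.text
    return None
```

Both functions return the same value, but the second one depends on the id/idref link staying intact through every tool that touches the file.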


 
>> -- bag.con) It makes the more complex extensions very
>> verbose and potentially a lot more complicated to define.
>> Anything that goes beyond a flat set of key/value pairs 
>> will be difficult to define and use.
>
> Not really. Custom extensions can be more complicated than
> the proposed module.

Mmm, I'll assume you meant "custom *namespaces* can be more complicated..." as defining tool-specific <meta> key/value pairs is defining an extension.

To be clear: For me an extension is a set of data that represents a feature which does not exist in the XLIFF specification (and therefore extends the capabilities of XLIFF).
We are discussing two ways to store such data: a) using the <metaHolder> proposal from Bryan/Rodolfo and b) using custom namespaces.

To code complex extensions with <metaHolder>/<meta> you have to do things that are more complicated than if the same extension was coded in its own namespace. With <meta> you are limited to a set of flat key/value pairs. With a custom namespace there is no limit on what the structure can look like.
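
A small sketch of what that flattening could look like, in Python. The nested vocabulary and the dotted, indexed key convention are both invented here; they only show that the structure has to be encoded in the key names by some private agreement.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested extension data in its own namespace.
NESTED = """
<my:terms xmlns:my="http://example.com/my-extension">
  <my:term id="t1"><my:text>price</my:text><my:score>0.9</my:score></my:term>
  <my:term id="t2"><my:text>cost</my:text><my:score>0.4</my:score></my:term>
</my:terms>
"""

def flatten(elem, path=""):
    """Flatten a nested element tree into (key, value) pairs.

    The hierarchy is encoded in dotted, indexed keys (an invented convention),
    which is what a flat key/value container would force on the extension.
    """
    local = elem.tag.split("}", 1)[-1]
    pairs = []
    for name, value in elem.attrib.items():
        pairs.append((f"{path}{local}.@{name}", value))
    text = (elem.text or "").strip()
    if text:
        pairs.append((f"{path}{local}", text))
    for index, child in enumerate(elem):
        pairs.extend(flatten(child, f"{path}{local}.{index}."))
    return pairs
```

Every pair would become one <meta> element, and any tool reading them back would need to know the key convention to rebuild the original structure.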



>> -- ns.pro) Using namespaces would allow to re-use
>> existing vocabularies for things not defined in XLIFF
>> itself.
>
> The key is in "things not defined in XLIFF itself". 
> We see extensions used for things that XLIFF already contemplate.

And it's wrong. But using <metaHolder>/<meta> instead of custom namespaces does not solve that problem. The solution for this is not related to the way we represent the extension. 

The point here was that, when defining a good extension (one that truly extends XLIFF), by using non-XLIFF namespaces instead of the <metaHolder>/<meta> module, you can re-use existing vocabularies that may be standard or well-known for what they represent. You do not need to convert them to key/value <meta> elements.

 
 
>> -- bag.pro) With <metaHolder>/<meta> the specification
>> can set processing expectations that, for example, force 
>> or forbid different things.
>> 
>> True. But the same processing expectations can be set for
>> non-XLIFF namespaces used in XLIFF.
>
> We don't have any real proposition for setting processing 
> expectations on custom name-space based extensions.

And we don't have any real proposition for setting processing expectations on <metaHolder>/<meta>-based extensions either.

This is needed (and likely the same) regardless of what mechanism we end up using.



> Just to clarify: custom extensions based on namespaces could 
> be good if their implementation is well regulated. 
> Unfortunately we cannot control what tool vendors do.

True, we can only decide where extensions could be allowed and define some processing expectations on what an extension should and should not be or do. But there is no way to truly enforce (validate) such expectations.
And that is the same with <metaHolder>/<meta>.

Cheers,
-yves
References:
- RE: [xliff] Implementing extensions
  - From: "Rodolfo M. Raya" <rmraya@maxprograms.com>