xliff message

Subject: Needs for Content Markup

From: Yves Savourel <ysavourel@translate.com>
To: 'Arle Lommel - LISA standards and publications' <arle@lisa.org>
Date: Thu, 12 Mar 2009 22:10:43 -0600

Feedback for TMX 2.0 proposal, with relation to XLIFF:

In the TMX 2.0 proposal there are two main sets of changes:

The first set affects some elements outside <seg>. Some of these changes are good, some less good (IMO), but they generally bring
new features and break compatibility only a little, if at all.

The second set is the new proposed content markup (<itag>). It does bring a massive compatibility break, but--and that is my main
problem--does not bring any new feature: There is nothing you can do with <itag> that you cannot already do with the 1.4 content
markup.

( I would even argue that it makes parsing somewhat more complicated. For example: now you have to query both the name of the
element and its type attribute to know what kind of code it represents; before, in most cases, you could do this by just
looking at its name. )
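
To make the parsing point concrete, here is a minimal sketch in Python. The TMX 1.4 element names (<bpt>, <ept>, <ph>, <it>) are
the real ones; the type values used for <itag> ("begin", "end", "standalone") are assumptions made up for this illustration and
are not taken from the 2.0 draft.

```python
import xml.etree.ElementTree as ET

# TMX 1.4: the element name alone tells you what kind of code it is.
TMX14_KINDS = {"bpt": "opening", "ept": "closing", "ph": "placeholder", "it": "isolated"}

def kind_14(elem):
    """Classify a TMX 1.4 inline element by its name only."""
    return TMX14_KINDS.get(elem.tag, "unknown")

# <itag>-style markup: the name is always the same, so the type attribute
# must be checked as well. These type values are HYPOTHETICAL.
ITAG_KINDS = {"begin": "opening", "end": "closing", "standalone": "placeholder"}

def kind_itag(elem):
    """Classify an <itag> element: check the name first, then the type attribute."""
    if elem.tag != "itag":
        return "unknown"
    return ITAG_KINDS.get(elem.get("type"), "unknown")
```

Both functions give the same answer for equivalent codes; the second one just needs two lookups where the first needs one.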

Maybe I am missing some important benefits, and if so I would like to be enlightened. But so far I cannot see any (<sub> is still
there, we still have some text nodes with real text and others with codes, etc.). The proposed markup seems to change completely how
the content is coded without changing any of the functionality.

I believe the content of both TMX and XLIFF is the same thing: an abstracted representation of extracted text with inline codes. It
has the same purposes and the same requirements. In fact, when I read a TMX <seg> or an XLIFF <source> I use the same object to store
their content, and I generate either format from that single type of object. This is a pretty strong indication that both are
similar. And if they are: why do we need two different XML representations for it?
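
As a rough illustration of the "same object" idea (a sketch only, not the code of any actual tool): the Fragment and Code classes
and the read_fragment function below are hypothetical names. The sample handles only <bpt>, <ept> and <ph>, which exist in both
TMX 1.4 and XLIFF 1.2, and it ignores the XLIFF namespace to stay short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Code:
    kind: str  # "open", "close" or "placeholder"
    data: str  # the native code, e.g. "<b>"

@dataclass
class Fragment:
    # Alternating runs of plain text (str) and inline codes (Code).
    parts: list = field(default_factory=list)

# Element names shared by TMX 1.4 and XLIFF 1.2.
KINDS = {"bpt": "open", "ept": "close", "ph": "placeholder"}

def read_fragment(elem):
    """Build a Fragment from a TMX <seg> or an XLIFF <source> element."""
    frag = Fragment()
    if elem.text:
        frag.parts.append(elem.text)
    for child in elem:
        # Unknown inline elements are treated as placeholders in this sketch.
        frag.parts.append(Code(KINDS.get(child.tag, "placeholder"), child.text or ""))
        if child.tail:
            frag.parts.append(child.tail)
    return frag

tmx = ET.fromstring('<seg>Click <bpt i="1">&lt;b&gt;</bpt>here<ept i="1">&lt;/b&gt;</ept>.</seg>')
xlf = ET.fromstring('<source>Click <bpt id="1">&lt;b&gt;</bpt>here<ept id="1">&lt;/b&gt;</ept>.</source>')

# The same reader handles both containers and yields equal objects.
same = read_fragment(tmx) == read_fragment(xlf)
```

The only differences the reader has to tolerate are the container name and the attribute used for pairing (i in TMX, id in XLIFF);
the content model itself is shared.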

What that representation should be is a different question.

[[---> now comes the important part:

Establishing that this content is one and the same is important because it paves the way for an exchange format between translation
tools at the text fragment level. As component-based and internet-driven technologies evolve, we need to make sure the tools of the
future will be able to communicate as seamlessly as possible, not only by exchanging documents but also by exchanging small segments
of information.

Many of the Web services, plugins, and other bricks that make up the tools being built today need to exchange data at the
segment level, not at the file level. Whether these components identify terms, highlight spelling mistakes, provide TM matches, or
MT guesses, they all, ultimately, need to access the same abstracted extracted text.

Having a single representation for TMX and XLIFF contents is not only logical, it is necessary to bring more interoperability
between the tools being built today.

---]]

So Arle, I have an idea for OSCAR: Instead of creating a new content markup now, I would suggest:

a) If it seems important to have a new version of TMX published, to move to TMX 1.5 with some of the changes proposed in this draft,
but without touching the content markup.

b) Then both groups can work together to come up with a common representation of the content markup in both formats.

-ys