xliff message

Subject: RE: [xliff] Profile: inline codes in software string
From: Magnus Martikainen <magnus@trados.com>
To: Yves Savourel <ysavourel@translate.com>, 'XLIFF Main List' <xliff@lists.oasis-open.org>
Date: Tue, 20 Apr 2004 18:39:16 -0700

Hi Yves,

Some good reasons for preferring the first option:
1) It gives better leverage across file formats. 
2) It presents the text for translation in as "neutral" a format as possible,
thus making the job easier for both tools and people that need to process
the file. Tools and people can process the content without knowing what
underlying data format it represents. After all, XLIFF should be about making
interchange of data for localisation easier. Requiring tools and people that
process XLIFF files to know about the underlying file format and the
functionality of the filter that created the file should be avoided as far
as possible.
3) If we leave the escaping and un-escaping responsibility to the filters we
actually take away complexity from the localisation process, as there should
then be one thing less to worry about...
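
To make the first option concrete, here is a minimal sketch of a filter that
un-escapes on extraction and re-escapes on merge. It assumes a deliberately
simplified printf-style format in which only "%%", "%s" and "%d" are special;
the names extract and merge and the ("code", ...) tuple are hypothetical
illustrations, not part of XLIFF or of any existing tool.

```python
import re

# Assumption: a simplified printf-style grammar. "%%" is an escaped percent
# sign, "%s" and "%d" are placeholders. Real resource formats allow more.
_TOKEN = re.compile(r"%%|%[sd]")


def extract(raw):
    """Extraction side of the filter.

    Returns a list of parts: plain (un-escaped) text as str, and
    placeholders as ("code", native_text) tuples standing in for
    inline codes.
    """
    parts = []
    pos = 0
    for match in _TOKEN.finditer(raw):
        if match.start() > pos:
            parts.append(raw[pos:match.start()])
        if match.group(0) == "%%":
            parts.append("%")  # un-escape: the translator sees a plain %
        else:
            parts.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(raw):
        parts.append(raw[pos:])
    return parts


def merge(parts):
    """Merge side of the filter: re-escape plain text, emit codes verbatim."""
    out = []
    for part in parts:
        if isinstance(part, tuple):
            out.append(part[1])
        else:
            out.append(part.replace("%", "%%"))
    return "".join(out)


source = extract("Disk %s is 100%% full")
# A translator edits only the plain-text parts and never types "%%".
target = ["Disque ", ("code", "%s"), " plein à 100 %"]
print(merge(source))
print(merge(target))
```

Only the filter knows that "%" must be doubled in the native format; the
translator and any tool working on the extracted text never see the escape.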

In cases where interoperation with an existing data source requires
"escaped" characters, it is always possible to introduce the escaping for
that particular interaction. However, that should be the special case, not
the other way around.

It is my opinion that XLIFF as a standard should look at bringing the best
for the future, rather than promoting "bad practices" simply because they
have been used in the past.

Regards,
Magnus

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@translate.com] 
Sent: Tuesday, April 20, 2004 11:09 AM
To: 'XLIFF Main List'
Subject: RE: [xliff] Profile: inline codes in software string

Hi Magnus,

> - Characters that would otherwise be interpreted differently 
> e.g. '%' in the case of RC etc., can be represented as plain 
> text without being mixed up with the placeholders. Thus the 
> translator does not need to be aware of the underlying file 
> format (e.g. to know that they must write %% when they mean %). 

I think that aspect is a different (but also important) issue.

It applies even if the text has no variables, and to all types of resources,
almost all types of text. The question is: how to handle escaped characters?

Should we "un-escape" the text in XLIFF and let the filter (knowing the
format extracted) deal with the re-escaping?
Or should we leave the escaped characters as is and make sure translators (or
any leveraging mechanism) do the proper escapes?
Or, I guess, should we treat escaped characters as inline codes?

Ideally the first solution would be the best. But there are a lot of reasons
the second can also be valid:
- there are a lot of legacy TMs with escaped data.
- I don't think I know many mainstream tools today that un-escape the text.
- Some content can have many escape levels: (e.g. HTML in Javascript, in an
XML repository, etc.)
- Dealing with escaping/unescaping while going through a process that has as
many steps as localization may be more error-prone than just leaving the
escaped characters.
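
The point about multiple escape levels can be illustrated with a small
sketch. It assumes HTML text stored inside a double-quoted string literal,
and uses a JSON string as a stand-in for a JavaScript one; the function names
are hypothetical. The layers have to be removed outermost first and
re-applied in the reverse order, which is where mistakes creep in.

```python
import html
import json


def unescape_layers(literal):
    """Undo the string-literal layer first, then the HTML layer."""
    html_text = json.loads(literal)   # \" becomes ", \\ becomes \, etc.
    return html.unescape(html_text)   # &amp; becomes &, &lt; becomes <, etc.


def escape_layers(text):
    """Re-apply the layers in reverse order: HTML first, then the literal."""
    html_text = html.escape(text, quote=False)
    return json.dumps(html_text, ensure_ascii=False)


stored = r'"Tom &amp; Jerry say \"hi\""'
plain = unescape_layers(stored)
print(plain)
print(escape_layers(plain) == stored)
```

Applying the two steps in the wrong order, or skipping one, still yields a
string that looks plausible, so the error may only surface at run time.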

So I'm not sure which option would be the best choice.

Note that all this is more for the 'resource' world, as in the 'document'
world (HTML, XML etc.) characters are usually un-escaped by the filters.

Cheers,
-yves


To unsubscribe from this mailing list (and be removed from the roster of the
OASIS TC), go to
http://www.oasis-open.org/apps/org/workgroup/xliff/members/leave_workgroup.php.