OASIS Mailing List Archives


xliff message

Subject: RE: [xliff] Profile: inline codes in software string

Hi Magnus,

> My recommendation would be to always treat them as protected 
> "inline codes".
> Some good reasons for this:
> - This makes it obvious that they should be treated as 
> placeholders during translation.
> ...
> The filter can completely handle this conversion.
> - It is possible to use "generic" tag verification tools to 
> compare content in <source> and <target> elements to get 
> warnings if such placeholders have been added, removed, or 
> if their order has changed. (Otherwise this type of verification 
> would require a knowledge of the underlying file format and an 
> actual parsing of the text.)

I agree, it probably makes more sense to treat them as inline codes.
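
To illustrate the "generic" verification point, here is a minimal sketch in Python. It assumes the placeholders were extracted as XLIFF 1.2-style <ph id="..."> inline codes and that the <source> and <target> content is passed in as un-namespaced fragments; the function names are hypothetical, not part of any existing tool. It needs no knowledge of the original file format:

```python
import xml.etree.ElementTree as ET

def inline_code_ids(fragment):
    """Return the ids of the <ph> inline codes, in document order."""
    # Wrap the fragment so it parses as a single XML element.
    root = ET.fromstring("<seg>" + fragment + "</seg>")
    return [ph.get("id") for ph in root.iter("ph")]

def check_codes(source, target):
    """Return warnings for codes missing, added, or reordered in target."""
    src, tgt = inline_code_ids(source), inline_code_ids(target)
    warnings = []
    missing = [i for i in src if i not in tgt]
    added = [i for i in tgt if i not in src]
    if missing:
        warnings.append("missing: " + ", ".join(missing))
    if added:
        warnings.append("added: " + ", ".join(added))
    # Same set of codes on both sides, but in a different sequence.
    if not missing and not added and src != tgt:
        warnings.append("order changed")
    return warnings
```

A reordered target only produces an "order changed" warning, which a tool could choose to ignore, since translations often need to move placeholders.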

> - Characters that would otherwise be interpreted differently 
> e.g. '%' in the case of RC etc., can be represented as plain 
> text without being mixed up with the placeholders. Thus the 
> translator does not need to be aware of the underlying file 
> format (e.g. to know that they must write %% when they mean %). 

I think that aspect is a different (but also important) issue.

It applies even if the text has no variables, and to all types of resources,
almost all types of text. The question is: How to handle escaped characters?

Should we "un-escape" the text in XLIFF and let the filter (knowing the
format extracted) deal with the re-escaping?
Or should we leave the escaped characters as is and make sure translators (or
any leveraging mechanism) do the proper escapes?
Or, I guess, should we treat escaped characters as inline codes?
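
As a rough sketch of the first option, assume a simplified, hypothetical printf-style syntax where %% is a literal percent sign and %s, %d, %u are placeholders (a real RC filter would have to cover the full format-specifier syntax). The filter un-escapes on extraction and re-escapes on merge, so the translator only ever sees a plain %:

```python
import re

# Assumed, simplified syntax: "%%" is an escaped percent sign,
# "%" followed by s, d or u is a placeholder.
TOKEN = re.compile(r"%%|%[sdu]")

def extract(raw):
    """Split raw text into ("text", str) and ("code", str) parts, un-escaping %%."""
    parts, pos = [], 0
    for m in TOKEN.finditer(raw):
        if m.start() > pos:
            parts.append(("text", raw[pos:m.start()]))
        if m.group() == "%%":
            parts.append(("text", "%"))      # shown to the translator as plain %
        else:
            parts.append(("code", m.group()))  # would become an inline code
        pos = m.end()
    if pos < len(raw):
        parts.append(("text", raw[pos:]))
    return parts

def merge(parts):
    """Rebuild the raw string, re-escaping % in text parts only."""
    return "".join(
        value.replace("%", "%%") if kind == "text" else value
        for kind, value in parts
    )
```

With this, "%d%% of %s done" is extracted as a code, the text "%", the text " of ", a code, and the text " done", and merging those parts gives back the original string.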

Ideally the first solution would be the best. But there are a lot of reasons
the second can also be valid:
- there are a lot of legacy TMs with escaped data.
- I don't think I know of many mainstream tools today that un-escape the text.
- Some content can have many escape levels (e.g. HTML in JavaScript, in an
XML repository, etc.)
- Dealing with escaping/un-escaping while going through a process that has as
many steps as localization may be more error-prone than just leaving the
escaped characters.
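
To make the multi-level case concrete, here is a small sketch for HTML text stored in a JavaScript string literal. It is only an approximation: Python's json module stands in for JavaScript string-literal syntax, and the function names are made up. The point is that the layers have to be removed from the outside in and put back in the reverse order, and each extra layer is one more place for a step to go wrong:

```python
import html
import json

def unescape_all(js_literal):
    """Undo the outer layer first (string literal), then the HTML entities."""
    return html.unescape(json.loads(js_literal))

def escape_all(text):
    """Re-apply the layers in reverse order: HTML entities, then string literal."""
    # quote=False: only &, < and > become entities; quotes are left to the outer layer.
    return json.dumps(html.escape(text, quote=False))
```

For a literal such as "Tom &amp; Jerry say \"hi\"" the translator would see Tom & Jerry say "hi", and escape_all restores the original literal. If the original had used &quot; instead, the round trip would not reproduce it byte for byte, which is one of the risks mentioned above.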

So I'm not sure which option would be the best choice.

Note that all this is more for the 'resource' world, as in the 'document'
world (HTML, XML etc.) characters are usually un-escaped by the filters.
