
 



xliff-comment message



Subject: Re: [xliff-comment] Storing characters outside the XML character range in XLIFF.


Yves, thanks for your input.

I think I'll go for the <ph> approach, since (I just found this) the 
XLIFF 1.2 representation guide for gettext PO suggests using the <ph> 
or <x> element, with a preference for <ph> because of compatibility 
with TMX (section 3.4).
It gives a nice guideline on how these issues can be dealt with.
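For illustration, the <ph> approach could look roughly like the Python sketch below. It is not taken from the PO representation guide: the `#uXXXX#` notation inside <ph> is the one Yves proposes further down, and the function names, the sequential `id` numbering, the `force` parameter and the regular expression that undoes the escaping are all hypothetical. Each character is tested against the XML 1.0 `Char` production. `force` lets a filter also wrap characters XML does allow, such as an explicit #x0D that an XML parser would otherwise turn into a line feed.

```python
import re
from xml.sax.saxutils import escape, unescape


def is_xml_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_control_chars(text: str, force: frozenset = frozenset()) -> str:
    """Build <source> content: escape &, <, > and wrap illegal chars in <ph>.

    force: hypothetical set of extra code points to wrap even though XML
    allows them (e.g. {0x0D}, which a parser would normalize to LF).
    """
    out = []
    next_id = 1
    for ch in text:
        cp = ord(ch)
        if is_xml_char(cp) and cp not in force:
            out.append(escape(ch))
        else:
            # Hypothetical #uXXXX# notation, <ph> ids numbered from 1.
            out.append(f"<ph id='{next_id}'>#u{cp:04X}#</ph>")
            next_id += 1
    return "".join(out)


# Matches only the <ph> elements produced above (hypothetical format).
_PH_RE = re.compile(r"<ph id='\d+'>#u([0-9A-F]{4,6})#</ph>")


def unescape_control_chars(content: str) -> str:
    """Invert escape_control_chars on the serialized content string."""
    parts = []
    pos = 0
    for m in _PH_RE.finditer(content):
        parts.append(unescape(content[pos:m.start()]))  # ordinary text
        parts.append(chr(int(m.group(1), 16)))           # restored control char
        pos = m.end()
    parts.append(unescape(content[pos:]))
    return "".join(parts)
```

A real filter would more likely read the parsed <ph> elements back than match strings, but the string form keeps the round trip easy to see.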

Jan-Arve

Yves Savourel wrote:

>Hi Jan-Arve,
>
>  
>
>>Is it possible to store characters such as 
>>backspace (0x08) or BEL (0x07) in the <source> 
>>and <target> elements in an XLIFF document by using 
>>a special kind of tag, or do I have to use my 
>>own tag in my own namespace?
>>
>>According to the XML spec, an XML document is 
>>not well-formed if it contains one of these. (More 
>>precisely, only #x09 (TAB), #x0A (LF) and #x0D (CR) 
>>are allowed in the #x00-#x1F range.)
>>    
>>
>
>If you want to keep these characters as part of the text I would just use a special non-XML syntax (and maybe an unusual one to
>make sure no tool is tempted to convert them), something like: "<source>This is a #u0009# char.</source>", and your filter would know
>how to convert them back.
>
>If you want to treat them as inline codes I would use the same non-XML syntax with normal tags. Something like: "<source>This is
>a <ph id='1'>#u0009#</ph> char.</source>"
>
>But I think you cannot use your own namespace inside the <source> or <target> element (as far as I know that is true in 1.2 as
>well). So you cannot do something like: "<source>This is a <m:char09/> char.</source>" for example.
>
>This issue is quite common (outside XLIFF too). Maybe there is a more standard way to represent these special characters that other
>people use. But I haven't seen one so far.
>
>
>  
>
>>This might sound like a strange use-case, but I also want to use 
>>the same mechanism to preserve things such as carriage return (#x0D), 
>>since I want to store the XLIFF file with the OS's line-ending style. 
>>This means that any #x0D found during parsing is simply 
>>stripped off. This ensures that the same information is presented 
>>to the translator regardless of the line-ending style.
>>However, if the translator/programmer explicitly used #x0D in the 
>>source text, then I need to escape it in some way to ensure that it 
>>won't get stripped off.
>>    
>>
>
>Mmm... But then if you have a special notation for line-breaks, how will other XLIFF tools know they need to show line-breaks? The
>codes will get in the way of the translation.
>
>If the problem is that you have different types of line-breaks (DOS vs Unix vs Mac) inside the same file, then I would use a
>user-defined attribute in the <trans-unit> that tells the filter how to convert the line-breaks back when merging. Something
>like:
>
><trans-unit xml:space='preserve' m:lbtype='mac'>
> <source>Line 1
>Line 2
>Line 3</source>
></trans-unit>
>
>The xml:space makes sure the line-breaks are preserved, the m:lbtype makes sure you know how to convert them back.
>
>
>Hope this helps,
>-yves
>
>  
>
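Yves's line-break suggestion could be sketched in Python as follows. Recording the style matters because an XML 1.0 processor translates CR LF pairs and lone CRs to LF when parsing (end-of-line handling, section 2.11 of the spec), so the original style cannot be read back from the <source> text itself. The `m:lbtype` values 'dos', 'unix' and 'mac' follow Yves's example; the detection heuristic (pick the most frequent style), the function names and the generated `id` are assumptions, and the `m` prefix still has to be bound to a namespace on an enclosing element.

```python
from xml.sax.saxutils import escape, quoteattr

# Line-break sequences for the (hypothetical) m:lbtype values.
_BREAKS = {"dos": "\r\n", "unix": "\n", "mac": "\r"}


def detect_lbtype(text: str) -> str:
    """Return the most frequent line-break style in text ('unix' if none)."""
    crlf = text.count("\r\n")
    counts = {
        "dos": crlf,
        "mac": text.count("\r") - crlf,   # lone CRs
        "unix": text.count("\n") - crlf,  # lone LFs
    }
    if not any(counts.values()):
        return "unix"
    return max(counts, key=counts.get)


def normalize_breaks(text: str) -> tuple[str, str]:
    """Extraction step: turn every break into LF and report the original style."""
    lbtype = detect_lbtype(text)
    return text.replace("\r\n", "\n").replace("\r", "\n"), lbtype


def restore_breaks(text: str, lbtype: str) -> str:
    """Merge step: turn LF back into the style recorded in m:lbtype."""
    return text.replace("\n", _BREAKS[lbtype])


def to_trans_unit(unit_id: str, native: str) -> str:
    """Serialize one <trans-unit>; the m: prefix must be declared on an ancestor."""
    text, lbtype = normalize_breaks(native)
    return (
        f"<trans-unit id={quoteattr(unit_id)} xml:space='preserve' "
        f"m:lbtype='{lbtype}'><source>{escape(text)}</source></trans-unit>"
    )
```

A single attribute per <trans-unit> records only one style, so a CR that the translator or programmer entered deliberately, or a mix of styles within one string, would still need the <ph> escaping discussed in the thread.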


-- 
Jan-Arve Sæther  -  jasaethe [at] trolltech [dot] com
Trolltech ASA -  Sandakerveien 116 - PO Box 4332 Nydalen - 0402 Oslo, Norway


