Subject: Re: [xliff-comment] Storing characters outside the XML character range in XLIFF.
Yves, thanks for your input.

I think I'll go for the <ph> approach, since (I just found this) the
XLIFF 1.2 representation guide for gettext PO suggests using the <ph>
or <x> element, with a bias towards <ph> for compatibility with TMX
(section 3.4). It gives a nice guideline on how these issues can be
dealt with.

Jan-Arve

Yves Savourel wrote:

>Hi Jan-Arve,
>
>>Is it possible to store characters such as
>>backspace (0x08) or BEL (0x07) in the <source>
>>and <target> elements in an XLIFF document by using
>>a special kind of tag, or do I have to use my
>>own tag in my own namespace?
>>
>>According to the XML spec, an XML document is
>>not well-formed if it contains one of these. (More
>>precisely, only #x09 (TAB), #x0A and #x0D are
>>the allowed characters in the #x00-#x1F range).
>
>If you want to keep these characters as part of the text I would just
>use a special non-XML syntax (and maybe an unusual one, to make sure
>no tool is tempted to convert them), something like:
>"<source>This is a #u0009# char.</source>" and your filter would know
>how to convert them back.
>
>If you want to consider them as inline codes I would use the same
>non-XML syntax with normal tags. Something like:
>"<source>This is a <ph i='1'>#u0009#</ph> char.</source>"
>
>But I think you cannot use your own namespace inside the <source> or
><target> element (as far as I know that is true in 1.2 as well). So
>you cannot do something like:
>"<source>This is a <m:char09/> char.</source>" for example.
>
>This issue is quite common (outside XLIFF too). Maybe there is a more
>standard way to represent these special characters that other people
>use. But I haven't seen one so far.
>
>>This might sound like a strange use-case, but I also want to use
>>the same mechanism to preserve things such as carriage return (#x0D),
>>since I want to store the XLIFF file with the OS's line-ending style.
>>This means that any #x0D that is found during parsing is simply
>>stripped off. This ensures that the same information is presented
>>to the translator regardless of the line-ending style.
>>However, if the translator/programmer explicitly used #x0D in the
>>source text then I need to escape it in some way to ensure that it
>>won't get stripped off.
>
>Mmm... But then, if you have a special notation for line-breaks, how
>will other XLIFF tools know they need to show line-breaks? The codes
>will be in the way of the translation.
>
>If the problem is that you have different types of line-breaks (DOS
>vs Unix vs Mac) inside the same file, then I would use a user-defined
>attribute in the <trans-unit> that tells the filter how to convert
>the line-breaks back when merging. Something like:
>
><trans-unit xml:space='preserve' m:lbtype='mac'>
> <source>Line 1
>Line 2
>Line 3</source>
></trans-unit>
>
>The xml:space makes sure the line-breaks are preserved, the m:lbtype
>makes sure you know how to convert them back.
>
>Hope this helps,
>-yves

-- 
Jan-Arve Sæther - jasaethe [at] trolltech [dot] com
Trolltech ASA - Sandakerveien 116 - PO Box 4332 Nydalen - 0402 Oslo, Norway
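
For reference, the "#uXXXX#" marker idea from the thread could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names escape_controls and unescape_controls, the escape_cr flag, and the decision to also escape a literal '#' that is followed by 'u' (so that text which already looks like a marker survives the round trip) are all hypothetical and were not specified by either poster.

```python
import re

# C0 control characters that XML 1.0 does not allow in a document:
# everything below #x20 except TAB (#x09), LF (#x0A) and CR (#x0D).
FORBIDDEN = "\x00-\x08\x0b\x0c\x0e-\x1f"

# Marker syntax borrowed from the thread: #uXXXX# (four hex digits).
_MARKER = re.compile(r"#u([0-9A-Fa-f]{4})#")


def _to_marker(match):
    """Turn the single matched character into a #uXXXX# marker."""
    return "#u%04X#" % ord(match.group(0))


def escape_controls(text, escape_cr=False):
    """Replace forbidden control characters with #uXXXX# markers.

    A literal '#' followed by 'u' is escaped too (as #u0023#), so text
    that merely resembles a marker is not decoded by mistake later.
    With escape_cr=True (an assumed option) an explicit carriage
    return is protected from line-ending normalization as well.
    """
    chars = FORBIDDEN + ("\r" if escape_cr else "")
    pattern = re.compile("[%s]|#(?=u)" % chars)
    return pattern.sub(_to_marker, text)


def unescape_controls(text):
    """Convert #uXXXX# markers back into the characters they stand for."""
    return _MARKER.sub(lambda m: chr(int(m.group(1), 16)), text)
```

A filter would call escape_controls on the text before writing it into <source> or <target>, and unescape_controls when merging the translation back. Whether the marker stays as plain text or gets wrapped in a <ph> element is a separate choice that this sketch does not make.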