Subject: RE: [xliff-comment] Storing characters outside the XML character range in XLIFF.
Hi Jan-Arve,

> Is it possible to store characters such as
> backspace (0x08) or BEL (0x07) in the <source>
> and <target> tag in an XLIFF document by using
> a special kind of tag, or do I have to use my
> own tag in my own namespace?
>
> According to the XML spec, an XML document is
> not valid if it contains one of these. (More
> precisely, only #x09 (TAB), #x0A and #x0D are
> the allowed characters in the #x00-#x1F range).

If you want to keep these characters as part of the text, I would just use a special non-XML syntax (preferably an unusual one, so that no tool is tempted to convert it), something like:

    <source>This is a #u0008# char.</source>

and your filter would know how to convert it back.

If you want to treat them as inline codes, I would use the same non-XML syntax inside the normal inline tags. Something like:

    <source>This is a <ph id='1'>#u0008#</ph> char.</source>

But I think you cannot use your own namespace inside the <source> or <target> element (as far as I know, that is true in 1.2 as well). So you cannot do something like:

    <source>This is a <m:char08/> char.</source>

This issue is quite common (outside XLIFF too). Maybe there is a more standard way that other people use to represent these special characters, but I haven't seen one so far.

> This might sound like a strange use-case, but I also want to use
> the same mechanism to preserve things such as carriage return (#x0D),
> since I want to store the XLIFF file with the OS's line-ending style.
> This means that any #x0D that is found during parsing is simply
> stripped off. This ensures that the same information is presented
> to the translator regardless of the line-ending style.
> However, if the translator/programmer explicitly used #x0D in the
> source text then I need to escape it in some way to ensure that it
> won't get stripped off.

Mmm... But if you use a special notation for line breaks, how will other XLIFF tools know they need to display line breaks?
The codes will get in the way of the translation.

If the problem is that you have different types of line breaks (DOS vs. Unix vs. Mac) inside the same file, then I would use a user-defined attribute on the <trans-unit> that tells the filter how to convert the line breaks back when merging. Something like:

    <trans-unit xml:space='preserve' m:lbtype='mac'>
     <source>Line 1
    Line 2
    Line 3</source>
    </trans-unit>

The xml:space attribute makes sure the line breaks are preserved; the m:lbtype attribute tells you how to convert them back.

Hope this helps,
-yves
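For illustration, the two approaches discussed in this thread could be sketched as a pair of filter helpers in Python. This is only a sketch under stated assumptions: the `#uXXXX#` notation is the one suggested in the message, while the namespace URI `urn:example:my-filter`, the `m:lbtype` values `dos`/`unix`/`mac`, and all helper names are hypothetical. It relies on the fact that XML parsers normalize CR and CRLF to LF (XML 1.0, section 2.11 "End-of-Line Handling"), which is why the merge step has to re-apply the original line breaks.

```python
import re
import xml.etree.ElementTree as ET

# Escape notation suggested in the message: "#u" + 4 hex digits + "#".
_ESCAPE_RE = re.compile(r"#u([0-9A-Fa-f]{4})#")

# Hypothetical namespace URI bound to the user-defined "m:" prefix.
M_NS = "urn:example:my-filter"

# Hypothetical m:lbtype values, mapped to the line break they stand for.
_LINE_BREAKS = {"dos": "\r\n", "unix": "\n", "mac": "\r"}


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(text: str) -> str:
    """Extraction: replace characters XML 1.0 cannot carry with #uXXXX#."""
    # Every disallowed code point is <= 0xFFFF, so 4 hex digits always suffice.
    return "".join(
        ch if is_xml_char(ord(ch)) else f"#u{ord(ch):04X}#" for ch in text
    )


def unescape_text(text: str) -> str:
    """Merge: turn every #uXXXX# back into the character it encodes."""
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


def restore_line_breaks(text: str, lbtype: str) -> str:
    """Merge: the parser hands back LF only, so re-apply the original style."""
    return text.replace("\n", _LINE_BREAKS[lbtype])


def merge_source(trans_unit: ET.Element) -> str:
    """Rebuild the original string of one <trans-unit>.

    Assumes no default XLIFF namespace and reads only the leading text
    of <source>; inline elements such as <ph> are ignored for brevity.
    """
    lbtype = trans_unit.get(f"{{{M_NS}}}lbtype", "unix")
    source = trans_unit.find("source")
    text = source.text if source is not None and source.text else ""
    # Restore line breaks first, then decode the escaped control characters.
    return unescape_text(restore_line_breaks(text, lbtype))
```

One design caveat: `unescape_text` decodes any `#uXXXX#` sequence, so source text that already contains such a literal string would be altered on merge. A real filter would also need to escape the `#` delimiter, or choose an even rarer notation.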