xliff-comment message



Subject: RE: [xliff-comment] Storing characters outside the XML character range in XLIFF.


Hi Jan-Arve,

> Is it possible to store characters such as 
> backspace (0x08) or BEL (0x07) in the <source> 
> and <target> tag in an XLIFF document by using 
> a special kind of tag, or do I have to use my 
> own tag in my own namespace?
>
> According to the XML spec, an XML document is 
> not valid if it contains one of these. (More 
> precisely, only #x09 (TAB), #x0A and #x0D are 
> the allowed characters in the #x00-#x1F range).

If you want to keep these characters as part of the text, I would just use a special non-XML syntax (and maybe an unusual one, to
make sure no tool is tempted to convert them), something like: "<source>This is a #u0009# char.</source>". Your filter would then
know how to convert them back.
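
As a rough illustration of what such a filter could do, here is a minimal Python sketch. The function names are made up for this example, and the #uXXXX# notation is just the ad-hoc syntax suggested above, not anything defined by XLIFF. It escapes only the control characters that XML 1.0 disallows below #x20, so TAB, LF and CR are left alone.

```python
import re

# Control characters XML 1.0 does not allow below U+0020
# (everything in that range except TAB, LF and CR).
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# The ad-hoc escape notation: '#u' + four hex digits + '#'.
_ESCAPED = re.compile(r"#u([0-9A-Fa-f]{4})#")


def escape_controls(text: str) -> str:
    """Replace each forbidden control character with #uXXXX# (hypothetical helper)."""
    return _FORBIDDEN.sub(lambda m: "#u%04X#" % ord(m.group()), text)


def unescape_controls(text: str) -> str:
    """Turn every #uXXXX# sequence back into the character it stands for."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    original = "Ring\x07 then back\x08 up"
    stored = escape_controls(original)
    print(stored)
    print(unescape_controls(stored) == original)
```

One caveat with this sketch: text that already contains something shaped like "#u0041#" would be converted on the way back, so a real filter would also need to escape the notation itself or pick a marker that cannot occur in the source.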

If you want to treat them as inline codes, I would use the same non-XML syntax inside normal tags. Something like: "<source>This is
a <ph id='1'>#u0009#</ph> char.</source>"
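
A sketch of the inline-code variant, again in Python and again with invented names: each disallowed control character is wrapped in its own <ph> element, numbered in order of appearance. Using an `id` attribute on <ph> is my assumption about the identifying attribute, and the #uXXXX# payload is the same ad-hoc notation as above.

```python
import re
from xml.sax.saxutils import escape

# Control characters XML 1.0 does not allow below U+0020.
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def to_source_with_ph(text: str) -> str:
    """Build a <source> element where each forbidden character becomes a <ph> code.

    Hypothetical helper: the surrounding text is XML-escaped, and each
    placeholder gets a sequential id (assumed attribute name).
    """
    parts = []
    last = 0
    for n, match in enumerate(_FORBIDDEN.finditer(text), start=1):
        # Plain text before the control character, with &, < and > escaped.
        parts.append(escape(text[last:match.start()]))
        parts.append("<ph id='%d'>#u%04X#</ph>" % (n, ord(match.group())))
        last = match.end()
    parts.append(escape(text[last:]))
    return "<source>" + "".join(parts) + "</source>"


if __name__ == "__main__":
    print(to_source_with_ph("This is a \x08 char."))
```

The advantage over the plain-text form is that a translation tool sees the code as a protected placeholder rather than as text the translator might edit.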

But I think you cannot use your own namespace inside the <source> or <target> element (as far as I know that is true in 1.2 as
well). So you cannot do something like: "<source>This is a <m:char09/> char.</source>" for example.

This issue is quite common (outside XLIFF too). Maybe there is a more standard way to represent these special characters that other
people use. But I haven't seen one so far.


> This might sound like a strange use-case, but I also want to use 
> the same mechanism to preserve things such as carriage return (#x0D), 
> since I want to store the XLIFF file with the OS's line-ending style. 
> This means that any #x0D that is found during parsing is simply 
> stripped off. This ensures that the same information is presented 
> to the translator regardless of the line-ending style.
> However, if the translator/programmer explicitly used #x0D in the 
> source text then I need to escape it in some way to ensure that it 
> won't get stripped off.

Mmm... But then, if you have a special notation for line-breaks, how will other XLIFF tools know they need to show line-breaks? The
codes will get in the way of the translation.

If the problem is that you have different types of line-breaks (DOS vs Unix vs Mac) inside the same file, then I would use a
user-defined attribute in the <trans-unit> that tells the filter how to convert the line-breaks back when merging. Something
like:

<trans-unit xml:space='preserve' m:lbtype='mac'>
 <source>Line 1
Line 2
Line 3</source>
</trans-unit>

The xml:space attribute makes sure the line-breaks are preserved; the m:lbtype attribute tells you how to convert them back.
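
On the filter side, the round trip could look roughly like the following Python sketch. The function names and the 'dos' and 'unix' values are assumptions for illustration (only 'mac' appears in the example above), and it assumes each unit uses a single line-break style throughout.

```python
# Map from the (hypothetical) m:lbtype values to the actual line-break sequence.
_LINE_BREAKS = {"dos": "\r\n", "mac": "\r", "unix": "\n"}


def detect_lbtype(text: str) -> str:
    """Guess the line-break style of the original text at extraction time."""
    if "\r\n" in text:
        return "dos"
    if "\r" in text:
        return "mac"
    return "unix"


def normalize_line_breaks(text: str) -> str:
    """Convert every line-break style to a single LF for storage in <source>."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def restore_line_breaks(text: str, lbtype: str) -> str:
    """When merging, convert LF back to the style recorded in m:lbtype."""
    return text.replace("\n", _LINE_BREAKS[lbtype])


if __name__ == "__main__":
    original = "Line 1\rLine 2\rLine 3"
    lbtype = detect_lbtype(original)         # value to store in m:lbtype
    stored = normalize_line_breaks(original)  # text to store in <source>
    print(lbtype)
    print(restore_line_breaks(stored, lbtype) == original)
```

This way the translator always sees ordinary line-breaks, and the original style only matters to the filter at extraction and merge time.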


Hope this helps,
-yves


