Subject: Processing extension elements
It seems to me that the fundamental question for extensions in <source>/<target> is how "generic" tools will be able to deal with them while preserving them.

Here are the possible ways of processing extensions that I can think of (without worrying about how this would be expressed in the XLIFF schema):

#1- The unknown elements and their content are stripped out.
#2- The unknown elements are stripped out, their content left as part of the <source>/<target>.
#3- The unknown elements are preserved and treated as <g> (or <x/> if they are empty elements).
#4- The unknown elements are preserved and treated as <ph> (their content is seen as code).
#5- The unknown elements have some XLIFF-understood indication of how they are to be treated.

- "generic tool" means a tool that does the minimal processing allowed by the specification. It does not know any specific extension.
- "as seen by a generic tool" means how the unknown tags would be interpreted in memory (regardless of how they are actually represented) by tools that would not know what to do with them.
- There are actually two cases of processing: during merge and not during merge. During a merge process the unknown elements should be ignored by the generic tool (just like an <mrk> element). One has to decide what to do with the content: discard it or treat it as part of the text.

Now let's see examples, pros, and cons for each case:

============================================================
#1- The unknown elements and their content are stripped out.
------------------------------------------------------------

The most drastic solution.

Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

As seen by a generic tool:
<source xml:lang='en'>This is </source>

Saved by a generic tool:
<source xml:lang='en'>This is </source>

Probably not what we want, as extensions that enclose the original content would become a death trap for translatable text.
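
For illustration only, here is a minimal Python sketch of how a generic tool might implement options #1 and #2 from the list above. Everything in it is an assumption made for the example: the namespace URI urn:example:htm, the function names, and the rule that "unknown" means "in a namespace other than the one the tool understands". It is not taken from any existing tool. Note that even option #1 has to keep the text that follows the stripped element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the examples in this mail do not declare one.
SAMPLE = (
    "<source xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b></source>"
)


def namespace_of(elem):
    """Return the namespace URI of an element ('' if it has none)."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def strip_unknown(parent, known_ns="", keep_content=False):
    """Remove, in place, every descendant that is not in known_ns.

    keep_content=False is option #1 (element and content dropped).
    keep_content=True is option #2 (content becomes plain text).
    Simplification: with keep_content=True, known inline elements nested
    inside an unknown element are flattened to text as well.
    """
    prev = None  # last surviving sibling, whose tail receives the kept text
    for child in list(parent):
        strip_unknown(child, known_ns, keep_content)  # nested extensions first
        if namespace_of(child) == known_ns:
            prev = child
            continue
        # The text after the element always survives; its content only in #2.
        kept = ("".join(child.itertext()) if keep_content else "") + (child.tail or "")
        if prev is None:
            parent.text = (parent.text or "") + kept
        else:
            prev.tail = (prev.tail or "") + kept
        parent.remove(child)


if __name__ == "__main__":
    for keep in (False, True):
        root = ET.fromstring(SAMPLE)
        strip_unknown(root, keep_content=keep)
        print(repr(root.text))  # 'This is ' for #1, then 'This is big' for #2
```
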

============================================================
#2- The unknown elements are stripped out, their content left as part of the <source>/<target>.
------------------------------------------------------------

Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

As seen by a generic tool:
<source xml:lang='en'>This is big</source>

Saved by a generic tool:
<source xml:lang='en'>This is big</source>

A very simple way to deal with unknown tags. But it would add unwanted content if the content of the extension elements is really metadata, as shown below.

Original entry:
<source xml:lang='en'>This is <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def> </source>

As seen by a generic tool:
<source xml:lang='en'>This is big'big</source>

Saved by a generic tool:
<source xml:lang='en'>This is big'big</source>

============================================================
#3- The unknown elements are preserved and treated as <g> (or <x/> if they are empty elements).
------------------------------------------------------------

Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'>big</g></source>

Saved by a generic tool:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

This solution would also expose unwanted content as text if the content of the extension elements is really metadata, as shown below.

Original entry:
<source xml:lang='en'>This is <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def> </source>

As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'><g id='1'>big</g> <g id='2'>'big</g></g></source>

Saved by a generic tool:
<source xml:lang='en'>This is <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def> </source>

============================================================
#4- The unknown elements are preserved and treated as <ph> (their content is seen as code).
------------------------------------------------------------

This is John's scenario (I think). It works fine if the content of all extension elements is metadata.

Original entry:
<source xml:lang='en'>This is big<x:note>blah blah</x:note> </source>

As seen by a generic tool:
<source xml:lang='en'>This is big<ph id='0'>blah blah</ph> </source>

Saved by a generic tool:
<source xml:lang='en'>This is big<x:note>blah blah</x:note> </source>

But it does not work for text content inside extension elements, as it would be seen as "code".

Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

As seen by a generic tool:
<source xml:lang='en'>This is <ph id='0'>big</ph></source>
(Code not text --------------------------^ )

Saved by a generic tool:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>

============================================================
#5- The unknown elements have some XLIFF-understood indication of how they are to be treated.
------------------------------------------------------------

There are two ways to indicate this: either by an XLIFF-defined attribute that the extension elements would carry, or by enclosing the extensions in a special new XLIFF element such as <extend>.

Original entry:
<source xml:lang='en'>This is <x:def xlf:totrans='yes'><x:term>big</x:term><x:pron xlf:totrans='no'>'big</x:pron></x:def></source>

As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'><g id='1'>big</g> <ph id='2'>'big</ph></g></source>

Saved by a generic tool:
<source xml:lang='en'>This is <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def> </source>

This is more flexible since it allows one to specify how things are to be processed. However, it may not always be doable, as the extension elements may belong to a namespace that does not itself allow extension, so you would not be able to use xlf:totrans (or whatever flag is decided on). For that case the solution would be to use an <extend> XLIFF element, as Matt (I think) suggested.
But, as you can imagine, this would start to make the <source>/<target> content rather crowded.

============================================================
Personal opinion
------------------------------------------------------------

It seems that allowing extension elements that can have either translatable or "code" content in <source>/<target> would add a significant cost in processing and complexity, and I'm not sure allowing code content (i.e. metadata) would be wise anyway.

I see no big problem with the <htm:b> type of extensions, as they are simply a more customized way of using <mrk>, and "generic" tools could probably deal with them without too much change in their implementation.

So, I tend to like solution #3 better (at least for now).

-yves
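
P.S. To make solution #3 a bit more concrete, here is a rough Python sketch of the round trip: unknown elements are renamed to <g> (or <x/> when empty) for the in-memory view, then given their original names and attributes back on save. The function names and the namespace URI urn:example:x are made up for the example, and the sketch assumes the tool keeps the same element objects between load and save and that the generated ids do not clash with ids already present in the segment.

```python
import xml.etree.ElementTree as ET


def namespace_of(elem):
    """Return the namespace URI of an element ('' if it has none)."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def to_generic_view(root, known_ns=""):
    """Rename unknown elements to <g>/<x> in place (solution #3).

    Returns a list of (element, original tag, original attributes) so the
    extension markup can be put back when the file is saved.
    """
    prefix = "{%s}" % known_ns if known_ns else ""
    originals = []
    for elem in root.iter():  # document order, so ids follow the text
        if elem is root or namespace_of(elem) == known_ns:
            continue
        is_empty = len(elem) == 0 and not elem.text
        originals.append((elem, elem.tag, dict(elem.attrib)))
        elem.tag = prefix + ("x" if is_empty else "g")
        elem.attrib.clear()
        elem.set("id", str(len(originals) - 1))
    return originals


def restore(originals):
    """Give every renamed element its original tag and attributes back."""
    for elem, tag, attrib in originals:
        elem.tag = tag
        elem.attrib.clear()
        elem.attrib.update(attrib)


if __name__ == "__main__":
    # Hypothetical namespace URI: the examples in this mail do not declare one.
    root = ET.fromstring(
        "<source xmlns:x='urn:example:x' xml:lang='en'>This is "
        "<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def></source>"
    )
    saved = to_generic_view(root)
    print(ET.tostring(root, encoding="unicode"))  # view with <g> elements
    restore(saved)
    print(ET.tostring(root, encoding="unicode"))  # extension markup is back
```

As the metadata example shows, the generic view still presents the content of <x:pron> as translatable text, which is the known weakness of this solution.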