OASIS Mailing List Archives

 



xliff-inline message



Subject: [xliff-inline] BiDi draft for discussion


Hi All,

As promised on the last conference call, I have created a first draft of schema extensions for BiDi support in XLIFF 2.0. I mostly follow the HTML5 design and extend it to bilingual files. I'm not sure we want to adopt the 'auto' direction, so I left it out for now. See http://www.whatwg.org/specs/web-apps/current-work/#the-dir-attribute for a discussion of it.

We need to be able to tell processing applications the base text direction of any specific piece of text in an XLIFF document. In most cases all source or target texts within a document have the same base direction; in other cases the direction is set for a specific unit or segment. Since XLIFF files are bilingual by nature, it is natural to provide the direction separately for the source and target languages on the structural elements, and then restrict it to a single value at the <source> and <target> level. Since this also includes changes to the structural parts, we should bring it to the TC once we have agreed on the semantics at the inline level.

<source> inherits the direction from its parent container's 'source-dir', and <target> inherits it from the parent's 'target-dir'. All inline codes except <mrk> default to left-to-right for their native display direction ('disp-dir'), since most markup is designed as LTR. The native code direction is distinct from the general text direction. This allows, for example, XML elements embedded in right-to-left text to render sensibly by default. <mrk> is often used for comments / annotations, and as such it makes more sense for it to inherit the direction from its container, but I'm not sure this is the right place to define that. The <pc> and <sc> inline elements inherit the direction of their container element, and that direction is used as the embedding direction for the span. The <ec> ends the embedding started by its corresponding <sc> tag.
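
To make the inline-level defaults concrete, here is a minimal Python sketch. It assumes the inline elements are parsed as un-namespaced `xml.etree.ElementTree` nodes and that the direction of the enclosing container is already known and passed in as `container_dir`; `display_dir` and `span_dir` are hypothetical helper names, not part of any XLIFF tooling.

```python
from typing import Optional
import xml.etree.ElementTree as ET


def display_dir(elem: ET.Element, container_dir: str) -> str:
    """Native display direction ('disp-dir') of an inline element (hypothetical helper)."""
    explicit = elem.get("disp-dir")
    if explicit is not None:
        return explicit
    if elem.tag == "mrk":
        # Annotations follow the direction of the surrounding text.
        return container_dir
    # Native codes default to LTR because most markup is designed as LTR.
    return "ltr"


def span_dir(elem: ET.Element, container_dir: str) -> Optional[str]:
    """Embedding direction of a <pc>/<sc> span; None for elements that open no span."""
    if elem.tag not in ("pc", "sc"):
        return None
    # <pc> and <sc> inherit the container direction unless 'dir' is set.
    return elem.get("dir", container_dir)


seg = ET.fromstring('<source dir="rtl">abc <pc id="1">x</pc> <ph id="2"/></source>')
for child in seg:
    print(child.tag, display_dir(child, seg.get("dir")), span_dir(child, seg.get("dir")))
# pc ltr rtl
# ph ltr None
```

The <ec> is not modeled here: it simply closes whatever embedding its matching <sc> opened.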

Direction = {ltr, rtl}

Attributes 'source-dir' and 'target-dir' added to the following tag with a default value of 'ltr':
<file>

Attributes 'source-dir' and 'target-dir' added to the following tags with a default value inherited from the closest parent container that can hold the attributes:
<unit>
<segment>
<ignorable>

Attribute 'dir' added to the following tag with a default value inherited from the closest parent container that can have the 'source-dir' attribute:
<source>

Attribute 'dir' added to the following tag with a default value inherited from the closest parent container that can have the 'target-dir' attribute:
<target>

Attribute 'dir' added to the following tags with a default value inherited from the closest parent container that can hold the attribute:
<pc>
<sc>

Attribute 'disp-dir' added to the following tags with a default value of 'ltr':
<cp> (not really useful here; it might be better to always use LTR, but perhaps keep it for consistency)
<ph>
<pc>
<sc>
<ec>
<mrk> (not sure this is the right place; perhaps this should move to the content that defines the text of the marker)

This should allow the direction to be specified as few times as possible, and if nothing is specified, both source and target are LTR. If all content in source or target is RTL, the direction only needs to be specified once in the file. Note that <mrk> cannot be used to specify a different direction for a span. In my opinion <mrk> should only be used for annotations and not to influence the back-conversion process.
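
The inheritance chain for <source> and <target> can be sketched in Python as below. This is only an illustration under stated assumptions: element names are un-namespaced, the document is parsed with the standard `xml.etree.ElementTree` module (which has no parent pointers, so a child-to-parent map is built first), and `effective_dir` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Structural elements that may carry 'source-dir' / 'target-dir' in this draft.
STRUCTURAL = {"file", "unit", "segment", "ignorable"}


def effective_dir(root: ET.Element, elem: ET.Element) -> str:
    """Resolve the base direction of a <source> or <target> element (hypothetical helper)."""
    explicit = elem.get("dir")
    if explicit is not None:
        return explicit
    attr = "source-dir" if elem.tag == "source" else "target-dir"
    # ElementTree has no parent pointers, so map each child to its parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(elem)
    while node is not None:
        if node.tag in STRUCTURAL and node.get(attr) is not None:
            return node.get(attr)
        node = parents.get(node)
    # Nothing specified anywhere: fall back to the <file>-level default.
    return "ltr"


doc = ET.fromstring(
    '<xliff><file target-dir="rtl">'
    '<unit id="1"><segment><source>Hello</source><target>...</target></segment></unit>'
    '<unit id="2" source-dir="rtl"><segment><source>...</source></segment></unit>'
    "</file></xliff>"
)
for el in doc.iter():
    if el.tag in ("source", "target"):
        print(el.tag, effective_dir(doc, el))
# source ltr
# target rtl
# source rtl
```

Here a single 'target-dir' on <file> makes every target RTL, while the source of unit 2 is switched to RTL at the <unit> level only.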

In addition to these attribute-based directional markers, we should allow the use of Unicode bidirectional formatting characters in the text flow. This is more convenient when a translator is entering text. It will be up to the back-conversion from XLIFF to the native format to keep, remove or replace them.
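
As one illustration of the 'remove' and 'replace' options, here is a Python sketch of back-conversion helpers. The character set is the set of bidirectional formatting characters described in UAX #9 (the isolates and ALM were added in Unicode 6.3); the function names are hypothetical, and which option is appropriate depends on the native format.

```python
# Unicode bidirectional formatting characters described in UAX #9.
BIDI_CONTROLS = (
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)
# Translation table that deletes every character in BIDI_CONTROLS.
_DELETE_TABLE = str.maketrans("", "", BIDI_CONTROLS)


def strip_bidi_controls(text: str) -> str:
    """'Remove' option: drop all bidi formatting characters (hypothetical helper)."""
    return text.translate(_DELETE_TABLE)


def marks_to_html_entities(text: str) -> str:
    """'Replace' option for an HTML target: write LRM/RLM as named character references."""
    return text.replace("\u200e", "&lrm;").replace("\u200f", "&rlm;")


print(strip_bidi_controls("\u202bABC\u202c and \u200edef"))  # ABC and def
print(marks_to_html_entities("a\u200eb"))                     # a&lrm;b
```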

Further work is needed on defining the mapping of these attributes onto the Unicode Bidirectional Algorithm ( http://unicode.org/reports/tr9/ ). I would propose that we treat the <unit> as terminating a paragraph and reset the embedding state at the <unit> boundary. From a quick study, I think it makes most sense to use the direction set on the <unit> as the default text direction. If a segment has a direction specified (even if it is the default direction), it should start an embedding run. Native codes should be displayed within a push / pop override if their direction differs from the default direction. Spans inside the segment (<pc> or <sc>+<ec>) would use embedding if a direction is specified. I'm not an expert on the Unicode Bidirectional Algorithm, so my mapping might not be the best or even the proper one. What I am trying to achieve is that nothing special needs to happen for the most common case of LTR-only text, and that a minimum of overrides / embeddings is needed when all text in either source or target is RTL.
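
The proposed mapping can be sketched in Python by rendering a segment into a string with explicit Unicode controls (LRE/RLE/LRO/RLO, each closed by PDF). This is a hedged illustration of the proposal, not a settled design: the `Text`/`Code`/`Span` classes are a hypothetical in-memory model, an <sc>/<ec> pair is assumed to be well nested so it can be treated like a <pc>, and a native code is compared against the direction currently in effect (which equals the <unit> default unless a segment or span embedding changed it).

```python
from dataclasses import dataclass
from typing import List, Optional, Union

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"


@dataclass
class Text:
    value: str


@dataclass
class Code:
    """A native code such as <ph>, with its display direction (disp-dir)."""
    value: str
    disp_dir: str = "ltr"


@dataclass
class Span:
    """A <pc>, or a well-nested <sc>/<ec> pair, with an optional explicit direction."""
    children: List["Node"]
    direction: Optional[str] = None


Node = Union[Text, Code, Span]


def _embed(direction: str) -> str:
    return RLE if direction == "rtl" else LRE


def _override(direction: str) -> str:
    return RLO if direction == "rtl" else LRO


def render(nodes: List[Node], current: str) -> str:
    out = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(node.value)
        elif isinstance(node, Code):
            # Push/pop an override only when the code's direction differs.
            if node.disp_dir != current:
                out.append(_override(node.disp_dir) + node.value + PDF)
            else:
                out.append(node.value)
        elif node.direction is not None:
            # A span with an explicit direction opens an embedding.
            out.append(_embed(node.direction) + render(node.children, node.direction) + PDF)
        else:
            out.append(render(node.children, current))
    return "".join(out)


def render_segment(nodes: List[Node], unit_dir: str, segment_dir: Optional[str] = None) -> str:
    """The <unit> is the paragraph, so its direction is the paragraph default."""
    if segment_dir is None:
        return render(nodes, unit_dir)
    # An explicit segment direction always starts an embedding run.
    return _embed(segment_dir) + render(nodes, segment_dir) + PDF


# All-LTR content: no controls are emitted at all.
print(render_segment([Text("abc "), Code("<b>"), Text(" def")], unit_dir="ltr"))
```

In an RTL unit, the same LTR code would be wrapped as LRO, `<b>`, PDF, while the RTL text around it needs no controls, in line with the goal of minimal overrides.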

There are two issues left to resolve in my opinion: how <mrk> should work, and how to add a directional span to the target that does not exist in the source. For example, when a product name / trademark is copied from an LTR source into an RTL target, it may need to be protected by a span if it starts or ends with directionally neutral characters. One option would be adopting the <bdo> element from HTML5. Another would be to simply rely on Unicode directional characters for this.
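
For the Unicode-character option, here is a small Python sketch that decides whether a copied LTR run needs protection and, if so, wraps it in an LTR embedding (LRE ... PDF). It uses the standard `unicodedata.bidirectional` lookup and treats anything other than the strong classes L, R and AL as not strongly directional; `needs_protection` and `protect_ltr_run` are hypothetical names. With Unicode 6.3 or later, the isolate pair LRI (U+2066) ... PDI (U+2069) would be an alternative to LRE/PDF.

```python
import unicodedata

LRE, PDF = "\u202a", "\u202c"
STRONG = {"L", "R", "AL"}  # strong bidi classes from UAX #9


def needs_protection(run: str) -> bool:
    """True if the run starts or ends with a character that is not strongly directional."""
    if not run:
        return False
    return (unicodedata.bidirectional(run[0]) not in STRONG
            or unicodedata.bidirectional(run[-1]) not in STRONG)


def protect_ltr_run(run: str) -> str:
    """Wrap an LTR run copied into RTL target text in an LTR embedding if needed."""
    return LRE + run + PDF if needs_protection(run) else run


for name in ("Acme", "C++", "3M"):
    print(name, needs_protection(name))
# Acme False
# C++ True
# 3M True
```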

Regards,
Fredrik Estreen


