OASIS Mailing List Archives

dita-translation message



Subject: DITA Subcommittee Meeting -- 27 March 2006


Hello All,

Agenda for Monday 27 March 2006
11:00 am - 12:00 pm Eastern Standard Time (GMT-5)
DITA Technical Committee teleconference 
USA Toll Free Number: 866-566-4838 
USA Toll Number: +1-210-280-1707 
PASSCODE: 185771

Roll Call

Approve Minutes from 20 March 2006 (enclosed for those who are not TC
members)
http://www.oasis-open.org/apps/org/workgroup/dita-translation/download.php/17321/SCmeeting060320.txt


Announcement: Many thanks to Andrzej Zydron for his presentation of the
workflow for DITA + xml:tm

New Business

1) Proposal for the xml:lang attribute specification -- presentation by
Gershon
http://www.oasis-open.org/apps/org/workgroup/dita-translation/email/archives/200603/msg00042.html

2) Proposal for the dir attribute specification -- presentation by
Gershon
http://www.oasis-open.org/apps/org/workgroup/dita-translation/email/archives/200603/msg00043.html
 
 

JoAnn 

JoAnn T. Hackos, PhD 
President 
Comtech Services, Inc. 
710 Kipling Street, Suite 400 
Denver, CO 80215 
303-232-7586 
joann.hackos@comtech-serv.com 
http://www.comtech-serv.com 
Skype joannhackos


--- Begin Message ---


Best Regards,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com


MEETING MINUTES -- 20 March 2006 -- DITA TRANSLATION SUBCOMMITTEE
(Minutes taken by Gershon Joseph <gershon@tech-tav.com>)

Date: Monday, 20 March 2006
Time: 08:00 - 09:00 PST

DITA Translation Subcommittee resources:
- SC Web site:
    http://www.oasis-open.org/apps/org/workgroup/dita-translation/index.php
- Mailing list: dita-translation@lists.oasis-open.org
- Non-OASIS members please email Gershon or Don and we'll post on
    your behalf

Roll Call
- Present: JoAnn, Rodolfo, Andrzej, Gershon, Kevin, Robert, Charles, Don, 
	   Felix, Patrick, Bruce, Nancy
- Regrets:
 
Review/approve minutes from previous meeting (13 March 2006)
- http://lists.oasis-open.org/archives/dita-translation/200603/msg00033.html
- Correction to the minutes:
    Felix Sasaki's action concerns the ITS working draft in general, on 
    which the ITS working group would like to get feedback. He will send 
    in the working draft within the next few weeks.
- Andrzej moves to accept the minutes as amended, Don accepts.
 
Announcement: The three proposals of the SC were approved by the OASIS DITA TC 
    on 14 March 2006. Thank you all for your contributions.

New Business

- Review Andrzej Zydron's workflow proposal
    - Andrzej presents proposal...
    - Don -- Asks for a version of the lifecycle without xml-tm.
    - Rodolfo -- see XLIFF-based white papers on Web; don't need CMS, just 
        transform XML to XLIFF, apply XML skeleton, translate, etc. 
	Similar cycle to what Andrzej presented, just without xml-tm.
	Can still get exact matches via the TM DB.
    --ACTION ITEM-- Rodolfo to send link to lifecycle white papers to SC
    - [Discussion about whether CMS is required for xml-tm]
        Could any file system that supports versioning work?
    - Andrzej -- Yes, but if you're serious about DITA you need version 
        control and lifecycle tracking. You lose so much in terms of control 
        without a CMS. xml-tm needs some form of version control, but does not 
        require a full-blown CMS.

- Begin the discussions of the information to be included in the DITA 
    Architecture Specification 1.1 and in the Best Practice recommendations:
    a) xml:lang 
    b) dir attribute
    - Gershon presented overview of the open issues regarding best practices
        for xml:lang and dir attributes. Summary: prefer markup via these 
	attributes (in suitable phrase or block element, as relevant) over 
	using Unicode markers to indicate language and directionality 
	boundaries.
    - Bruce -- How do you handle email discussion [relating to output, 
        particularly top-to-bottom type flows]? XML with embedded Unicode 
	markers? How do we specify directionality in output?
    - Don -- xml:lang should be used for that.
    - Bruce -- Should authors provide directionality [top-bottom, not only 
        left-right]?
    - Don -- No, W3C saw need only for dir attribute, so it should work for 
        DITA too. No need for DITA to resolve vertical direction, since no 
	other source standards address it at this time.
    --ACTION ITEM-- Gershon to update his original xml:lang doc and send to group
    --ACTION ITEM-- Gershon to update his original dir attribute doc and 
        send to Kevin who will add comments related to output.

Meeting adjourned.
--- End Message ---
--- Begin Message ---
Hi all,

Please review my updated proposal. I've added best practice sections for
users and vendors/implementers.

Kevin will be adding a section on output/rendering expectations and best
practice.

We would like to finalize and approve our dir attribute proposal for
submission to the DITA TC during Monday's SC meeting. Please post your
feedback to the list.

Thanks,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com


Title: Dir Attribute Proposal

Dir Attribute Proposal


Background

While most languages are written with characters that flow from left to right, Hebrew and Arabic (along with other languages that use their scripts) are written from right to left. Even in these languages, numbers and some other content are written left to right. Also, a multilingual document containing, for example, English and Hebrew, contains some text that flows left to right and other text that flows right to left.

Text directionality is controlled by the following:

  1. xml:lang attribute on the document element or, if not specified, default language assumed by the processor. Directionality is determined by the Unicode bidirectional algorithm for this language.

  2. xml:lang attribute on any element that overrides the inherited language. Again, directionality is determined by the Unicode bidirectional algorithm for the specified language.

  3. dir="ltr|rtl" attribute on an element that overrides the inherited direction (as determined by dir on a parent element or either specified or inferred xml:lang on a parent element). The specified direction overrides the Unicode bidirectional algorithm only on neutral Unicode characters (e.g. spaces and punctuation) in the element's content.

  4. dir="lro|rlo" attribute on an element. The specified direction overrides the Unicode bidirectional algorithm on all Unicode characters in the element's content.

In most cases, authors need to use dir="rtl|ltr" to ensure punctuation surrounding a RTL phrase inside a LTR element is rendered correctly. To override the direction of strongly typed Unicode characters (most characters that apply to a language except for punctuation, spaces and digits), the author would need to use dir="lro|rlo". The use of the dir attribute and the Unicode algorithm is clearly explained in the article [REF 1]. The referenced article has several examples of the use of dir="rtl|ltr". There is no example of the use of dir="lro|rlo", though it can be inferred from the example using the bdo element (the old W3C way of overriding the entire Unicode bidirectional algorithm; the W3C now favors using the override values on the dir attribute).

Text direction cannot be sufficiently specified by the xml:lang attribute alone, because numeric and punctuation characters are input, and rendered, according to the Unicode bidirectional algorithm, which often cannot determine the correct direction of these characters.

From the HTML 4.0 spec:

The dir attribute specifies the directionality of text: left-to-right (dir="ltr", the default) or right-to-left (dir="rtl"). Characters in Unicode are assigned a directionality, left-to-right or right-to-left, to allow the text to be rendered properly. For example, while English characters are presented left-to-right, Hebrew characters are presented right-to-left. Unicode defines a bidirectional algorithm that must be applied whenever a document contains right-to-left characters. While this algorithm usually gives the proper presentation, some situations leave directionally neutral text and require the dir attribute to specify the base directionality. Text is often directionally neutral when there are multiple embeddings of content with a different directionality. For example, an English sentence that contains a Hebrew phrase that contains an English quotation would require the dir attribute to define the directionality of the Hebrew phrase. The Hebrew phrase, including the English quotation, would be contained within a ph element with dir="rtl".

Specification changes

Add a new attribute called "dir", as follows:

dir="ltr|rtl|lro|rlo"

This attribute, when set to "ltr" or "rtl", overrides the default Unicode bidirectional algorithm on neutral characters (such as spaces and punctuation). These values are usually used to ensure punctuation is applied correctly in a phrase.

This attribute, when set to "lro" or "rlo", overrides the default Unicode bidirectional algorithm on all characters. These values are usually used to force a direction on all characters contained in a phrase.

This attribute is usually used in conjunction with the xml:lang attribute, to override the default Unicode bidirectional algorithm that applies to the specified language.

This attribute is available on all elements within DITA.

Additional rules to be documented:

  • When the dir attribute is set on an element, it remains in effect for the duration of the element and all child elements. Setting the dir attribute on a nested element overrides the inherited value.

  • If the document element does not specify the dir attribute, then if the document element specifies the xml:lang attribute, the Unicode Bidirectional Algorithm must be applied to the specified language. If neither xml:lang nor dir attributes are set on the document element, the processor must assume a language and the direction must be inferred from the Unicode Bidirectional Algorithm applied to the default language.

  • The dir attribute can also be used to specify the direction of non-textual content, such as tables and lists. In the case of <table dir="rtl">, the columns flow from right to left. In the case of <ul dir="rtl"> or <ol dir="rtl">, the list decoration (bullets or numbers) appears on the right of the screen/page and the <li> content flows from right to left.

Example:

<p dir="ltr">
The Hebrew word for "Hebrew" is <ph xml:lang="he-il">עברית</ph>,
but since Hebrew letters have intrinsic right-to-left directionality,
I had to type the word starting from the letter "ע",
i.e. <ph xml:lang="he-il" dir="lro">תירבע</ph>.
</p>

Many good examples are provided in [REF 1].
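
The inheritance and list rules above could be sketched as follows. This is an illustration only: it assumes the dir attribute is adopted on all elements as proposed here, and the sample text is invented.

```xml
<!-- Illustrative sketch: assumes dir is available on all elements as proposed. -->
<body xml:lang="he" dir="rtl">
  <!-- No dir here, so the list inherits dir="rtl" from body:
       bullets are expected on the right, item text flows right to left. -->
  <ul>
    <li>פריט ראשון</li>
    <li>פריט שני</li>
  </ul>
  <!-- A nested dir overrides the inherited value for this element
       and its children only. -->
  <p xml:lang="en" dir="ltr">This paragraph flows left to right.</p>
</body>
```

The list items carry no markup of their own; the value set on the body element stays in effect until the nested paragraph overrides it.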

While many of the issues can be resolved using the so-called Unicode control characters (hidden characters with strong directionality of either LTR or RTL), the W3C discourages use of the control characters (see [REF 1]). Our documentation of the dir attribute should probably include something like "When directionality issues can be resolved by either use of the dir attribute or use of Unicode control characters (LRM, RLM), use of the dir attribute is strongly recommended."

Recommended Usage

The Unicode Bidirectional algorithm provides for various levels of bidirectionality, as follows:

  1. Directionality is inferred from the xml:lang value. Every language has an associated directionality (left-to-right or right-to-left, also termed LTR or RTL). For example, for English this default direction is LTR and for Hebrew it's RTL.

  2. When embedding a RTL text run inside a LTR text run (or vice versa), the default direction often provides incorrect results, especially if the embedded text run includes punctuation that is located at one end of the embedded text run. Unicode defines spaces and punctuation as having neutral directionality, and defines directionality for these neutral characters when they appear between characters having a strong directionality (most characters that are not spaces or punctuation). While the default direction is often sufficient to determine the correct directionality of the language, sometimes it renders the characters incorrectly (for example, a question mark at the end of a Hebrew question may appear at the beginning of the question instead of at the end). To control this behavior, the dir attribute is set to "ltr" or "rtl" as needed, to ensure that the desired direction is applied to the characters that have neutral directionality. The "ltr|rtl" values override only the neutral characters, not all Unicode characters.

  3. Sometimes you may want to override the default directionality for strongly directional characters. This is done using the "lro" and "rlo" values, which override the Unicode bidirectional algorithm. This essentially forces a direction on the contents of the element, ignoring the direction interpreted from any xml:lang setting. These override values give the author a brute force way of setting the directionality independently of the Unicode BIDI algorithm. The gentler "ltr|rtl" values have a less radical effect, only affecting punctuation and other so-called neutral characters.

For most authoring needs, the "ltr" and "rtl" values are sufficient. Only when the desired effect cannot be achieved using these values should the override values be used.
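
As a minimal sketch of the second case above, a Hebrew question embedded in an English paragraph could be marked up as follows. The sentence is invented for illustration, and the markup assumes the dir attribute is available on the ph element as this proposal recommends.

```xml
<!-- Illustrative only: sample text is made up; dir on <ph> assumes
     this proposal is adopted. -->
<p xml:lang="en" dir="ltr">
  She asked <ph xml:lang="he" dir="rtl">מה שלומך?</ph> and waited.
</p>
```

With dir="rtl" on the ph element, the question mark is treated as part of the right-to-left run, so it should render at the end of the Hebrew question rather than at its beginning. The "lro" and "rlo" values are not needed here, because only a neutral character is affected.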

While the Unicode standard includes hidden markers for directionality without the need for markup, these markers should not be used. It is strongly recommended to mark up the document using the dir attribute to set directionality. Using markup instead of the Unicode markers has the following advantages:

  • The document will be as portable as possible.

  • The document can be processed by applications that do not fully implement the Unicode BIDI algorithm.

  • The marked-up document can be read and understood by humans.

  • When updating the document, the boundaries of each text flow are clear, which makes it much easier for the author to update the document.

Note to Vendors/Implementors

Applications that process DITA documents, whether at the authoring, translation, publishing, or any other stage, should fully support the Unicode algorithm to correctly implement the script and directionality for each language used in the document. The recommended practice is to write all directionality markers via XML markup and not to use the Unicode Bidirectional markers. When reading XML markup that embeds the Unicode Bidirectional markers, these markers should be replaced with markup when the document is saved.

--- End Message ---
--- Begin Message ---
Hi all,

Please review my updated proposal. I've added best practice sections for
users and vendors/implementers.

We would like to finalize and approve our xml:lang proposal for submission
to the DITA TC during Monday's SC meeting. Please post your feedback to the
list.

Thanks,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com


Title: Proposal for xml:lang Attribute

Proposal for xml:lang Attribute


Name

xml:lang

Description

Specifies the language and locale of the element content. The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content. When no xml:lang value is supplied, the processor should assume a default value.

This attribute must be set to a language identifier, as defined by IETF RFC 3066 (http://www.ietf.org/rfc/rfc3066.txt) or successor.

Data Type

NMTOKEN

Default Value

Not set

Required

#IMPLIED

Recommended Usage

For a DITA document that contains a single language, the document element should always set the xml:lang attribute to the language (and optionally locale) that applies to the document.

For a DITA document that contains more than one language, the document element should always set the xml:lang attribute to the primary language (and optionally locale) that applies to the document. Wherever an alternate language occurs in the document, the element containing text in the alternate language should set the xml:lang attribute appropriately. The above way of overriding the default document language applies to both block and inline elements that use the alternate language.
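
The multi-language case might look like the following sketch. The topic id, the text, and the language choices are hypothetical and serve only to show where xml:lang is set.

```xml
<!-- Illustrative only: the id, text, and language choices are hypothetical. -->
<topic id="greetings" xml:lang="en-US">
  <title>Greetings</title>
  <body>
    <!-- Inline override: only the phrase is French. -->
    <p>A common French greeting is <ph xml:lang="fr-FR">bonjour</ph>.</p>
    <!-- Block override: the whole paragraph is German. -->
    <p xml:lang="de-DE">Guten Morgen.</p>
    <!-- No xml:lang here, so en-US is inherited from the topic element. -->
    <p>Each override ends with the element that declares it.</p>
  </body>
</topic>
```

The primary language is declared once on the document element, and each alternate language is declared on the smallest element that contains it.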

While the Unicode standard provides for all languages to be encoded without the need for markup, using markup is strongly recommended to make the document as portable as possible. By using markup, the document can be processed by applications that do not fully implement the Unicode standard. In addition, the marked-up document can be read and understood by humans. Finally, when updating the document, the boundaries of each language are clear, which makes it much easier for the author to update the document.

Note to Vendors/Implementors

Applications that process DITA documents, whether at the authoring, translation, publishing, or any other stage, should fully support the Unicode algorithm to correctly implement the script and directionality for each language used in the document. The recommended practice is to identify every change in language via XML markup. When reading XML markup that embeds the Unicode script information (that is, a change in language), the embedded languages should be indicated via markup when the document is saved.

--- End Message ---

