OASIS Mailing List Archives

office message


Subject: Proposal for script attributes and language tags


Hi,

Here comes my proposal for script attributes and language tags.

Note: whenever I mention RFC 4646 in the following, the draft 4646bis
should be considered its successor, planned to become effective as soon
as ISO/FDIS 639-3 is accepted as a full ISO standard; see
http://www.ietf.org/internet-drafts/draft-ietf-ltru-4646bis-02.txt
especially "8.  Changes from RFC 4646". In case a newer draft is
available, see
http://www.ietf.org/html.charters/ltru-charter.html
"Tags for Identifying Languages".

To be able to support the full range of language/script/country
combinations I propose:

The description of *:language attributes shall not only refer to
"7.9.2 language" of [XSL]
http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#language
but also explicitly allow ISO 639-3 three-letter codes if no 639-1 or
639-2 code is assigned for a particular language. That case is not
covered by the language-specifier of RFC 3066 that [XSL] 7.9.2 refers to.

Add optional *:script attributes to all places where *:language and
*:country attributes are currently defined. The value of the
*:script attribute shall be, according to
http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#script
the ISO 15924 four-letter script code. The *:script attribute should be
written only if necessary according to the rules of RFC 4646 section
"2.2.3.  Script Subtag" paragraph 5.
http://tools.ietf.org/html/rfc4646#section-2.2.3
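
As an illustration, Serbian written in Latin script is a case where the
script carries distinguishing information. A hedged sketch of how the
three attributes might then appear on text properties, assuming the
proposed fo:script attribute is adopted (it is not part of the current
schema):

```xml
<!-- Hypothetical instance: fo:script is the attribute proposed in this message. -->
<style:text-properties
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    fo:language="sr" fo:script="Latn" fo:country="RS"/>
```

For a language whose usual script is unambiguous, only fo:language and
fo:country would be written, as today.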

The script attribute's schema would be

<optional>
  <attribute name="fo:script">
    <ref name="scriptCode"/>
  </attribute>
</optional>

and likewise for table:script and number:script where appropriate. The
places affected are:

- "7.8.1 Alphabetical Index Source", the fo:language and fo:country of
  text-alphabetical-index-source-attrs

- "8.6.5 Sort", table:language and table:country of table-sort-attlist

- "14.7.2 Currency Style", number:language and number:country of
  number-currency-symbol-attlist

- "14.7.9 Common Data Style Attributes", number:language and
  number:country of common-data-style-attlist

- "14.9.3 Bibliography Configuration", fo:language and fo:country of
  text-bibliography-configuration-attlist

- "15.4.23 Language" fo:language and "15.4.24 Country" fo:country
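
The schema fragment for the script attribute refers to a scriptCode
pattern that this message does not spell out. A minimal sketch of how it
could be defined in RELAX NG, assuming the value is restricted to
exactly four ASCII letters as in ISO 15924 (the define below is an
illustration, not part of the proposal text):

```xml
<!-- Sketch only: the pattern is an assumption based on ISO 15924 codes
     being four letters; it does not check that the code is registered. -->
<define name="scriptCode"
        xmlns="http://relaxng.org/ns/structure/1.0"
        datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="token">
    <param name="pattern">[A-Za-z]{4}</param>
  </data>
</define>
```

A pattern like this accepts values such as Latn, Cyrl or Hant and
rejects anything that is not four letters long.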


Furthermore, to be able to support dialects and variants and extensions
that are not expressible with the combination of the three
language/script/country attributes, I propose:

Add optional *:rfc-language-tag attributes to all places mentioned
above. This attribute, when present, shall override the *:language,
*:script and *:country attributes and is only to be written if the value
cannot be expressed as a valid combination of those. The value shall
be a string according to the rules of RFC 4646 (4646bis). Where
appropriate, for example in the case of a dialect, applications should
also write language/script/country attributes that come as close as
possible to the actual value of the rfc-language-tag attribute, to
provide a fall-back for applications that don't support the
*:rfc-language-tag attribute.
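
To make the override and fall-back concrete, here is a hedged sketch
using the tag de-CH-1901 (German as used in Switzerland, traditional
orthography), whose variant subtag cannot be expressed with
language/script/country alone. The fo:rfc-language-tag attribute is the
one proposed here and is not part of the current schema:

```xml
<!-- Hypothetical instance: fo:rfc-language-tag carries the full tag,
     fo:language and fo:country carry the closest fall-back. -->
<style:text-properties
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    fo:language="de" fo:country="CH"
    fo:rfc-language-tag="de-CH-1901"/>
```

An application that knows the new attribute would use de-CH-1901; one
that does not would still see Swiss German via the fall-back pair.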

The rfc-language-tag attribute's schema would be

<optional>
  <attribute name="fo:rfc-language-tag">
    <ref name="RFClanguageTag"/>
  </attribute>
</optional>

and likewise for table:rfc-language-tag and number:rfc-language-tag where
appropriate.
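
The fragment above refers to an RFClanguageTag pattern that is not
defined in this message. One possible sketch, assuming the XSD
"language" datatype is an acceptable approximation of the RFC 4646
syntax (that choice is an assumption, not part of the proposal):

```xml
<!-- Sketch only: the name RFClanguageTag comes from the fragment above;
     the use of the XSD "language" datatype is an assumption. -->
<define name="RFClanguageTag"
        xmlns="http://relaxng.org/ns/structure/1.0"
        datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="language"/>
</define>
```

This checks only the general shape of the value (hyphen-separated
subtags of up to eight characters each), not whether the subtags are
registered or appear in a valid order.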


Furthermore, to be able to store the same values as the default document
language I propose:

Change

- "3.1.15 Language", the metadata <dc:language> element

to allow a language-tag according to RFC 4646 (4646bis). Effectively
this means to change the paragraph containing

| The manner in which the language is represented is similar to the
| language tag described in [RFC3066]. It consists of a two or three
| letter Language Code taken from the ISO 639 standard optionally followed
| by a hyphen (-) and a two-letter Country Code taken from the ISO 3166
| standard.

to

The manner in which the language is represented is a language tag as
described in [RFC4646 (4646bis)].

I suggest not repeating the syntax of an RFC 4646 language tag in the
description.
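
For illustration, a hedged sketch of a metadata entry that the changed
wording would permit. The tag sr-Latn-RS is an example value chosen
here; it contains a script subtag, which the current wording (language
code plus optional country code) does not provide for:

```xml
<office:meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Example value only: language, script and country in one tag. -->
  <dc:language>sr-Latn-RS</dc:language>
</office:meta>
```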

  Eike

-- 
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS

