OASIS Mailing List Archives, office list


Subject: Proposal for a simplified *-asian and *-complex attribute processing


Hi,

as discussed in the last conference call, the OpenOffice.org specification
contains -asian and -complex versions of some attributes. An example of
this is fo:font-family, for which the additional attributes
style:font-family-asian and style:font-family-complex exist.

The behavior of these attributes is as follows:
- the fo:font-family attribute is applied to all latin characters
- the style:font-family-asian attribute is applied to CJK characters
- the style:font-family-complex attribute is applied to CTL characters
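To make the reading side concrete, here is a minimal Python sketch of how a reader might pick the attribute for a single character, using fo:font-family as the example. The code-point ranges are an assumption (a small illustrative subset of Unicode, not a complete script classification), and the names script_class and font_family_for are hypothetical.

```python
from typing import Dict, Optional

# ASSUMPTION: illustrative subset of code-point ranges only; a real
# application would use full Unicode script data.
CJK_RANGES = [
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]
CTL_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0E00, 0x0E7F),  # Thai
]

# Attribute consulted for each script class.
FONT_FAMILY_ATTR = {
    "latin": "fo:font-family",
    "asian": "style:font-family-asian",
    "complex": "style:font-family-complex",
}


def script_class(ch: str) -> str:
    """Classify one character as 'latin', 'asian' (CJK) or 'complex' (CTL)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in CJK_RANGES):
        return "asian"
    if any(lo <= cp <= hi for lo, hi in CTL_RANGES):
        return "complex"
    # Everything else, including weak characters such as digits, is
    # treated as latin in this simplified sketch.
    return "latin"


def font_family_for(ch: str, props: Dict[str, str]) -> Optional[str]:
    """Return the font family that applies to ch, or None if unset."""
    return props.get(FONT_FAMILY_ATTR[script_class(ch)])
```

For example, font_family_for("\u4e2d", {"fo:font-family": "Times", "style:font-family-asian": "Tahoma"}) returns "Tahoma", because U+4E2D is a CJK ideograph.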

On the one hand, this means that an application that reads files has to
know to which class or script type (latin, CJK or CTL) a character
belongs in order to apply the correct attribute to it. On the other hand,
this means that an application that saves a file but supports only a
single font family setting has to create all three attributes.

This applies not only to fo:font-family but to all of the following attributes:

- style:font-name, fo:font-family, style:font-family-generic,
   style:font-style-name, style:font-pitch, style:font-charset
- fo:font-size, style:font-size-rel
- fo:language, fo:country
- fo:font-style, fo:font-weight
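On the writing side, an application with only one setting per property has to fan each value out to all three variants. The Python sketch below assumes that every variant is named by taking the local name of the base attribute, placing it in the style namespace, and appending -asian or -complex. That matches the fo:font-family example given earlier, but it is an assumption for the other attributes; variant_names and export_single_value are hypothetical helpers.

```python
from typing import Dict, Tuple

# Script-dependent attributes as listed above.
SCRIPT_DEPENDENT = {
    "style:font-name", "fo:font-family", "style:font-family-generic",
    "style:font-style-name", "style:font-pitch", "style:font-charset",
    "fo:font-size", "style:font-size-rel",
    "fo:language", "fo:country",
    "fo:font-style", "fo:font-weight",
}


def variant_names(attr: str) -> Tuple[str, str, str]:
    """Return the (latin, asian, complex) attribute names for attr.

    ASSUMPTION: the variants live in the style namespace with a suffix.
    """
    local = attr.split(":", 1)[1]
    return attr, f"style:{local}-asian", f"style:{local}-complex"


def export_single_value(attr: str, value: str) -> Dict[str, str]:
    """Write one value; script-dependent attributes are written three times."""
    if attr not in SCRIPT_DEPENDENT:
        return {attr: value}
    return {name: value for name in variant_names(attr)}
```

For instance, export_single_value("fo:font-family", "Times") produces fo:font-family, style:font-family-asian and style:font-family-complex, all set to "Times".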

In the following, these attributes are called script dependent, and an 
application that supports these script-dependent attributes is called an 
application that supports script types.

The reason for having script-dependent attributes is that it is in fact
common to use different fonts and font sizes for the different script
types, and that creating documents that use multiple script types becomes
much easier if fonts and font sizes are selected automatically based on
the script type of the character that has been typed in. An issue,
however, is that the Unicode character set also contains "weak"
characters that do not specify a script type (for example digits, spaces
and most punctuation). For these characters, applications have to guess
a type from the surrounding characters, the locale in use, or the user
interface's input method. This means that applications may in fact
choose different script types for weak characters, which in turn means
that documents may look different even in two applications that both
support script types.
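One possible heuristic for weak characters, sketched in Python: a weak character takes the script of the nearest preceding strong character, otherwise the nearest following one, otherwise a default that stands in for the locale or input method. This is only one choice among several; another application could just as well prefer the following character, which is exactly how two applications can end up disagreeing. The ranges, the letter test via unicodedata.category, and the function names are assumptions for illustration.

```python
import unicodedata
from typing import List, Optional

# ASSUMPTION: small illustrative subset of code-point ranges.
CJK_RANGES = [(0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7AF)]
CTL_RANGES = [(0x0590, 0x05FF), (0x0600, 0x06FF), (0x0E00, 0x0E7F)]


def strong_script(ch: str) -> Optional[str]:
    """Return the script of a strong character, or None for a weak one."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in CJK_RANGES):
        return "asian"
    if any(lo <= cp <= hi for lo, hi in CTL_RANGES):
        return "complex"
    if unicodedata.category(ch).startswith("L"):
        return "latin"  # any other letter
    return None  # weak: digits, spaces, punctuation, symbols


def resolve_scripts(text: str, default: str = "latin") -> List[str]:
    """Assign a script type to every character of text."""
    scripts: List[Optional[str]] = [strong_script(c) for c in text]
    # Pass 1: weak characters inherit the nearest preceding strong script.
    last = None
    for i, s in enumerate(scripts):
        if s is None:
            scripts[i] = last
        else:
            last = s
    # Pass 2: leading weak characters take the nearest following script.
    nxt = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] is None:
            scripts[i] = nxt
        else:
            nxt = scripts[i]
    # Pass 3: text without any strong character uses the default
    # (standing in for the locale or input method).
    return [s if s is not None else default for s in scripts]
```

With this heuristic, the space and digits in "中文 123 abc" are resolved as asian because they follow CJK characters, while "123" on its own falls back to the default.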

To simplify transformations from and to CSS/XSL-FO and other formats
that don't have script-dependent attributes, and also to solve the issue
that applications may choose different algorithms to assign script types
to weak Unicode characters, I would like to propose adding a

style:script-type=(latin|asian|complex|ignore)

formatting property. This property can be used like any other formatting
property in styles and specifies which script-dependent attributes should
be applied to some text. The attribute has to be evaluated by
applications that do not support script types. Applications that support
script types may (or should) also evaluate the attribute and override
the script type they would determine for a certain character, but they
don't have to.
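How an application might combine the new property with its own detection, as a minimal Python sketch. The properties passed in are assumed to be the text's paragraph and span properties already merged; falling back to latin when neither the property nor detection gives an answer is an assumption, since the proposal does not say; effective_script_type is a hypothetical name.

```python
from typing import Dict, Optional

SCRIPT_TYPES = ("latin", "asian", "complex")


def effective_script_type(
    props: Dict[str, str],
    detected: Optional[str] = None,
    honor_property: bool = True,
) -> str:
    """Decide which script-dependent attributes apply to a piece of text.

    props:          merged formatting properties for the text.
    detected:       script type the application determined itself, or
                    None if the application does not support script types.
    honor_property: whether an application that supports script types
                    lets style:script-type override its own result.
    """
    explicit = props.get("style:script-type")
    if explicit not in SCRIPT_TYPES:
        explicit = None  # absent, or "ignore" (only valid in default styles)
    if detected is None:
        # Applications without script-type support must use the property.
        return explicit or "latin"  # ASSUMPTION: latin when unspecified
    if honor_property and explicit is not None:
        return explicit
    return detected
```

The honor_property flag models the "may (or should)" wording: an application that supports script types can either trust the stored value or keep its own result.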

The attribute value "ignore" can be used only within a
<style:default-style>. If it is set, all script-dependent attributes are
applied to all script types. For example, a fo:font-family would then be
applied to all script types, as would a style:font-family-asian or
style:font-family-complex. This simplifies saving documents in
applications that do not support script types, because these
applications otherwise would have to export all three script-dependent
attributes for a single property.
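A sketch of the lookup under "ignore", again for fo:font-family only. The proposal does not say which value wins if several variants are present, so the precedence used here (the variant matching the script type first, then the others) is an assumption, as is the function name lookup_font_family.

```python
from typing import Dict, Optional

FONT_FAMILY_ATTR = {
    "latin": "fo:font-family",
    "asian": "style:font-family-asian",
    "complex": "style:font-family-complex",
}


def lookup_font_family(
    props: Dict[str, str],
    script_type: str,
    default_script_type: Optional[str] = None,
) -> Optional[str]:
    """Return the font family for text of the given script type.

    default_script_type is the style:script-type of the default style.
    """
    own = FONT_FAMILY_ATTR[script_type]
    if default_script_type != "ignore":
        # Normal behavior: only the matching variant applies.
        return props.get(own)
    # "ignore": every variant applies to every script type.
    # ASSUMPTION: the matching variant is preferred if several are present.
    candidates = [own] + [n for n in FONT_FAMILY_ATTR.values() if n != own]
    for name in candidates:
        if name in props:
            return props[name]
    return None
```

This is what lets a writer without script-type support store only fo:font-family: with "ignore" in effect, a reader finds that value even for CJK or CTL text.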

Example with script-type support:

<office:document text:st="asian">

...

<style:style style:name="Text Body">
   <style:properties fo:font-family="Times"
                     style:font-family-asian="Tahoma"
                     style:script-type="asian"/>
</style:style>
<style:style style:name="T1">
   <style:properties style:script-type="latin"/>
</style:style>

...

<text:p text:style-name="Text Body">
   [asian characters]
   <text:span text:style-name="T1">[latin characters]</text:span>
   [asian characters]
</text:p>
<text:p text:style-name="Text Body">
   [asian characters]
</text:p>
<text:p text:style-name="Text Body">
   [asian characters]
   <text:span text:style-name="T1">[latin characters]</text:span>
   [asian characters]
</text:p>

The same example without script-type support:

<office:document>
...

<style:default-style>
   <style:properties style:script-type="ignore"/>
</style:default-style>

<style:style style:name="Text Body">
   <style:properties fo:font-family="Tahoma"/>
</style:style>
<style:style style:name="T1">
   <style:properties fo:font-family="Times"/>
</style:style>

...

<text:p text:style-name="Text Body">
   [asian characters]
   <text:span text:style-name="T1">[latin characters]</text:span>
   [asian characters]
</text:p>
<text:p text:style-name="Text Body">
   [asian characters]
</text:p>
<text:p text:style-name="Text Body">
   [asian characters]
   <text:span text:style-name="T1">[latin characters]</text:span>
   [asian characters]
</text:p>
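To check that the two documents above are meant to use the same fonts, the following Python sketch transcribes their styles and computes the font family for each run of the first paragraph. It assumes that span properties override the paragraph's, that text without a style:script-type is latin, and that under "ignore" only fo:font-family matters (true for these examples); the helper name font_for_run is hypothetical.

```python
from typing import Dict, Optional

FONT_FAMILY_ATTR = {
    "latin": "fo:font-family",
    "asian": "style:font-family-asian",
    "complex": "style:font-family-complex",
}

# Style properties transcribed from the two examples; "default" holds
# the properties of <style:default-style>.
WITH_SUPPORT = {
    "default": {},
    "Text Body": {
        "fo:font-family": "Times",
        "style:font-family-asian": "Tahoma",
        "style:script-type": "asian",
    },
    "T1": {"style:script-type": "latin"},
}
WITHOUT_SUPPORT = {
    "default": {"style:script-type": "ignore"},
    "Text Body": {"fo:font-family": "Tahoma"},
    "T1": {"fo:font-family": "Times"},
}


def font_for_run(styles, para_style: str, span_style: Optional[str] = None) -> Optional[str]:
    """Font family for a run in a paragraph, optionally inside a span."""
    props: Dict[str, str] = dict(styles[para_style])
    if span_style is not None:
        props.update(styles[span_style])  # span overrides paragraph
    if styles["default"].get("style:script-type") == "ignore":
        # Every script-dependent attribute applies to every script type.
        return props.get("fo:font-family")
    script = props.get("style:script-type", "latin")
    return props.get(FONT_FAMILY_ATTR[script])


# Runs of the first paragraph: asian text, latin span, asian text.
RUNS = [("Text Body", None), ("Text Body", "T1"), ("Text Body", None)]
```

Evaluating font_for_run for each entry of RUNS gives Tahoma, Times, Tahoma for both WITH_SUPPORT and WITHOUT_SUPPORT, which is the equivalence the two examples are meant to show.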


An alternative would be to add formatting properties that specify the
current values regardless of the script type, for instance by renaming
the latin attributes to *-latin and by using the attributes without a
suffix for the current value. This solves the transformation issue and
might in fact make transformations to formats that don't have
script-dependent attributes even easier, but unfortunately, this
solution would not solve the problem of weak Unicode characters.

Best regards

Michael


