
Subject: db:encoding


The db:encoding attribute is supposed to specify the "text encoding of 
string data."

Quite naturally it defaults to UTF-8, but the value of this attribute is a 
ref to "textEncoding."

OK, so being curious, I ran that down and found:

<define name="textEncoding">
    <data type="string">
        <param name="pattern">[A-Za-z][A-Za-z0-9._\-]*</param>
    </data>
</define>

(not anywhere near the attribute if you are thinking the schema fragment 
would be helpful)

Well, utf-8 does match that pattern but it is hardly what I would call a 
basis for validation.
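
To illustrate how little the pattern constrains, here is a minimal Python sketch. The pattern is copied from the definition above; the helper names (matches_schema, is_known_codec) are mine, and Python's codec registry is only a stand-in for whatever list of recognized encodings the TC might reference normatively.

```python
import codecs
import re

# Pattern copied from the textEncoding definition quoted above.
# Schema patterns match the whole value, hence fullmatch() below.
TEXT_ENCODING = re.compile(r"[A-Za-z][A-Za-z0-9._\-]*")


def matches_schema(name: str) -> bool:
    """True if the value would pass the textEncoding pattern."""
    return TEXT_ENCODING.fullmatch(name) is not None


def is_known_codec(name: str) -> bool:
    """True if Python's codec registry recognizes the name.

    Assumption: the registry is used here only as a convenient
    stand-in for a normative list of character encodings.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for candidate in ("utf-8", "ISO-8859-1", "no.such_encoding-42"):
    print(candidate, matches_schema(candidate), is_known_codec(candidate))
```

The made-up name "no.such_encoding-42" passes the pattern even though no registry knows it, so the schema check tells a consumer nothing about whether it can actually decode the data.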

*Be aware* that textEncoding is also used for:

        <attribute name="style:font-charset">
            <ref name="textEncoding"/>
        </attribute>
        <attribute name="style:font-charset-asian">
            <ref name="textEncoding"/>
        </attribute>
        <attribute name="style:font-charset-complex">
            <ref name="textEncoding"/>
        </attribute>

So we probably can't really just "fix" it.

My preference, subject to the wishes of the TC, would be as follows:

1) Separate out the notion of encoding from anything else, including 
styles for character sets.

2) Specify by normative reference recognized character encodings *and* 
specify that conforming applications need only support UTF-8, although 
other encodings may be supported. For maximum interoperability, 
applications are recommended to use only UTF-8 and if necessary, to 
replicate text in other encodings. (This would be a good use for frames.)

3) Reform all references to styles to not simply be strings but to be 
xml:idrefs to xml:ids that identify styles. Give styles display names 
but let's fix all the identification and reference mechanisms before the 
ODF audience doubles or triples. There are known and well documented XML 
techniques for this sort of thing and using a variety of mechanisms just 
increases the burden on implementers and increases the likelihood of 
varying results between applications.
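
As a rough sketch of what item 3 buys implementers, the Python below checks that every style reference resolves to a declared xml:id. The element and attribute names (doc, style, p, style-ref, display-name) are hypothetical and chosen only for illustration; they are not taken from the ODF schema.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical markup: styles carry an xml:id for identification and a
# separate display name for users; content refers to styles by ID only.
DOC = """
<doc>
  <style xml:id="s1" display-name="Body Text"/>
  <style xml:id="s2" display-name="Heading 1"/>
  <p style-ref="s2">Title</p>
  <p style-ref="s1">Paragraph</p>
  <p style-ref="s9">Dangling reference</p>
</doc>
"""


def dangling_style_refs(xml_text: str) -> list[str]:
    """Return every style-ref value that matches no declared xml:id."""
    root = ET.fromstring(xml_text)
    known = {el.get(XML_ID) for el in root.iter("style")}
    refs = (el.get("style-ref") for el in root.iter())
    return [ref for ref in refs if ref is not None and ref not in known]


print(dangling_style_refs(DOC))
```

With a single ID-based mechanism, one generic check like this covers every kind of style reference, and display names can change without breaking anything.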

Yes, I am aware that this would mean some increased burden on current 
implementers. But I would rather do that now than have to maintain 
multiple identification/reference systems as ODF grows.

Hope everyone is having a great day!


Patrick Durusau
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
