OASIS Mailing List Archives

office-comment message

Subject: shorter XML representations for the values?


Version 1.2 of the Open Document Format for Office Applications specifies how data types are represented in Part 1, section 18.

The current representation of values is sometimes verbose. For example, the character string "false" represents the boolean value false, which takes 40 bits in UTF-8 (five one-byte characters) to encode a single bit of information. I understand that these representations are meant to be human-readable, but I think we could use shorter representations that are still readable by a human. For example, "F" or "0" is still readable when it represents the boolean value false.

Even if the compressed "content.xml" member would not shrink much, the current representation means that more bytes must be read from and written to memory during compression and decompression, and more bytes must be parsed by the application when it loads a value into its internal representation.
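To put rough numbers on this, here is a minimal Python sketch that builds a synthetic run of boolean cells and compares raw and compressed sizes for the current spelling and the proposed one. The cell template, the random test data, and the use of zlib's DEFLATE (the method ZIP archives such as ODF packages commonly use for members like "content.xml") are assumptions for illustration; a real spreadsheet mixes many cell types, so actual ratios will differ.

```python
import random
import zlib

# Template for a boolean cell. The "long" spellings are the ODF 1.2 values;
# the "short" spellings ("1"/"0") are the form proposed in this message and
# are NOT valid ODF 1.2.
CELL = ('<table:table-cell table:style-name="ce1" '
        'office:value-type="boolean" office:boolean-value="{}"/>')
LONG = {True: "true", False: "false"}
SHORT = {True: "1", False: "0"}


def build_xml(values: list[bool], spelling: dict[bool, str]) -> bytes:
    """Concatenate one cell per value and encode the result as UTF-8."""
    return "".join(CELL.format(spelling[v]) for v in values).encode("utf-8")


def sizes(values: list[bool], spelling: dict[bool, str]) -> tuple[int, int]:
    """Return (raw size, DEFLATE-compressed size) in bytes."""
    raw = build_xml(values, spelling)
    return len(raw), len(zlib.compress(raw, 9))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    values = [rng.random() < 0.5 for _ in range(10_000)]
    for name, spelling in (("long", LONG), ("short", SHORT)):
        raw, packed = sizes(values, spelling)
        print(f"{name:>5}: {raw} bytes raw, {packed} bytes compressed")
```

The raw sizes differ by exactly 3 bytes per true value and 4 bytes per false value; the compressed sizes will typically differ much less, since DEFLATE already removes much of the repetition. That is consistent with the point that the saving lies mainly in bytes moved and parsed rather than bytes stored.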

For example, as discussed above, a boolean is represented as follows:
<table:table-cell table:style-name="ce1" office:value-type="boolean" office:boolean-value="false">

Could we use a shorter representation for the "office:boolean-value" attribute, such as "T" or "1" for true and "F" or "0" for false?
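A minimal sketch of how a consumer might read such a cell if the shorter forms were allowed. The function names are hypothetical, and accepting "T"/"1"/"F"/"0" reflects this proposal only; ODF 1.2 itself uses "true" and "false". The namespace URIs are the standard ODF "office" and "table" namespaces, added so the cell from the message parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs bound to the "office" and "table" prefixes in ODF documents.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# "true"/"false" are the ODF 1.2 spellings; "T"/"1" and "F"/"0" are the
# shorter forms proposed in this message (hypothetical, not valid ODF 1.2).
_TRUE = frozenset({"true", "T", "1"})
_FALSE = frozenset({"false", "F", "0"})


def parse_boolean_value(text: str) -> bool:
    """Map an office:boolean-value string to a Python bool."""
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"not a boolean value: {text!r}")


def format_boolean_value(value: bool, short: bool = False) -> str:
    """Write a bool in the current long form or the proposed short form."""
    if short:
        return "1" if value else "0"
    return "true" if value else "false"


def cell_boolean(xml_text: str) -> bool:
    """Read office:boolean-value from a single <table:table-cell> element."""
    cell = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
    raw = cell.get(f"{{{OFFICE_NS}}}boolean-value")
    if raw is None:
        raise ValueError("cell has no office:boolean-value attribute")
    return parse_boolean_value(raw)


# The cell from the message, self-closed and given namespace declarations;
# SHORT_CELL is the same cell using the proposed "0".
SAMPLE_CELL = (
    f'<table:table-cell xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}" '
    'table:style-name="ce1" office:value-type="boolean" '
    'office:boolean-value="false"/>'
)
SHORT_CELL = SAMPLE_CELL.replace('boolean-value="false"', 'boolean-value="0"')
```

Reading both spellings while writing only one is one way an application could stay compatible with existing files during a transition; that approach is an assumption here, not part of the proposal.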

Similarly, a floating-point number is represented as follows:
<table:table-cell office:value-type="float" office:value="123456789012345">

Using a base-62 representation (digit symbols in increasing weight: 0-9, then a-z, then A-Z), the value of the "office:value" attribute becomes "z3wBXdvb". This base-62 string is roughly half the length of its base-10 counterpart (8 characters instead of 15), so the application would have roughly half as many bytes to process when loading the number into its internal representation. This could improve the performance of applications when reading and writing large files.
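A minimal sketch of the base-62 encoding described above, assuming the value is a non-negative integer. The message does not say how a sign, a fractional part, or an exponent of a general floating-point value would be written, so those cases are left out, and the function names are hypothetical.

```python
# Digit symbols in increasing weight, as described in the message:
# "0"-"9" -> 0..9, "a"-"z" -> 10..35, "A"-"Z" -> 36..61.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62
_DIGIT_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def to_base62(n: int) -> str:
    """Encode a non-negative integer, most significant digit first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, BASE)  # peel off the least significant digit
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def from_base62(s: str) -> int:
    """Decode a string produced by to_base62 (case-sensitive)."""
    n = 0
    for ch in s:
        n = n * BASE + _DIGIT_VALUE[ch]
    return n


if __name__ == "__main__":
    value = 123456789012345
    encoded = to_base62(value)
    print(encoded, len(str(value)), len(encoded))  # z3wBXdvb 15 8
```

The "roughly half" figure follows from digit counts: a number needs about ln 10 / ln 62 ≈ 0.56 times as many base-62 digits as decimal digits, which for this 15-digit value gives 8.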

Could future versions of the specification shorten the character strings that represent these values?

Are there any blockers that would prevent this change from being made in a future version?