office-comment message
Subject: Re: [office-comment] shorter XML representations for the values ?
- From: robert_weir@us.ibm.com
- To: Jérôme Bouat <jerome.bouat@wanadoo.fr>
- Date: Mon, 5 Jan 2015 09:30:17 -0500
Jérôme Bouat <jerome.bouat@wanadoo.fr> wrote on 01/04/2015 05:16:28 AM:
> From: Jérôme Bouat <jerome.bouat@wanadoo.fr>
> To: office-comment@lists.oasis-open.org
> Date: 01/04/2015 05:16 AM
> Subject: [office-comment] shorter XML representations for the values ?
>
> Hello,
>
>
> Version 1.2 of the Open Document Format for Office Applications
> specifies how datatypes are represented in part 1, section 18.
>
> The current representation of values is sometimes verbose. For
> example, the character string "false" represents the false boolean
> value (40 bits in UTF-8 to represent a 1-bit value). I understand
> that those representations provide human-readable values, but I
> think we could use a shorter representation that is still readable
> by a human. For example, the characters "F" or "0" are still
> readable when they represent the false boolean value.
>
> Even if the size of the compressed "content.xml" member would not
> decrease much, the current representation means more bytes to be
> read from or written to memory when compressing or uncompressing,
> and more bytes to be parsed by the application when loading a data
> value into its internal representation.
>
>
> For example, as discussed above, a boolean is represented as below:
> ---
> <table:table-cell table:style-name="ce1" office:value-type="boolean"
> office:boolean-value="false">
> <text:p>FAUX</text:p>
> </table:table-cell>
> ---
>
> Could we possibly use a shorter representation for the
> "office:boolean-value" attribute, like "T" or "1" for the true value
> and "F" or "0" for the false value?
>
Hello Jérôme,
Thanks for writing.
One solution for the boolean issue would be to harmonize our
office:value-type attribute with XML Schema datatypes, at least for
the common overlap in types. XML Schema's boolean type allows the
lexical forms true, false, 1, and 0. That would allow a more compact
form.
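As a minimal sketch of what accepting the XML Schema boolean lexical forms
could look like in a consumer, the Python below maps the four forms to a
boolean. The helper name parse_xsd_boolean is hypothetical and is not taken
from ODF or from any existing library.

```python
# Lexical forms permitted by the XML Schema boolean datatype.
_XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_xsd_boolean(text: str) -> bool:
    """Parse an xsd:boolean lexical form (hypothetical helper)."""
    # Surrounding XML whitespace is ignored before matching.
    key = text.strip(" \t\r\n")
    if key not in _XSD_BOOLEAN:
        raise ValueError(f"not an xsd:boolean lexical form: {text!r}")
    return _XSD_BOOLEAN[key]
```

Note that "T" and "F" are not xsd:boolean lexical forms, so this sketch
rejects them; only "1" and "0" are available as the compact spellings.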
>
> For example, a floating-point number is represented as below:
> ---
> <table:table-cell office:value-type="float" office:value="123456789012345">
> <text:p>123456789012345</text:p>
> </table:table-cell>
> ---
>
> By using a base-62 representation (symbols by increasing weight:
> digits 0-9, letters a-z, letters A-Z), the value of the
> "office:value" attribute becomes "z3wBXdvb". This base-62
> representation is roughly half the size of its base-10
> representation, so the application has roughly half as many bytes to
> process in order to load the number into its internal
> representation. This could improve the performance of applications
> when reading or writing large files.
>
>
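The base-62 proposal quoted above can be sketched in a few lines of Python.
This is an illustration only: the function names are hypothetical, and the
sketch handles non-negative integers, not the fractional or negative values
that office:value can also carry.

```python
import string

# Symbols in increasing weight, as described in the proposal: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-symbol alphabet."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Decode a base-62 string back to an integer."""
    value = 0
    for ch in text:
        # str.index raises ValueError for symbols outside the alphabet.
        value = value * 62 + ALPHABET.index(ch)
    return value
```

With this alphabet, base62_encode(123456789012345) gives "z3wBXdvb", which
is 8 characters against 15 for the decimal form.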
That would add considerable complexity to ODF processors, including
byte-order concerns. The nice thing about using XML Schema datatypes
is that they are well known and supported in tools. In particular,
validating parsers can apply additional constraints. So a user could
easily write a script, using just off-the-shelf XML tools, to confirm
that all cells in an ODF spreadsheet have values between -50 and 1000.
But if values are encoded like "z3wBXdvb", then it would require
custom coding to make sense of that value.
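A rough sketch of such a script, assuming the content.xml member has already
been extracted from the package and read into a string. It uses Python's
standard-library XML parser and a hand-written range test rather than a
schema-validating parser, and the function name out_of_range_values is
hypothetical.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the office: and table: prefixes.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def out_of_range_values(content_xml: str, low=-50.0, high=1000.0) -> list:
    """Return the float cell values that fall outside [low, high]."""
    root = ET.fromstring(content_xml)
    bad = []
    for cell in root.iter(f"{{{TABLE_NS}}}table-cell"):
        # Only cells typed as float carry a numeric office:value here.
        if cell.get(f"{{{OFFICE_NS}}}value-type") != "float":
            continue
        raw = cell.get(f"{{{OFFICE_NS}}}value")
        if raw is None:
            continue
        # Works directly because the value is plain decimal text.
        value = float(raw)
        if not low <= value <= high:
            bad.append(value)
    return bad
```

The float(raw) call is the step that relies on the standard decimal form; a
custom encoding would need its own decoder at that point.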
One way to think of this: adherence to well-known standards provides
an efficiency of its own, in terms of understanding, compatibility
with existing tools, etc. But it might not be optimal in terms of
run-time performance. An alternative here, which we've talked about
before, would be to have a canonical binary encoding of ODF.
Microsoft does something similar with Excel, having the XML-based
OOXML format, but also having a specialized .xlsb format for
optimized storage of very large spreadsheets.
Regards,
-Rob
> Could the next specifications possibly shorten the length of the XML
> character strings that represent the values?
>
> Are there any blockers that would prevent this change from being
> made in the next versions?
>
>
> Regards.