
office-comment message


Subject: RE: [office-comment] shorter XML representations for the values ?

There are efforts for more-compact XML and even binary encodings of XML.  (Just as ASN.1 has an XML expression, apparently the reverse is desired as well?)

However, recall that for any substantial document, the XML in ODF packages is compressed.  While that is not a panacea, compression already removes much of the redundancy in the text.

I, for one, think that matching the XML Schema datatypes is a great idea and I am a bit startled that it was not applied in this case.  The Relax NG schema for ODF already appeals to those datatypes so I wonder how this was missed.

So the change for Boolean would be achievable simply by altering the schema pattern

  <define name="boolean">
      <choice>
          <value>true</value>
          <value>false</value>
      </choice>
  </define>

to be

  <define name="boolean">
      <data type="boolean"/>
  </define>

The problem is with down-level implementations.  That is, we want ODF 1.2 consumers to be able to accept ODF 1.3 documents where there is no difference in the features being used.  This applies pretty much to ODF 1.1 consumers too, many of which remain in use to this day.  (I have stopped being surprised that there are a large number of people still running OpenOffice.org 3.4.1, apparently an oldie but goodie, and wondering if it is safe for them to upgrade to one of the 4.x series on the computer they have been using all that time.)
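
To make the down-level concern concrete, here is a minimal sketch in Python.  It assumes a down-level consumer that accepts only the lexical forms "true" and "false", next to a consumer that accepts the four XML Schema boolean forms.  Both function names are hypothetical and are not taken from any real ODF implementation.

```python
# Lexical forms of the XML Schema boolean datatype: true, false, 1, 0.
XSD_BOOLEAN_FORMS = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean_down_level(lexical: str) -> bool:
    """Hypothetical down-level consumer: assumed to accept only 'true'/'false'."""
    if lexical == "true":
        return True
    if lexical == "false":
        return False
    raise ValueError(f"not a boolean for a down-level consumer: {lexical!r}")


def parse_boolean_xsd(lexical: str) -> bool:
    """Hypothetical newer consumer: accepts all four XML Schema lexical forms."""
    try:
        return XSD_BOOLEAN_FORMS[lexical]
    except KeyError:
        raise ValueError(f"not an XML Schema boolean: {lexical!r}") from None


if __name__ == "__main__":
    for form in ("true", "false", "1", "0"):
        try:
            outcome = parse_boolean_down_level(form)
        except ValueError as err:
            outcome = f"rejected ({err})"
        print(f"{form!r}: xsd={parse_boolean_xsd(form)}, down-level={outcome}")
```

A document that writes "1" where the older pattern expects "true" uses no new feature, yet the assumed down-level consumer rejects it.  That is the interoperability cost to weigh against the few characters saved.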

So the trade-off between compressibility and uncompressed compactness has to be looked at from an interoperability perspective with regard to legacy consumers on legacy computers [;<).

By the way, if you want to make a bigger hit on uncompressed compactness, change the prefixes for namespace bindings from such things as "office", "text", "table", "draw", "presentation", "manifest" and such to "o", "t", "tb", "d", "p" and "m" or whatever in the XML that is produced.  That can be done by a producer without requiring any change to the specification at all and consumers are expected to be fine with it already.  As an experiment, see what difference that does or does not make on the compressed size of the XML too.
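
The experiment can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the sample is a synthetic table fragment rather than a real ODF document, the short prefixes "tb" and "t" are arbitrary choices, and zlib's DEFLATE stands in for the compression applied inside an ODF package.  The function names are hypothetical.

```python
import zlib

# ODF namespace URIs for the two vocabularies used in the synthetic sample.
NS_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
NS_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def build_sample(table_prefix: str, text_prefix: str, rows: int = 1000) -> bytes:
    """Build a synthetic table fragment using the given namespace prefixes."""
    tp, xp = table_prefix, text_prefix
    body = "".join(
        f"<{tp}:table-row><{tp}:table-cell><{xp}:p>{i}</{xp}:p>"
        f"</{tp}:table-cell></{tp}:table-row>"
        for i in range(rows)
    )
    decls = f'xmlns:{tp}="{NS_TABLE}" xmlns:{xp}="{NS_TEXT}"'
    return f"<{tp}:table {decls}>{body}</{tp}:table>".encode("utf-8")


def sizes(data: bytes) -> tuple[int, int]:
    """Return (uncompressed size, DEFLATE-compressed size) in bytes."""
    return len(data), len(zlib.compress(data, 9))


long_raw, long_packed = sizes(build_sample("table", "text"))
short_raw, short_packed = sizes(build_sample("tb", "t"))

print(f"long prefixes:  raw={long_raw}  compressed={long_packed}")
print(f"short prefixes: raw={short_raw}  compressed={short_packed}")
```

Both fragments bind the same namespace URIs, so a namespace-aware consumer sees identical element names.  Only the byte counts differ, and the figures that matter are the ones measured on real documents rather than on a synthetic sample like this one.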

Thanks for the thoughtful suggestion.  I also wonder about the energy issue and use of standards at this level in additional comments below.

 -- replying below to --
From: Jérôme Bouat [mailto:jerome.bouat@wanadoo.fr] 
Sent: Saturday, January 17, 2015 06:52
To: office-comment@lists.oasis-open.org
Subject: Re: [office-comment] shorter XML representations for the values ?


> One solution for the boolean issue would be to harmonize
> our office:value-type attribute with XML Schema datatypes,
> at least for the common overlap in types.
> XML Schema's boolean type allows lexical forms to be one of:
> true, false, 1, 0.  That would allow a more compact form.

Do you know if the next specification will take this more efficient representation of boolean values into account?

[ ... ]

I don't think that standardisation is the end of innovation. If there is enough time to make the "off-the-shelf" tools compliant with the new specification, then a shorter representation of values would be a benefit for everyone.

I don't think binary encoding would be a solution for long-term storage. As you said, XML provides benefits like validation, etc.

I think a shorter value representation is a good trade-off between the use of the generic XML language/tools and the need for efficiency.

If you think about the increasing cost of energy, then a more compact XML would be a benefit in data centers, on desktop computers, etc.

   When I worked for Bob Bemer (aka the father of ASCII) back in the day, 
   he was known for this interesting observation: 

   "Standards are arbitrary solutions to recurring problems."  

   The idea is not to then introduce new recurring problems.  
   So innovation via hammering on a voluntary standard is not 
   always a productive notion.  These standards do *not* compel
   compliance.  Purchasing requirements can, but I don't 
   imagine we'll ever see preference for 0 and 1 over true and
   false being a "rider" in a procurement requirement for ODF-
   Compliant software.  There are far bigger issues that don't
   get dealt with at procurement already.

   I have no idea how to address energy trade-offs by tweaking XML
   representations, but I think one would prefer to go after the
   big-win low-hanging fruit first.  We may be micro-optimizing 
   when macro-optimization has more to yield.  

   I think we need
   more quantitative analysis.  We've assumed, for example, that
   compression and decompression save enough in input-output and
   storage to be important as a practical matter.  But we can't
   neglect the cost attributable to breaking changes and the 
   processing cost of compression/decompression (although I think
   modern processors provide assistance in this area).  

   It would
   be interesting if there were some sort of energy-footprint,
   carbon trade-off standard that could be used for procurement.
   It would have to provide some quantitative metrics
   for determination of software and file-format compliance.
   I'm not confident that such a thing is feasible and I wonder
   about the wasted energy in attempting to specify such a 
   thing [;<).

   I'm thinking that improved energy consumption and loss
   in processors, storage, and power management provide much
   greater advantage, at a faster pace, than what we are looking at
   in terms of XML.  But as I say, there are ways to reduce
   the XML without breaking any standard-compliant down-level


This publicly archived list offers a means to provide input to the
OASIS Open Document Format for Office Applications (OpenDocument) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: office-comment-subscribe@lists.oasis-open.org
Unsubscribe: office-comment-unsubscribe@lists.oasis-open.org
List help: office-comment-help@lists.oasis-open.org
List archive: http://lists.oasis-open.org/archives/office-comment/
Feedback License: http://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
Committee: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
Join OASIS: http://www.oasis-open.org/join/
