ubl-dev message

Subject: Re: [ubl-dev] Simple description of XML-Spreadsheet format

From: Chin Chee-Kai <cheekai@softml.net>
To: "David RR Webber (XML)" <david@drrw.info>
Date: Fri, 19 Sep 2008 11:33:28 +0800

David RR Webber (XML) wrote:
> Chin,
>
> Actually the reverse process is a little more involved.
>   
Yes, David, glad you and I agree on that.  This "more involved" area is 
what I'm working on with UBLish.

> The Excel format is actually in XML these days - and it is relatively
> simple <cell> based.
>   
Very much so, but only once you are fully cognizant of the 6,000+ pages 
of documentation; without that, one cannot be fully confident of 
producing an Excel-compliant format.  It does serve the purpose, but it 
is rather like using an elephant to pull a shopping trolley.  All we're 
doing here with the simple XML-Spreadsheet format is liberating the 
contents of the spreadsheet.  We are not also trying to achieve 
backward compatibility with older versions of Excel (e.g. 
non-ISO-compliant date formats), operate properly with macros, ensure 
cross-sheet references are maintained, transport binaries properly 
within, etc.  (Note that OOXML is not just an XML version of Excel, 
but also a storage medium, a presentation style container, a 
programming environment via macros, a set of links and interfaces via 
plug-ins, and more.)
> BUT - you have to "know" where the cell content is that you want as
> there is a lot of other "fluff" in the raw Excel file.
>   
The "fluff" is relative.  A lot of structures (elements, attributes, 
etc.) have to be implemented to make the XML version of Excel work 
properly and serve as many users in the world as possible.  So from 
Excel's point of view, there's no fluff, only requirements.  But for 
simpler applications like what we're trying to do here (just output 
content), there's no need to do so many things.

The "know" part for XML-Spreadsheet is just the row & column numbers.  
I've tried an XPath example that extracts from the XML-Spreadsheet 
version (to be released) of the UBL v1.0 IDD spreadsheet.  To get the 
text at row 28, column 6, one can use the XPath expression:

    xpath = "//Row[@rowIndex='28']/Column[@columnIndex='6']/text()";

A short Kopio program would look like:

    file = "wd-UBL-1(1).0-IDD-2.xml";
    bs = FileContentAsBytes(file);
    xml = ToXML(bs);
    xpath = "//Row[@rowIndex='28']/Column[@columnIndex='6']/text()";
    val = xml.("~"+xpath);        ### Returns a (1-element) array of values
    print("Value is = ", val[0], "\n");
    GetLine();

    [Output:  ABIE]
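
For readers without Kopio, the same lookup can be sketched in Python 
with the standard library.  This is only an illustration under stated 
assumptions: the Row/Column elements and the rowIndex/columnIndex 
attributes come from the XPath above, while the root and sheet element 
names, the inline sample data and the helper name cell_text are 
invented here.  ElementTree has no text() step, so the sketch finds the 
element and reads its .text instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.  Only Row/Column and their
# rowIndex/columnIndex attributes are taken from the XPath in the
# message; <Spreadsheet> and <Sheet> are invented for this sketch.
SAMPLE = """\
<Spreadsheet>
  <Sheet>
    <Row rowIndex="27">
      <Column columnIndex="6">x</Column>
    </Row>
    <Row rowIndex="28">
      <Column columnIndex="5">y</Column>
      <Column columnIndex="6">ABIE</Column>
    </Row>
  </Sheet>
</Spreadsheet>
"""


def cell_text(root, row, col):
    """Return the text of the cell at (row, col), or None if absent."""
    # ElementTree supports attribute predicates but not text(),
    # so locate the Column element and read its text content.
    path = f".//Row[@rowIndex='{row}']/Column[@columnIndex='{col}']"
    cell = root.find(path)
    return None if cell is None else cell.text


root = ET.fromstring(SAMPLE)
print("Value is =", cell_text(root, 28, 6))
```

To run it against a real file, ET.parse(filename).getroot() would 
replace the ET.fromstring(SAMPLE) call.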


> Not impossible though to build XSLT that will run generically against an
> Excel XML format and extract a default set of content from it.
> Especially if you pass in parameters to the XSLT of the cell offsets to
> pull data from...or have it pick up headers from the first row of cells.
>   
Technically, yes, it should be mostly straightforward once one gets 
familiar with OOXML.  You'd also need to urge many users to upgrade to 
Office 2007 so that they could convert existing Excel 97-2003 
spreadsheets into .xlsx format.  You might also encourage users to get 
familiar with how to save quickly, especially when working on a 
directory of over 30 files in the main document directory, or even more 
files while they experiment with customized spreadsheets.
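
To make the header-row idea concrete, here is a rough Python sketch 
(not XSLT) of what such a generic extractor has to do against .xlsx 
worksheet XML.  It assumes the sheet and shared-strings parts have 
already been unzipped from the package, that every row and cell 
carries its "r" reference attribute (Excel writes these), and it 
ignores inline strings, dates and formulas.  The sample data and the 
function names are invented for the illustration.  It also shows some 
of the "know where the content is" work: text cells hold an index into 
a separate shared-strings table rather than the text itself.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Invented sample parts, shaped like xl/sharedStrings.xml and a
# worksheet part from an .xlsx package.
SHARED = f"""<sst xmlns="{MAIN}">
  <si><t>Name</t></si><si><t>Type</t></si>
  <si><t>Address</t></si><si><t>ABIE</t></si>
</sst>"""

SHEET = f"""<worksheet xmlns="{MAIN}"><sheetData>
  <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>
  <row r="2"><c r="A2" t="s"><v>2</v></c><c r="B2" t="s"><v>3</v></c></row>
  <row r="3"><c r="A3"><v>42</v></c></row>
</sheetData></worksheet>"""


def column_letters(ref):
    """'B12' -> 'B' (the column part of a cell reference)."""
    return "".join(ch for ch in ref if ch.isalpha())


def extract_records(sheet_xml, shared_xml, header_row=1):
    """Return one dict per data row, keyed by the header row's text."""
    # Shared strings: each <si> may hold plain or rich text, so join
    # all <t> descendants.
    strings = [
        "".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
        for si in ET.fromstring(shared_xml).findall("m:si", NS)
    ]

    def cell_value(c):
        v = c.find("m:v", NS)
        if v is None:
            return None
        # t="s" means <v> is an index into the shared-strings table.
        return strings[int(v.text)] if c.get("t") == "s" else v.text

    rows = {}
    for row in ET.fromstring(sheet_xml).iterfind(".//m:sheetData/m:row", NS):
        rows[int(row.get("r"))] = {
            column_letters(c.get("r")): cell_value(c)
            for c in row.findall("m:c", NS)
        }

    headers = rows.get(header_row, {})
    return [
        {headers[col]: val for col, val in rows[i].items() if col in headers}
        for i in sorted(rows)
        if i > header_row
    ]


records = extract_records(SHEET, SHARED)
print(records)
```

The header_row parameter plays the role of the cell-offset parameter 
passed into the XSLT; a fuller version would also accept a starting 
column and a sheet name.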


Regards,
Chin Chee-Kai

References:
- RE: [ubl-dev] Simple description of XML-Spreadsheet format
  - From: "David RR Webber (XML)" <david@drrw.info>