office message

Subject: Storing sheet-local named expressions in ODF
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: office-formula@lists.oasis-open.org, office@lists.oasis-open.org
Date: Sat, 05 Aug 2006 11:36:30 -0400 (EDT)
Some spreadsheet applications (Excel and Gnumeric at least)
allow named expressions (which are basically spreadsheet variables)
to be locally scoped to a specific sheet instead of being globally scoped.
While a vast number of spreadsheet documents don't
require local sheet names, there are documents that depend on this.

There is a workaround solution in ODF as-is, but it has several weaknesses,
so I'd like to standardize a better way to address sheet-local names.
In the current draft formula syntax, the formula SC has a syntax for directly
representing references to sheet-local names.  But for this
to be useful, there needs to be a standard way to _store_ sheet-local
named expressions in ODF documents.

Can we discuss how to handle them?  As a starting point,
I propose that in the rule <element name="table:table">,
between the last (sub)rule <ref name="table-rows-and-groups"/>,
and the closing </element>, we add this:
  <optional>
    <ref name="table-named-expressions"/>
  </optional>
This would allow a table:table to directly contain
table:named-expressions, which would
then be interpreted as sheet-local names.

Currently, element "table:named-expressions" is only defined
as being part of the epilogue of the entire spreadsheet document's
"office:spreadsheet", so spreadsheets look like this:
  <office:spreadsheet>
    <table:table table:name="Sheet1" table:style-name="ta1" ....
        (table contents, including its data)
    </table:table>
    ...
    <table:named-expressions>
      <table:named-range table:name="GETB2" table:base-cell-address="$Sheet1.$B$2"
                table:cell-range-address="$Sheet1.$B$2"/>
       <table:named-expression table:name="Makefour"
                 table:base-cell-address="$Sheet1.$A$4" table:expression="2+2"/>
    </table:named-expressions>
  </office:spreadsheet>

This would still be fine; table:named-expressions that are
children of office:spreadsheet would continue to be
considered global names.  The table:named-expressions that
are children of table:table would be sheet-local names.

Currently, since named expressions are only allowed in
the spreadsheet epilogue, the "usual" way to represent
quasi-sheet-local names in ODF is by creating
names that are a concatenation of the sheetname
and the user's original name.  That works in terms
of getting calculations correct, so no one needs to
worry that ODF "isn't ready for use". ODF _does_ work, as-is.

But this IS a little awkward.
Users have to use long names instead of the usual short names in a Sheet, e.g.,
in Sheet1 I have to refer to Sheet1_myname instead of simply "myname".
Worse, sheet copying does NOT work as expected with local names; if I copy the
sheet and call it "Sheet5", all the named references will still refer
to the variables named "Sheet1..." and will NOT automatically switch to Sheet5.
So copying such sheets is error-prone, and requires excess work
(lots of renaming). A similar problem happens when you use XML tools to extract a
particular sheet - you'd like the sheet-local named expressions to be
directly accessible.  The problem is worse if you allow subtables,
something that Excel doesn't permit but IS specifically mentioned
in the OpenDocument spec.  The workaround is fine for now, and future
applications will be able to perpetually read backwards to documents
that use the workaround, but there's no reason we can't
add improved support for the future.

A separate issue that MIGHT be relevant is a technique for
improving the performance of loading external values.
A spreadsheet formula can refer to external
spreadsheet document's names (and in ODF, cells).
An application _CAN_ load the entire external document
to get those values, but this loading process could become quite lengthy.
It would be very nice if the packaged format defined an
optional set of files that directly stored the values of
named expressions.  E.G., perhaps there's a  file named
"global-named-expressions.xml", and a subdirectory
"named-expressions" that contain files for each sheet.
Each of these would list the names of named expressions and
their values.  If the files don't exist, and a named expression is
requested, the application just accesses them the usual way;
if an application wants such external access to be faster, it just
puts them in the zip archive.  Now reading them involves unzipping
only those files. Since in my conception there is one file per sheet,
and a global file, I don't think that these issues conflict with
each other.

Allowing table:named-expressions as an optional child of table:table
seems like a reasonable way to handle sheet-local variables to me.
But I'd like to hear others' comments. This is an interface issue between
the formula subcommittee and the overall OpenDocument TC,
so I'm cross-posting to both lists.

Below are relevant extracts from the ODF spec, including the
"Spreadsheet Documents" and "Named Expressions" sections.

--- David A. Wheeler 


====================================


(From section "Spreadsheet Documents":)

The content of spreadsheet documents mainly consists of a sequence of tables. Additionally, a spreadsheet document may contain forms, change tracking information and various kinds of declarations that simplify the usage of spreadsheet tables and their analysis. Each of these are contained in either the document prelude, or the document epilogue.
<define name="office-body-content" combine="choice">
	<element name="office:spreadsheet">
		<ref name="office-spreadsheet-attlist"/>
		<ref name="office-spreadsheet-content-prelude"/>
		<ref name="office-spreadsheet-content-main"/>
		<ref name="office-spreadsheet-content-epilogue"/>
	</element>
</define>
Spreadsheet Document Content Model
The spreadsheet document prelude contains the document's form data, change tracking information, calculation setting for formulas, validation rules for cell content and declarations for label ranges.
<define name="office-spreadsheet-content-prelude">
	<optional>
		<ref name="table-tracked-changes"/>	
	</optional>
	<ref name="text-decls"/>
	<ref name="table-decls"/>
</define>

<define name="table-decls">
	<optional>
		<ref name="table-calculation-settings"/>	
	</optional>
	<optional>
		<ref name="table-content-validations"/>	
	</optional>
	<optional>
		<ref name="table-label-ranges"/>	
	</optional>
</define>
The main document is a list of tables.
<define name="office-spreadsheet-content-main">
	<zeroOrMore>
		<ref name="table-table"/>
	</zeroOrMore>
</define>
The epilogue of spreadsheet documents contains declarations for named expressions, database ranges, data pilot tables, consolidation operations and DDE links.
<define name="office-spreadsheet-content-epilogue">
	<ref name="table-functions"/>	
</define>

<define name="table-functions">
	<optional>
		<ref name="table-named-expressions"/>	
	</optional>
	<optional>
		<ref name="table-database-ranges"/>	
	</optional>
	<optional>
		<ref name="table-data-pilot-tables"/>	
	</optional>
	<optional>
		<ref name="table-consolidation"/>	
	</optional>
	<optional>
		<ref name="table-dde-links"/>	
	</optional>
</define>





(section "Table Element"):

The table element is the root element for tables.
<define name="table-table">
	<element name="table:table">
		<ref name="table-table-attlist"/>
		<optional>
			<ref name="table-table-source"/>
		</optional>
		<optional>
			<ref name="office-dde-source"/>
		</optional>
		<optional>
			<ref name="table-scenario"/>
		</optional>
		<optional>
			<ref name="office-forms"/>
		</optional>
		<optional>
			<ref name="table-shapes"/>
		</optional>
		<ref name="table-columns-and-groups"/>
		<ref name="table-rows-and-groups"/>
	</element>
</define>









(section "Named Expressions"):

The named expressions element <table:named-expressions> contains a collection of assignments of names to expressions, so that the names can be use to refer to the expression. 
The following expression can get names:
cell ranges.
Other expressions, for example, parts of a formula.
<define name="table-named-expressions">
	<element name="table:named-expressions">
		<zeroOrMore>
			<choice>
				<ref name="table-named-range"/>
				<ref name="table-named-expression"/>
			</choice>
		</zeroOrMore>
	</element>
</define>

Named Range
The named range element <table:named-range> specifies a cell range that has a name assigned. For information on defining a cell range, see section 8.3.1. 
The table:name attribute specifies the name of the range, and the table:cell-range-address attribute its address. The address can be either absolute or relative. If the cell range address is relative, the table:base-cell-address attribute must exist additionally. It specifies the base cell address for the cell range. This address must be absolute. Therefore a table name in the address is required, but the dollar signs that indicate an absolute address can be omitted.
An additional table:range-usable-as attribute specifies whether the name of the range can be used within the specification of a print range, a filter, a repeating row, or a repeat column. The value of this attribute can be either: 
none, or
a space-separated list that consists of any of the values print-range, filter, repeat-row or repeat-column.

<define name="table-named-range">
	<element name="table:named-range">
		<ref name="table-named-range-attlist"/>
		<empty/>
	</element>
</define>

<define name="table-named-range-attlist" combine="interleave">
	<attribute name="table:name">
		<ref name="string"/>
	</attribute>
	<attribute name="table:cell-range-address">
		<ref name="cellRangeAddress"/>
	</attribute>
	<optional>
		<attribute name="table:base-cell-address">
			<ref name="cellAddress"/>
		</attribute>
	</optional>
	<optional>
		<attribute name="table:range-usable-as" a:defaultValue="none">
			<choice>
				<value>none</value>
				<list>
					<oneOrMore>
						<choice>
							<value>print-range</value>
							<value>filter</value>
							<value>repeat-row</value>
							<value>repeat-column</value>
						</choice>
					</oneOrMore>
				</list>
			</choice>
		</attribute>
	</optional>
</define>


Named Expression
The named expression element <table:named-expression> contains an expression with a name, for example, a part of a formula. 
The table:name attribute specifies the name of the expression, and the table:expression attribute the expression itself. The expressions do not support the equal (=) sign as the first character. If the expression contains a named range or another named expression, the named range or named expression must be specified first, before the containing expression. If the expression contains a relative cell range address, the table:base-cell-address attribute must exist additionally. It specifies the base cell address for the cell range. This address must be absolute. Therefore a table name in the address is required, but the dollar signs that indicate an absolute address can be omitted.
<define name="table-named-expression">
	<element name="table:named-expression">
		<ref name="table-named-expression-attlist"/>
		<empty/>
	</element>
</define>

<define name="table-named-expression-attlist" combine="interleave">
	<attribute name="table:name">
		<ref name="string"/>
	</attribute>
	<attribute name="table:expression">
		<ref name="string"/>
	</attribute>
	<optional>
		<attribute name="table:base-cell-address">
			<ref name="cellAddress"/>
		</attribute>
	</optional>
</define>

Example: Named expressions element with a named range and a named expression

<table:named-expressions>
	<table:named-range table:name="sample1" table:cell-range-address=".C4"
		table:base-cell-address="sampletable.F1" table:area-type="none"/>
	<table:named-range table:name="sample2" 
		table:cell-range-address=".$D$3:.$K$8" 
		table:area-type="print-range filter"/>
	<table:named-expression table:name="sample3" 
		table:expression="sum([.A1:.B3])"/>
</table:named-expressions>



(And here's how table:table is defined:)

	<element name="table:table">
		<ref name="table-table-attlist"/>
		<optional>
			<ref name="table-table-source"/>
		</optional>
		<optional>
			<ref name="office-dde-source"/>
		</optional>
		<optional>
			<ref name="table-scenario"/>
		</optional>
		<optional>
			<ref name="office-forms"/>
		</optional>
		<optional>
			<ref name="table-shapes"/>
		</optional>
		<ref name="table-columns-and-groups"/>
		<ref name="table-rows-and-groups"/>
	</element>