Subject: Re: [office] data cache, field name and source field id for data pilot table
From: Kohei Yoshida <kyoshida@novell.com>
To: Ming Fei Jia <jiamingf@cn.ibm.com>
Date: Mon, 15 Dec 2008 16:32:09 -0500
On Tue, 2008-12-16 at 00:22 +0800, Ming Fei Jia wrote:

> I saw your proposal. A few comments here: 
> (1)table:grand-total is enumeration type, and can choose
> none,row,column or both. So if you want to define display names for
> grand total, at least define 2 names: one is for row, the other is for
> column. right?

Well, from an interoperability point of view, having one custom name
for the grand total would be sufficient.  In Excel, for instance, the
custom grand total name is shared between the column and row grand
totals, and its file format stores only one name for both types.

OTOH, if we decide that defining different display names for the column
and row grand totals is useful, we could do that.  With this in mind,
I would like to propose the following alternative.

Instead of storing the grand-total-name as an attribute of the
<table:data-pilot-table> element, define a new element
<table:data-pilot-grand-total> which appears below the
<table:data-pilot-table> element and can have the following attributes:

* table:display-name - to store the display name
* table:grand-total - to specify its position, which can be row,
column, or both.
* table:enabled - to specify whether or not this grand total is enabled
(or maybe this attribute can be omitted if we use the presence or
absence of this element as an indication of whether or not it is
enabled)

This way, the <table:data-pilot-table> element can have either one
<table:data-pilot-grand-total> element for both totals, or two of them
for column and row totals individually.

And because this is a new optional element, it doesn't affect backward
compatibility, and we can even deprecate the table:grand-total attribute
as an attribute of <table:data-pilot-table> in future versions, perhaps
in 2.0.
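
To make the two variants concrete, here is a minimal Python sketch that
builds both forms with the standard xml.etree.ElementTree module.  The
<table:data-pilot-grand-total> element and its attributes are only the
ones proposed above, not part of any published schema, and the helper
names (q, build_pilot_table) and the display strings are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# ODF table namespace; the grand-total element below is only a proposal.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
ET.register_namespace("table", TABLE_NS)


def q(local):
    """Return the namespace-qualified name for a table:* element or attribute."""
    return f"{{{TABLE_NS}}}{local}"


def build_pilot_table(grand_totals):
    """Build a data pilot table from (position, display_name) pairs.

    position is 'row', 'column' or 'both', as in the proposal.
    """
    table = ET.Element(q("data-pilot-table"))
    for position, display_name in grand_totals:
        ET.SubElement(table, q("data-pilot-grand-total"), {
            q("grand-total"): position,
            q("display-name"): display_name,
        })
    return table


# One element shared by both totals (what Excel's format can express).
shared = build_pilot_table([("both", "Total")])

# Two elements, one per direction, each with its own display name.
split = build_pilot_table([("row", "Row Total"), ("column", "Column Total")])

print(ET.tostring(shared, encoding="unicode"))
print(ET.tostring(split, encoding="unicode"))
```

In this sketch the absence of any child element would stand for "no
grand total", which is the variant where table:enabled is omitted.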

How does this sound?

> (2)table:data-pilot-sub-total is already defined as an element. Why
> not just add an attribute table:display-name to the element itself?

We could certainly do that, and that indeed seems like a better location
to store this data.

However, there is one issue to consider.  We need to find a way to
encode the position(s) of the field member name in the display name
string, so that the data pilot table can insert the corresponding field
member name in the subtotal text.

I will give you an example.  Let's say a field consists of the following
members:

* Andy
* Bruce
* Charles

and the subtotal is displayed for each of these members as follows:

Andy Subtotal
..
Bruce Subtotal
..
Charles Subtotal

Then, the user wants to change these strings to

Score Total (Andy)
..
Score Total (Bruce)
..
Score Total (Charles)

In this case, the custom subtotal string for this field would consist of
the following three parts

"Score Total (" + <member name> + ")"

There are two ways to encode this custom subtotal name.  One way is to
use a special character, say, '?' to represent the member name, and use
a single string value.  In this scheme, the above name would be encoded as

"Score Total (?)"

This is in fact what Excel does.  If we use this scheme, then we could
simply re-use the table:display-name attribute to store it as an
attribute of <table:data-pilot-subtotal> as you suggest.
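
As a rough sketch of how a consumer might expand such a string (the
function name and the replace-every-occurrence behavior are my own
assumptions here, not a description of what Excel actually does
internally):

```python
def expand_subtotal_name(template, member_name, marker="?"):
    """Replace every occurrence of the marker character with the member name.

    Assumption: all markers are substituted, so a literal '?' cannot
    be kept in the resulting label.
    """
    return template.replace(marker, member_name)


members = ["Andy", "Bruce", "Charles"]
labels = [expand_subtotal_name("Score Total (?)", m) for m in members]

for label in labels:
    print(label)
```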

Another way to store this is what's in my proposal, which is to
introduce a new element that can have a combination of text and the
field member marker.  This marker is the equivalent of "?" in the first
example.  In this scheme, the example subtotal name is stored as

<table:data-pilot-subtotal-name>
  Score Total (<table:data-pilot-field-member-marker/>)
</table:data-pilot-subtotal-name>

where the <table:data-pilot-field-member-marker/> element gets replaced
with the corresponding member name.  The advantage of this over the
first scheme is that users can still use a literal '?' in the name,
which the first scheme does not allow.
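
A minimal Python sketch of how a consumer could render the marker-based
form, using the standard xml.etree.ElementTree module.  The element
names are the proposed ones; the function name and the trimming of
surrounding whitespace are assumptions of mine, since the proposal does
not yet say how whitespace in the mixed content should be treated.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
MARKER = f"{{{TABLE_NS}}}data-pilot-field-member-marker"


def render_subtotal_name(name_element, member_name):
    """Join the text runs of the element, substituting each marker child."""
    parts = [name_element.text or ""]
    for child in name_element:
        if child.tag == MARKER:
            parts.append(member_name)
        # Text that follows a child element is stored in its tail.
        parts.append(child.tail or "")
    # Assumption: leading and trailing whitespace is not significant.
    return "".join(parts).strip()


# A name that contains a literal '?', which the single-string scheme
# could not express.
xml = (
    f'<table:data-pilot-subtotal-name xmlns:table="{TABLE_NS}">'
    "Is this the total for <table:data-pilot-field-member-marker/>?"
    "</table:data-pilot-subtotal-name>"
)
element = ET.fromstring(xml)
print(render_subtotal_name(element, "Andy"))
```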

So, this is the rationale behind my proposal for the design of the
custom subtotal name.

(There is one interoperability note.  Excel seems to allow a custom
subtotal name only when the subtotal mode is 'auto', which corresponds
to the table:function attribute having a value of 'auto'.  While I
don't think this should affect the semantics of the proposed custom
subtotal name in the ODF spec, it is perhaps worth noting for
implementers.)

> (3)I do not very understand the table:data-pilot-field-member-marker
> you proposed.

Hopefully my explanation above clarifies it.  I am, however, still
concerned about the length of the proposed marker element
(<table:data-pilot-field-member-marker/>), which could be replaced with
something else that is shorter.  Something more generic like
<office:place-holder/> (or similar) that can change its semantics
depending on its parent element may be a way.  Anyway, I'm open to other
suggestions in this area.

> Additionally, I find <table:data-field> in
> <table:data-pilot-display-info> and <table:data-pilot-sort-info> may
> also need a display name. Currently <table:data-field> only specifies
> the source data field name. But here <table:data-field> is an
> attribute name, and can not contain attribute. So maybe add a new
> attribute <table:data-field-name> to the element
> <table:data-pilot-display-info> and <table:data-pilot-sort-info>. Or
> alternative solutions? 

Hmm.  I think we can just say in the text that the table:data-field
attribute always uses the internal field name, as opposed to its display
name counterpart in case one exists.  Then (if necessary) the display
name of the referenced field can be looked up from its internal field
name.

If there are other places that reference a field by its name, we can
put the same requirement in those places as well.  After all, the
display name is for display purposes only, and for all the other
purposes, the real name should be used. ;-)
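
In implementation terms the lookup could be as simple as the following
Python sketch.  The Field class, the label_for function and the sample
field names are hypothetical; the only point is that references carry
the internal name and the display name is resolved from it when needed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Field:
    source_name: str                     # internal (real) field name
    display_name: Optional[str] = None   # optional, for display only


def label_for(fields: Dict[str, Field], referenced_name: str) -> str:
    """Resolve a table:data-field reference to the text shown to the user."""
    field = fields[referenced_name]      # references always use the internal name
    return field.display_name or field.source_name


fields = {
    "score": Field("score", display_name="Test Score"),
    "name": Field("name"),
}

print(label_for(fields, "score"))
print(label_for(fields, "name"))
```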

> Maybe your proposal need to include the 2 places so that we can
> provide a relative complete solution.
> 
> > 
> > To me the name table:display-name is more indicative of the purpose
> > of this attribute, since its value is used for display purposes
> > only.  The name field-name, on the other hand, sounds a little
> > ambiguous for what the attribute is used for.
> > 
> > (I'm aware that later you changed this to field-display-name, which
> > IMO is better in terms of explicit naming.  But I still prefer a
> > more neutral display-name for the reason I outline below.)
> > 
> > And when other elements need an alternative name for display
> > purposes, we could re-use this name without conditionalizing its
> > semantics based on the parent element.  As my proposal indicates,
> > the data-pilot-member element is one such element that could use
> > this attribute.
> Make sense. A similar case is <draw:display-name>, totally 9 elements
> re-use the same <draw:display-name>, pls refer to 18.233 in
> OpenDocument-v1.2-draft7-11.odt. Of course, this re-use has its
> condition that the display name attribute is just for the element that
> contains it. Otherwise, we have to specify the explicit object name.
> For example, <table:display-name> is not appropriate for
> <table:data-field> in the element <table:data-pilot-sort-info>.
> 
> > Having said this, I'm open to alternative suggestions.
> > 
> > > > 
> > > > 2) Regarding the assignment of unique IDs to each field in the
> > > > data source, you propose to use the sheet name plus the column
> > > > label as the unique ID.  Why not simply use 0-based numerical
> > > > IDs?  When the data source is loaded, I can imagine internally
> > > > the data are structured in a single tabular form anyway.  So, I
> > > > would imagine using the column (or field) indices of that
> > > > internal table would make the implementation a little simpler.
> > > > Doing that would also allow it to be used when the data source
> > > > is not on a local spreadsheet but in an external data source,
> > > > and/or the data source is cached (as in your proposal).
> > > Sure, I just take an example for the ID, not definitely a sheet
> > > name plus column label. What I propose is only a unique ID that
> > > can represent the location of the source field in the data source.
> > > As to how this ID is comprised of, I originally do not care much.
> > > But now as you said, I think using an index value of the source
> > > table as the source field id is better. I've changed the source
> > > field id definition in the wiki
> > > (http://wiki.oasis-open.org/office/data_cache%2C_field_name_and_source_field_id_for_data_pilot_tables).
> > > I checked MS Excel, which uses an index from 1, so I define this
> > > ID as positiveInteger in order to keep good interoperability with
> > > Excel.
> > 
> > I personally don't see any strong case favoring either 0-based or
> > 1-based.  So, your suggestion sounds reasonable to me.  I just
> > personally prefer 0-based numbering for everything unless there is a
> > specific reason to pick 1-based numbering.  But that's just my
> > personal taste. ;-)
> > 
> > But like I said, 1-based numbering is fine with me (although it
> > would be nice to know why Excel chooses 1-based numbering here).
> MS Excel just uses 1-based IDs. So interoperability makes me incline
> to 1 instead of 0, :-)
> As to why Excel uses 1, maybe Doug or Eric can answer. But seems no
> explicit meaning here, maybe only a taste.

If I'm not mistaken, there is one place in the Excel file format that
uses a 1-based sheet index, where the index of 0 is used to store global
settings (or something like that).  I wonder if the reason for the
1-based column index is something similar...

Best Regards,

Kohei

-- 
Kohei Yoshida - OpenOffice.org Engineer - Novell, Inc.
<kyoshida@novell.com>