Subject: Re: [office] data cache, field name and source field id for data pilot table
From: Kohei Yoshida <kyoshida@novell.com>
To: Ming Fei Jia <jiamingf@cn.ibm.com>
Date: Fri, 12 Dec 2008 10:38:15 -0500
Hi Ming,

On Tue, 2008-12-02 at 13:18 +0800, Ming Fei Jia wrote:

> Kohei Yoshida <kyoshida@novell.com> wrote on 11/25/2008 06:30:28 AM:
> 
> > Re: [office] data cache, field name and source field id for data
> > pilot table
> > 
> > On Sun, 2008-11-23 at 23:52 +0800, Ming Fei Jia wrote:
> > > Dear TC members,
> > > 
> > > I have created the proposal on the wiki:
> > > http://wiki.oasis-open.org/office/data_cache%2C_field_name_and_source_field_id_for_data_pilot_tables
> > > Thanks.
> > 
> > I have two comments regarding this proposal.
> > 
> > 1) An additional attribute to allow an alternative name to a data pilot
> > field is a great idea.  I think we should extend that to field members
> > as well, to allow alternative names to individual field members since
> > that is also something that the users desire to do.
> Besides being more human readable, I propose allowing the field name
> to be renamed mainly because, in a field reference operation, this
> allows different data views of the same source field. For example,
> for the same source field "count", which records the product count of
> each person in one sales report, we can have 2 fields, "absolute
> count" and "relative count", that bind to the same source field
> "count". "absolute count" is the normal value view of the source
> field, and "relative count" is the reference view of the count
> relative to some specific person's count.
> 
> As I understand it, the field member name specifies the value of a
> data pilot member, currently represented by the attribute
> <table:member-name>. The benefit I can see in allowing a member name
> to be renamed is better human readability. For example, take a field
> "Country" that has 2 values: "China" and "USA". "China" and "USA"
> will be 2 field members. To be more readable for Chinese users, we
> can allow users to rename the member "China" to the corresponding
> Chinese string "中国".

> Of course, the new name must be unique at least within the scope of
> the current data pilot table; otherwise it will cause confusion.

Agreed.  We will probably need to work on including some sort of
unique name requirement in the specification.
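
As a rough illustration of what such a requirement would ask of an
implementation, here is a minimal sketch in Python.  The function name
and the dictionary layout are hypothetical and not part of either
proposal; the check is applied to a single collection of member names,
whereas the actual scope of uniqueness is still to be decided.

```python
from collections import Counter


def find_duplicate_display_names(members):
    """Return the names that would be shown more than once.

    `members` maps each original member name to an optional display
    name (None means "show the original name").  Hypothetical helper
    for illustration only.
    """
    shown = [display if display is not None else original
             for original, display in members.items()]
    counts = Counter(shown)
    return sorted(name for name, n in counts.items() if n > 1)


# Renaming "China" to "中国" leaves all shown names distinct.
print(find_duplicate_display_names({"China": "中国", "USA": None}))  # []

# Renaming "China" to "USA" collides with the existing member "USA".
print(find_duplicate_display_names({"China": "USA", "USA": None}))  # ['USA']
```

Note that the check compares the names as displayed, so a display name
may also clash with the original name of another member that has not
been renamed.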

> But I think human readability alone is not a strong enough
> requirement. Do you have any functional requirement for renaming
> member names? That would be better.

How about interoperability with Excel?  Excel has supported this for at
least the past few versions (and probably more versions before that),
and we frequently receive customer documents making use of this feature.
Not supporting this in ODF means that we'll lose that information once
such a document is saved in ODF.  To me that is a strong case in favor
of supporting this enhancement in ODF.

Additionally, I can think of a case where the user of data pilot does
not have the ability to change member names because he/she does not have
access to the source data (e.g. external database or cached data).  Even
if the user has access to the source data, changing the name of one
member may require modifying a large number of cells if that member
occurs frequently in the associated field.  Providing a quick way to
temporarily rename a member in one location should be a worthwhile
convenience.

BTW, this proposal of mine:
http://wiki.oasis-open.org/office/display_names_in_data_pilot

which proposes a super-set of the field display name part of your
proposal (and includes other attributes such as grand-total-name and
data-pilot-subtotal-name), was originally inspired by the
interoperability requirement, and later reinforced by user requests.  To
me, that is a strong enough requirement to make this enhancement
worthwhile for inclusion in the ODF standard.

> 
> > 
> > For that reason, how about changing the name of the attribute from
> > "table:field-name" to "table:display-name"?  That way the same attribute
> > name can be used for the <table:data-pilot-member> element as well.  Or
> > any other name that can be used both for <table:data-pilot-field> and
> > <table:data-pilot-member> would do.
> Yes, table:display-name is more appropriate for this case. When I
> originally wrote the proposal, I used table:display-name. But then I
> saw that "table:field-name" is already used in the element
> <table:data-pilot-field-reference>, where it has the same meaning as
> in this case. So I used "table:field-name" as the attribute instead
> of inventing a new attribute name. Of course, I also prefer
> table:display-name if no one disagrees.

To me the name table:display-name is more indicative of the purpose of
this attribute, since its value is used for display purposes only.  The
name field-name, on the other hand, is a little ambiguous about what
the attribute is used for.

(I'm aware that later you changed this to field-display-name, which IMO
is better in terms of explicit naming.  But I still prefer a more
neutral display-name for the reason I outline below.)

And when other elements need an alternative name for display purposes,
we could re-use this name without conditionalizing its semantics based
on the parent element.  As my proposal indicates, the data-pilot-member
element is one such element that could use this attribute.
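
To make the idea concrete, the following Python sketch builds a
fragment in which the same attribute name appears on both elements.
This is only an illustration under assumptions: table:display-name is
the attribute under discussion, not something in the current standard,
the "Country/Region" label is made up for the example, and the fragment
omits other attributes a complete <table:data-pilot-field> would carry.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
ET.register_namespace("table", TABLE_NS)


def q(local):
    """Qualify a local name with the ODF table namespace."""
    return f"{{{TABLE_NS}}}{local}"


def build_field(source_name, display_name, member_display_names):
    """Build a data-pilot-field fragment (hypothetical helper)."""
    field = ET.Element(q("data-pilot-field"), {
        q("source-field-name"): source_name,
        q("display-name"): display_name,  # proposed attribute, assumed name
    })
    members = ET.SubElement(field, q("data-pilot-members"))
    for name, shown in member_display_names.items():
        attrs = {q("name"): name}
        if shown is not None:
            # Same attribute name re-used on the member element.
            attrs[q("display-name")] = shown
        ET.SubElement(members, q("data-pilot-member"), attrs)
    return field


field = build_field("Country", "Country/Region",
                    {"China": "中国", "USA": None})
print(ET.tostring(field, encoding="unicode"))
```

Because the attribute means the same thing on either element, a reader
of the file does not need to look at the parent element to know that
the value is for display only.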

Having said this, I'm open to alternative suggestions.

> > 
> > 2) Regarding the assignment of unique IDs to each field in the data
> > source, you propose to use the sheet name plus the column label as the
> > unique ID.  Why not simply use 0-based numerical IDs?  When the data
> > source is loaded, I can imagine internally the data are structured in a
> > single tabular form anyway.  So, I would imagine using the column (or
> > field) indices of that internal table would make the implementation a
> > little simpler.  Doing that would also allow it to be used when the data
> > source is not on a local spreadsheet but in an external data source,
> > and/or the data source is cached (as in your proposal).
> Sure, I only gave that as an example of the ID; it is not definitely
> a sheet name plus column label. What I propose is only a unique ID
> that can represent the location of the source field in the data
> source. As to how this ID is composed, I originally did not care
> much. But now, as you said, I think using an index value of the
> source table as the source field id is better. I've changed the
> source field id definition in the wiki
> (http://wiki.oasis-open.org/office/data_cache%2C_field_name_and_source_field_id_for_data_pilot_tables).
> I checked MS Excel, which uses an index starting from 1, so I define
> this ID as positiveInteger in order to keep good interoperability
> with Excel.

I personally don't see any strong case favoring either 0-based or
1-based.  So, your suggestion sounds reasonable to me.  I just
personally prefer 0-based numbering for everything unless there is a
specific reason to pick 1-based numbering.  But that's just my personal
taste. ;-)

But like I said, 1-based numbering is fine with me (although it would be
nice to know why Excel chooses 1-based numbering here).
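
Either way the cost to an implementation is small.  A minimal Python
sketch of the conversion between a 1-based source field id and a
0-based internal column index could look like this; the function names
and the sample column labels are hypothetical.

```python
def source_field_id_to_index(field_id: int) -> int:
    """Map a 1-based source field id to a 0-based column index."""
    if field_id < 1:
        raise ValueError("source field id must be a positive integer")
    return field_id - 1


def index_to_source_field_id(index: int) -> int:
    """Map a 0-based column index to a 1-based source field id."""
    if index < 0:
        raise ValueError("column index must not be negative")
    return index + 1


# Hypothetical column labels of a loaded data source.
columns = ["name", "count", "country"]
print(columns[source_field_id_to_index(2)])  # count
```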

Best regards,

Kohei
-- 
Kohei Yoshida - OpenOffice.org Engineer - Novell, Inc.
<kyoshida@novell.com>
Follow-Ups:
- Re: [office] data cache, field name and source field id for data pilot table
  - From: Ming Fei Jia <jiamingf@cn.ibm.com>
References:
- Re: [office] data cache, field name and source field id for data pilot table
  - From: Ming Fei Jia <jiamingf@cn.ibm.com>