OASIS Mailing List Archives

search-ws-comment message


Subject: RE: [search-ws-comment] Suggestions for SRU 2.0 specification document

Title: Re: Facet ranges
Hi all,
Thank you for thinking with us on the facets. Last week there were some posts on this topic, and Ralph and I had a telephone conversation that clarified some things as well. I'll give a summary, with my ideas after the >>:
Ray rightly asked about the server-defined vs. the client-defined facets:

"Edo - Before we carry this discussion further, can you clarify, which of these two models are we talking about in your case?  "

>> I believe this is where my original confusion stems from. As I understand it, there is a mechanism by which the client can define * one facet term only * (through facetLowValue and facetHighValue). So a client might define "Last week" as having a datePublished with a facetLowValue of, for example, September 1 and a facetHighValue of September 8. If the client also wanted to have a facet of "Last month" *at the same time*, that would not be possible.


So, we will go for the alternative model that Ray suggested:

"One can envision a different model, where the server groups  publication dates by months, so there is a facet for 'August2010', 'July2010', etc.  (Perhaps it is still one big index, but the server is exposing different facets by month. And listing via Explain.)  In this case if the client wants "August 1, 2010 through August 31, 2010" then it specifies the August2010 facet  - but of course if it want July 15 through August 15 it is out of luck; the ranges are pre-assigned by the server.   "


In other words, we'll use the model that Edward Zimmerman proposed:

"The publishedLastWeek etc. are nothing more than date range attributes using a special local value. The server would interpret what this means.. The client, by contrast, just sees the value.. and can use it.. In that sense it nothing really different from a search restriction to those records whose value of some field contains a term..
We have, for instance, bib.date.published=2010 we can also have bib.date.published=lastWeek
Its up to your server to understand what the date "lastWeek" (or "Last Week" or "This Week" or .. .) means and do the right thing.. just as it needs to go the right thing when it see a date encoded in an ISO 8601 format.. [as a side note: in my date parser I did implement a set of names for dates and ranges for reasons of utility]
From the view of facets there is no difference."


So, to paraphrase Edward and to avoid any confusion I will list the mechanism that we will utilize in more or less plain English:

1. Client says: search for X and give me facets for datePublished. The client may not be aware that datePublished is a range facet.

2. Server thinks: datePublished is a range facet that is defined by me.

3. Server returns results for X and facet terms "Last week (10)"; "Last month (20)" and "This year (200)".

4. User clicks "Last week".

5. Client says: search for X AND datePublished="Last week".

6. Server thinks: "Last week" is not a date, but a special value that means "datePublished>today-7 AND datePublished<=today". Responds with the results.

7. Client thinks "thanks" and is still not aware that datePublished is a range facet...
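
As a rough illustration of steps 3 and 6 above, here is a minimal server-side sketch in Python. Everything in it is an assumption made for illustration only: the names NAMED_RANGES, expand_date_term and facet_counts are hypothetical, "Last month" is approximated as 30 days, and the bounds follow the "greater than today-7, up to and including today" reading from step 6. It is not taken from the SRU drafts or from anyone's actual server.

```python
from datetime import date, timedelta

# Hypothetical server-side table: facet label -> function giving the
# exclusive lower bound of the range for a given "today".
# The 30-day "Last month" is an assumption for illustration.
NAMED_RANGES = {
    "Last week": lambda today: today - timedelta(days=7),
    "Last month": lambda today: today - timedelta(days=30),
    "This year": lambda today: date(today.year, 1, 1) - timedelta(days=1),
}


def expand_date_term(index, term, today):
    """Step 6: turn a special value such as "Last week" into a date range.

    Terms the server does not recognise are passed through as an
    ordinary equality clause.
    """
    lower_of = NAMED_RANGES.get(term)
    if lower_of is None:
        return f'{index}="{term}"'
    low = lower_of(today)
    return f'{index}>"{low.isoformat()}" AND {index}<="{today.isoformat()}"'


def facet_counts(published_dates, today):
    """Step 3: count records per server-defined range (ranges may overlap)."""
    counts = {}
    for label, lower_of in NAMED_RANGES.items():
        low = lower_of(today)
        counts[label] = sum(1 for d in published_dates if low < d <= today)
    return counts
```

The client never sees this table: it only receives the labels with their counts and sends the chosen label back as a plain term, which is the point of the model.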

OK, so much for the implementation that we will choose. Let me continue with my personal opinion (and do remember that I am new to this, so please feel free to disagree or to correct me if I am wrong). To me it does not make a lot of sense to have a client-defined specification for range facets if you can define *one range only*. It is quite limited in its use: if you want to search for an object with a datePublished of "last week", that is perfectly possible using CQL alone (the only difference being that you would not see any results *outside* this range if you define it that way). Alternatively, defining *multiple* ranges for range facets client-side is very clunky: how do you know which facetLowValue goes with which facetHighValue if you have to define them in a URL? It's possible, but clunky. So in my opinion the SRU standard would be simpler (and easier to understand for simple folk like me ;-) ) if it did not have a client-defined range facet definition at all. I do not think this would usually be a big problem from a functional point of view: the owner of a collection usually knows quite well whether it makes sense to group facets by "last week" and "last month" rather than by "First quarter" and "Second quarter", for example. If you all agree, I feel Edward's model should be described in the SRU 2.0 specs.
I'm curious to hear what you think.
