
Subject: Re: [search-ws-comment] SRU 2.0 Draft Feedback
From: "Hammond, Tony" <t.hammond@nature.com>
To: "Ray Denenberg, Library of Congress" <rden@loc.gov>,<search-ws-comment@lists.oasis-open.org>,<search-ws@lists.oasis-open.org>,<ZNG@sun8.LOC.GOV>
Date: Thu, 20 Aug 2009 14:27:37 +0100

Thanks for the feedback, Ray.

You've clarified for me the "short name" problem I was running into. Also,
you remarked on my observation about the lack of an ID when no result set
is returned. That is fine, though it does still place a burden on
implementations that must manage/generate IDs. (E.g. Atom has an "id"
element for the feed, and RSS 1.0 has a "channel/@rdf:about" attribute for
the feed.)
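For illustration, here is one way an implementation might mint a feed-level ID when no "resultSetId" is available: derive a name-based (version 5) UUID from the request URL, so repeating the same request yields the same ID even when the response carries only diagnostics. This is a minimal sketch under stated assumptions; the helper name `atom_feed_id`, the use of UUIDv5 and the example URL are hypothetical and not part of any SRU draft.

```python
import uuid
from typing import Optional


def atom_feed_id(request_url: str, result_set_id: Optional[str] = None) -> str:
    """Mint a URI for an Atom feed <id> (hypothetical helper, not part of SRU).

    Uses the request URL as the key, extended with the resultSetId when the
    server returned one. Responses that carry only diagnostics (and hence no
    resultSetId) still get an ID. The key is turned into a name-based
    (version 5) UUID, which is deterministic: the same key always yields the
    same URN.
    """
    key = f"{request_url}#resultSet={result_set_id}" if result_set_id else request_url
    return "urn:uuid:" + str(uuid.uuid5(uuid.NAMESPACE_URL, key))


if __name__ == "__main__":
    url = "http://example.org/sru?query=dinosaur"
    print(atom_feed_id(url))                        # diagnostics-only response
    print(atom_feed_id(url, result_set_id="rs42"))  # response with a result set
```

Determinism is the main design choice here: a random UUID would also be a valid Atom ID, but would change every time the same request is repeated.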

I wanted though to pick up on your comments about serializations:

> By "native XML" I assume you refer to the default schema (which is being
> registered as mime type application/sru+xml).
> So, of the various ways you mention (SRU, ATOM, RSS, JSON), the first three
> are all XML, and all supported. I don't think we've ever talked about a
> non-XML serialization (e.g. JSON), but I think that would be a different
> binding.

I really don't follow how RSS and ATOM are currently supported. The fact
that they are XML formats is a point they have in common with the SRU
schema. But I don't see anything beyond that. They are not amenable e.g. to
XSD validation. Instead they are simple XML host formats which can carry
chunks of SRU XML.

As for non-XML serializations (e.g. JSON), I have proposed this earlier and
already shared a blog posting [1] I made about it. It seems to me that a
natural expression for JSON would be a direct mapping of ATOM (which is
perhaps better suited to this purpose than RSS, especially as it has an RFC
pegged against it).

And JSON again is (or could be) a carrier format for SRU and is also not
amenable to XSD validation.
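As a rough illustration of what a direct mapping of ATOM to JSON could look like, here is a sketch that walks a small Atom feed carrying one SRU element and emits JSON with one key per element. The mapping rules (bare keys for Atom elements, an "sru:" prefix for SRU elements, repeated elements collected into lists, mixed text under "text") are assumptions for illustration only, not the conventions in the blog posting [1]; the SRU namespace URI is a placeholder.

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Placeholder namespace for SRU elements carried in the feed (assumption).
SRU_NS = "http://example.org/ns/sru"
# Atom elements get bare keys; SRU elements get an "sru:" prefix.
PREFIXES = {ATOM_NS: "", SRU_NS: "sru:"}


def key_for(tag: str) -> str:
    """Turn an ElementTree tag like '{ns}local' into a JSON key."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return PREFIXES.get(ns, "") + local  # unknown namespaces are dropped
    return tag


def element_to_json(elem: ET.Element):
    """Map an element to JSON: leaves become strings, others become objects."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text
    obj = {key_for(k): v for k, v in elem.attrib.items()}
    if text:
        obj["text"] = text
    for child in children:
        key, value = key_for(child.tag), element_to_json(child)
        if key not in obj:
            obj[key] = value
        elif isinstance(obj[key], list):
            obj[key].append(value)
        else:
            # A repeated element (e.g. a second <entry>) turns the key into a list.
            obj[key] = [obj[key], value]
    return obj


FEED = f"""<feed xmlns="{ATOM_NS}" xmlns:sru="{SRU_NS}">
  <id>urn:example:feed:1</id>
  <title>SRU results for dinosaur</title>
  <sru:numberOfRecords>2</sru:numberOfRecords>
  <entry><title>Record 1</title></entry>
  <entry><title>Record 2</title></entry>
</feed>"""

if __name__ == "__main__":
    print(json.dumps({"feed": element_to_json(ET.fromstring(FEED))}, indent=2))
```

The sketch also exposes a wart of purely structural mappings: a feed with a single <entry> maps to an object rather than a one-element list. A real Atom-to-JSON mapping would probably fix the shape of each element up front rather than infer it from the instance.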

As such, I really don't see much difference between RSS, ATOM and JSON as
host formats for SRU response elements. I can see that there might be a
stable of XML formats (SRU, RSS, ATOM), but I don't really see how that
relates to any single binding, as there are differences between them
regarding purpose and validation capability. And JSON is obviously in
another camp.

It seems to me that RSS, ATOM and JSON all have more in common with each
other than SRU, RSS and ATOM do. They are all general-purpose formats which
can carry arbitrary data models.

So I still don't see why the SRU binding needs to specify XML response
elements rather than abstract elements. But maybe it is simpler to define a
single concrete syntax for SRU, and then to define mappings of that
(extension formats?) to other serializations.
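The alternative suggested above (abstract response elements, with each serialization defined as a mapping) might look roughly like the following sketch: a plain data class stands in for the abstract response, and two functions map it to XML and to JSON. The field subset (numberOfRecords, resultSetId, diagnostics) follows the element names discussed in this thread; the class and function names, the unnamespaced XML and the JSON layout are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Diagnostic:
    uri: str
    message: str = ""


@dataclass
class SearchRetrieveResponse:
    """Abstract response: information content only, no syntax implied."""
    number_of_records: int
    result_set_id: Optional[str] = None
    diagnostics: List[Diagnostic] = field(default_factory=list)


def to_xml(resp: SearchRetrieveResponse) -> str:
    """One concrete mapping: SRU-style XML (namespace omitted in this sketch)."""
    root = ET.Element("searchRetrieveResponse")
    ET.SubElement(root, "numberOfRecords").text = str(resp.number_of_records)
    if resp.result_set_id is not None:
        ET.SubElement(root, "resultSetId").text = resp.result_set_id
    if resp.diagnostics:
        diags = ET.SubElement(root, "diagnostics")
        for d in resp.diagnostics:
            diag = ET.SubElement(diags, "diagnostic")
            ET.SubElement(diag, "uri").text = d.uri
            if d.message:
                ET.SubElement(diag, "message").text = d.message
    return ET.tostring(root, encoding="unicode")


def to_json(resp: SearchRetrieveResponse) -> str:
    """Another mapping of the same abstract response, this time as JSON."""
    obj = {"numberOfRecords": resp.number_of_records}
    if resp.result_set_id is not None:
        obj["resultSetId"] = resp.result_set_id
    if resp.diagnostics:
        obj["diagnostics"] = [{"uri": d.uri, "message": d.message}
                              for d in resp.diagnostics]
    return json.dumps(obj)


if __name__ == "__main__":
    # A diagnostics-only response: no records, hence no resultSetId.
    resp = SearchRetrieveResponse(
        number_of_records=0,
        diagnostics=[Diagnostic("info:srw/diagnostic/1/10", "Query syntax error")],
    )
    print(to_xml(resp))
    print(to_json(resp))
```

Either serializer could be swapped for an Atom or RSS mapping without touching the data class, which is the point of keeping the response definition abstract.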

Cheers,

Tony

[1] http://www.crossref.org/CrossTech/2009/07/opensearch_formats_for_review.html



On 14/8/09 19:45, "Ray Denenberg, Library of Congress" <rden@loc.gov> wrote:

> Tony, thanks much for the thorough review and comments. Since I'm about to
> depart for a week I went through these quickly and jotted down some
> preliminary responses.
> 
> 
>> 1. Parameters / Elements
>> 
>> I think it would help to break out Request Parameters from Response
>> Elements in Sects. 4, 6 and 7. The two sets are largely disjoint. (Same
>> remark applies to the Abstract Protocol Definition.)
> 
> This is a reasonable suggestion but I am reluctant to do it, because the
> parameter and element descriptions are written in a more narrative than
> formal style and I think it is more descriptive this way. And for each
> parameter or element listed in the tables there is a link to the section
> where it is introduced, so the reader can read about it in greater context
> than if the formal approach had been taken.  I'd welcome other views on this
> though.
> 
> 
>> It might also help to break out discussion of Facets into a separate
>> section (as with Diagnostics and Extensions), especially since Facets is
>> an optional feature.
> 
> I think it would be a good idea to allocate a separate section for facets -
> not because they are optional (most everything is) but because it probably
> deserves a separate section.
> 
> 
>> I also think that Search Result Analysis is sufficiently specialized to
>> warrant its own section.
> 
> Ok.
> 
> 
>> Sect. 5 seems to be misplaced coming as it does in between Sect. 4 and
>> Sects. 6 and 7.
> 
> Ok.
> 
> 
>> 2. Parameter / Element Ordering
>> 
>> Not clear what the basis is for the orderings given in Sect. 2 (Table 1)
>> and 3 (Table 6). Is it a logical ordering?
> 
> Yes, a logical ordering: the order in which they are described in
> section 4.
> 
> 
>> I note that "echoedSearchRetrieveRequest" is differently located (at
>> bottom) from its location in the 1.* XSD schema. Is that intentional?
> 
> No, no reason for that. We will re-evaluate the ordering in the next draft.
> 
> 
> 
>> 3. Response Elements
>> 
>> Response elements are given for a specific serialization - XML.
>> 
>> Is that what is intended by "binding"? I would have thought a binding would
>> be to a specific data model (e.g. SRU) which can then be serialized in
>> various ways: native XML, ATOM, RSS, JSON, etc.
> 
> By "native XML" I assume you refer to the default schema (which is being
> registered as mime type application/sru+xml).
> So, of the various ways you mention (SRU, ATOM, RSS, JSON), the first three
> are all XML, and all supported. I don't think we've ever talked about a
> non-XML serialization (e.g. JSON), but I think that would be a different
> binding.
> 
> 
> 
>> Also, the heading in Sect. 3.1 reads "Actual Response Elements ..." and
>> should be "Response Elements ..." only. (Cf. Sect. 2.1, which reads
>> "Request Parameters ..." alone.)
> 
> That should all be fixed when we put back the parts we took out.
> 
> 
>> 
>> 4. Response Elements: "resultSetIdentifier", "timeToLive", "idleTime"
>> 
>> The element "resultSetIdentifier", mentioned in Sects. 2, 3 and 4.10,
>> should be "resultSetId" everywhere.
> 
> Yes.
> 
> 
>> And in Sect 3, "timeToLive" and "idleTime" should be "resultSetTTL" and
>> "resultSetIdleTime", respectively.
> 
> Ok.
> 
> 
>> 5. Response Elements: "diagnostics"
>> 
>> Probably don't need the "(non-surrogate)" in the name value field. Perhaps
>> this could be footnoted in the table?
> 
> Ok.
> 
>> 6. Request Parameters: "httpAccept-*"
>> 
>> I think these params are incorrectly named and should follow the standard
>> camelcase style used elsewhere. e.g.
>> 
>> Accept-Charset:   httpAccept-charset  -> httpAcceptCharset
>> Accept-Encoding:  httpAccept-encoding -> httpAcceptEncoding
>> Accept-Language:  httpAccept-language -> httpAcceptLanguage
>> Accept-Ranges:    httpAccept-ranges   -> httpAcceptRanges
>> 
>> Even though they mimic the HTTP headers, they break the naming convention.
> 
> Ok.
> 
> 
>> 
>> 
>> 7. Request Parameters: "rendering"
>> 
>> This is just a query. I wonder if the terms "client" / "server" would be
>> more appropriate than "local" / "remote". It might be more correct to talk
>> about "local" / "remote" but I always end up having to do a double take to
>> figure out my relative position.
> 
> I would be fine with changing "local" / "remote" to "client" / "server", but
> I would like to hear others' opinions. We spent quite a while talking about
> this; in the end I don't think it matters, but others might.
> 
> 
>> 8. Request Parameters: recordSchema, sortKeys/sortSchema
>> 
>> Both "recordSchema" and "sortKeys/sortSchema" allow for short names to be
>> used in place of URIs. But the SRU registered short names [1] are not
>> unique. E.g. "mods" is mapped to four different XML schemas (3.0, 3.1,
>> 3.2, 3.3) and likewise "pam" is mapped to two different XML schemas (2.0,
>> 2.1).
> 
> No, MODS is not mapped to any schema. True, it's listed for several
> schemas, but as a preferred short name. That only means that the owner of
> the schema recommends that short name, but you still have to go to the
> explain file to find out what the short name is for a specific schema. In
> the case where a server supports multiple versions of, say, MODS, it must
> use different short names, so it is not true that the preferred short name
> listed in the schema table must be used for a given schema.
> 
> 
>> Also in Sect. 4.7.1 it says under sortSchema "the URI for an XML schema".
>> What is meant though is the "short name" for an XML schema, which is a
>> placeholder for the URI. And that is shown in the examples but needs
>> better explanation.
> 
> Ok.
> 
> 
>> Still, the short name to URI mapping problem remains.
>> 
>> [1] http://www.loc.gov/standards/sru/resources/schemas.html
> 
> There is no short-name-to-URI mapping; that is, there is no global
> mapping, only local mappings for a given server, and the mapping is in
> explain. We do need a better explanation of this within the schema page.
> 
> 
>> 9. Response ID
>> 
>> There is no ID returned in the response. If there are records then a
>> "resultSetId" is returned, but not otherwise. Some serializations (e.g.
>> ATOM) require an ID (actually a URI) for the response. One strategy would
>> be to use the "resultSetId" as the basis for a unique ID, but this fails
>> when no records are returned and a response is still required to carry
>> the diagnostics.
>> 
>> Is there some other place to return a unique response ID?
> 
> Is there a general use case for an id for the response, or is the concern
> only that a response id is required for certain serializations, ATOM, for
> example?  If it's the latter then I can't see that this is a problem. When
> ATOM is the response format it will contain an id (in the ATOM namespace)
> and the protocol does not need to deal with it.  Right?
> 
> 
> 
>> 10. Endpoints
>> 
>> I would have preferred to see the "operation" parameter maintained so that
>> "searchRetrieve" and "scan" could both be located on the same endpoint,
>> with an explicit choice made between them. Heuristics could be applied,
>> but I think this is an unnecessary shorthand and may only lead to problems
>> down the line. I would at least have made this parameter optional.
>> 
>> As regards "version", I agree that it could be dispensed with, although I
>> don't see any real harm in allowing for an optional parameter. Definitely
>> not a required parameter.
> 
> I'm still hoping we can keep these two parameters out of the protocol but I
> agree that this merits further discussion.
> 
> 
>> 
>> 11. Typos, etc
> We'll correct all these.
> 
> Thanks.
> 
> --Ray
> 


********************************************************************************   
DISCLAIMER: This e-mail is confidential and should not be used by anyone who is
not the original intended recipient. If you have received this e-mail in error
please inform the sender and delete it from your mailbox or any other storage
mechanism. Neither Macmillan Publishers Limited nor any of its agents accept
liability for any statements made which are clearly the sender's own and not
expressly made on behalf of Macmillan Publishers Limited or one of its agents.
Please note that neither Macmillan Publishers Limited nor any of its agents
accept any responsibility for viruses that may be contained in this e-mail or
its attachments and it is your responsibility to scan the e-mail and 
attachments (if any). No contracts may be concluded on behalf of Macmillan 
Publishers Limited or its agents by means of e-mail communication. Macmillan 
Publishers Limited Registered in England and Wales with registered number 785998 
Registered Office Brunel Road, Houndmills, Basingstoke RG21 6XS   
********************************************************************************