OASIS Mailing List Archives: regrep message


Subject: Re: [regrep] Proposal for CQL Profile and Image Profile


Farrukh:
Classes of operations we support when accessing remote files include:
  • POSIX I/O (open, close, read, write, seek, stat, ...)
  • HDF5 libraries for subsetting files and accessing metadata
  • NetCDF libraries for subsetting files and accessing metadata
  • Open Geospatial Consortium protocol
These libraries are applied at the storage location where the file resides.  The goal is to minimize the amount of data sent over a network.  We have user communities that manipulate files that are up to 2.5 Terabytes in size.
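As a rough illustration of why applying the operation at the storage location reduces network traffic, here is a minimal Python sketch that uses POSIX-style open, seek, and read calls to return only a requested byte range of a file. The function name `read_subset` and its parameters are hypothetical and not part of any system discussed in this thread.

```python
import os


def read_subset(path, offset, length):
    """Return up to `length` bytes of `path`, starting at byte `offset`.

    Hypothetical helper: only the requested range is read, so only that
    range would need to be sent back to a remote caller.
    """
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # end of file reached
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

For a multi-terabyte file, a call such as `read_subset(path, offset, 4096)` would return 4096 bytes rather than the whole file.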

This implies that a wide variety of operations may need to be applied when accessing data.  Will there be a way to translate from your protocol specification for operations to the above examples?

Reagan Moore

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Saturday, August 11, 2012 2:00 PM
To: Reagan Moore <rwmoore@renci.org>
Cc: ebXML Regrep <regrep@lists.oasis-open.org>
Subject: Re: [regrep] Proposal for CQL Profile and Image Profile

Reagan,

Thank you for getting the technical discussion started on the proposed new extension specs. I am copying the list on my response.

On 08/10/2012 04:31 PM, Reagan Moore wrote:
I have a full day faculty meeting on August 17.

In that case I suggest we defer our meeting for another two weeks and meet on Friday, August 31, 2012 at 12pm ET, and discuss the proposal over email in the interim as you have done below. Does that work for you and other colleagues?



We find the need for multiple types of queries when dealing with scientific data:
  • spatial queries, find an item within a bounding box

This is planned to be supported. See the following query example in the ImageProfile wiki page:

  • Find images by GeoLocation:

    exif.geoLocation WITHIN POLYGON((59 22, 78 22,78 38, 59 38, 59 22))
The spec will support all spatial relations as defined by Egenhofer Spatial Relations (not just WITHIN) and all geometry types (not just POLYGON).
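To make the WITHIN relation concrete, here is a minimal Python sketch of testing whether a point lies within the polygon from the example query. It assumes the coordinate pairs are plain "x y" values on a flat plane and uses a standard ray-casting test. The function names are hypothetical, and the sketch says nothing about how a registry would actually evaluate the query.

```python
import re


def parse_polygon(text):
    """Parse 'POLYGON((x y, x y, ...))' into a list of (x, y) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", text)
    if match is None:
        raise ValueError("not a single-ring POLYGON: %r" % text)
    ring = []
    for pair in match.group(1).split(","):
        x_text, y_text = pair.split()
        ring.append((float(x_text), float(y_text)))
    return ring


def point_within_polygon(point, ring):
    """Ray-casting test; `ring` is closed (last vertex equals the first)."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


ring = parse_polygon("POLYGON((59 22, 78 22,78 38, 59 38, 59 22))")
print(point_within_polygon((65, 30), ring))  # True
print(point_within_polygon((80, 30), ring))  # False
```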

  • Temporal queries, find an event within a time period

This is planned to be supported. See the following query example in the ImageProfile wiki page:

  • Find by creation date:

    exif.dateTime >= "2008-07-13T21:05:34"
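The comparison in this query can be sketched in Python as follows, assuming the stored value and the query literal are both ISO 8601 strings of the form shown above. The function name is hypothetical.

```python
from datetime import datetime


def created_on_or_after(exif_date_time, threshold):
    """Return True if `exif_date_time` is not earlier than `threshold`.

    Both arguments are ISO 8601 strings such as "2008-07-13T21:05:34".
    """
    return datetime.fromisoformat(exif_date_time) >= datetime.fromisoformat(threshold)


print(created_on_or_after("2009-01-01T00:00:00", "2008-07-13T21:05:34"))  # True
print(created_on_or_after("2008-07-13T21:05:33", "2008-07-13T21:05:34"))  # False
```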

  • Logical queries ("and" and "or")
Will the approach be extensible to allow operations beyond =, <, > ?

CQL defines a decent core set of relations as part of the CQL context and allows new relations to be defined by additional contexts, so the approach is completely extensible. Also, we will define the use of regrep query functions within a CQL query, as in any other query, so this would be above and beyond the extensibility offered by CQL itself.

I will update the wiki pages to make sure the above responses are clearly called out.

So what do you and other colleagues think about these two profiles so far?


Reagan

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Friday, August 10, 2012 1:04 PM
To: ebXML Regrep <regrep@lists.oasis-open.org>
Subject: [regrep] Proposal for CQL Profile and Image Profile


Dear Colleagues,

I have been working on supporting the publishing, management, and discovery of image content (such as files in JPEG, PNG, or GIF format) in an ebXML RegRep.
I would like to discuss ideas and initial thoughts on an extension profile specification with the proposed title "ebXML RegRep Profile for Image Resources".

During the course of this work I found that, unlike other profiles, the Image Profile requires a much larger number of metadata attributes by which to search for image resources. These metadata attributes are defined by specifications such as [EXIF-2.2] and [IPTC-2008]. I also found that many of these attributes are numeric, and that search needs to support matching numeric attributes such as imageWidth and imageLength against a min/max range. These differences made it rather unwieldy to define a canonical parameterized query for discovering image resources. There would simply be too many parameters in the traditional approach.

The existing canonical Adhoc Query would be the solution to this problem. However, RegRep core does not define a standard query language syntax that would allow the Adhoc Query to be used in an interoperable manner across registry implementations. One registry could support SQL queries while another could support XQuery, and the query schema could be quite different even for the same query language.

This led me to consider CQL [SearchRetrievePt5] as an implementation-neutral query language syntax for querying image profile data as well as any other profile's data and core ebrim RegistryObject metadata. The basic idea is to define a CQL context set for each such profile. The context set defines a set of indexes that can be used in searching data for that context. For example, in the Image Profile you can have exif.imageWidth and exif.imageHeight indexes that allow searching for images by their pixel height and width. The index definition would also specify how the index relates to RegistryObjects for that type of data (e.g. images). This information would include the data type for each index as well as the set of relations that could be used with it (e.g. "=", "<", ">" etc.).
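One way to picture a context set is as a table of index definitions, each carrying a data type and a set of permitted relations. The Python sketch below is only an illustration under that assumption. The class, field, and function names are hypothetical, the relation sets are made up for the example, and the mapping from an index to RegistryObjects is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDefinition:
    name: str             # index name as it appears in a CQL query
    data_type: type       # Python type standing in for the index's data type
    relations: frozenset  # relations allowed for this index


NUMERIC_RELATIONS = frozenset({"=", "<", ">", "<=", ">="})

# Hypothetical fragment of an Image Profile context set.
IMAGE_CONTEXT_SET = {
    definition.name: definition
    for definition in (
        IndexDefinition("exif.imageWidth", int, NUMERIC_RELATIONS),
        IndexDefinition("exif.imageHeight", int, NUMERIC_RELATIONS),
        IndexDefinition("iptc.intellectualGenre", str, frozenset({"="})),
    )
}


def check_predicate(context_set, index, relation, value):
    """Return True if the index exists, allows the relation, and the value
    has the index's data type."""
    definition = context_set.get(index)
    if definition is None:
        return False
    if relation not in definition.relations:
        return False
    return isinstance(value, definition.data_type)


print(check_predicate(IMAGE_CONTEXT_SET, "exif.imageWidth", ">=", 300))      # True
print(check_predicate(IMAGE_CONTEXT_SET, "exif.imageWidth", "WITHIN", 300))  # False
```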

Here are some examples of CQL queries that could be used to search for image content. A query could contain any number of predicates combined using boolean operators such as AND and OR.
  • Find by creation date: exif.dateTime >= "2008-07-13T21:05:34"
  • Find images where width and height are both >= 300 pixels: exif.imageWidth >= 300 AND exif.imageHeight >= 300
  • Find by f-number: exif.fNumber >= 2.8
  • Find images by GeoLocation: exif.geoLocation WITHIN POLYGON((59 22, 78 22,78 38, 59 38, 59 22))
  • Find by creator: iptc.creator.name = "*farrukh*najmi*"
  • Find by genre: iptc.intellectualGenre = "wildlife"
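To show how predicates like those above combine with a boolean operator, here is a small Python sketch that renders a list of (index, relation, value) predicates as query text and evaluates them against one image's metadata. All names are hypothetical. Only the simple comparison relations are covered, and wildcard matching and spatial relations are left out.

```python
import operator

# Comparison relations handled by this sketch.
RELATIONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def to_cql(predicates, combine="AND"):
    """Render (index, relation, value) predicates as a single query string."""
    def term(value):
        # String values are quoted; numbers are written as-is.
        return '"%s"' % value if isinstance(value, str) else str(value)

    return (" %s " % combine).join(
        "%s %s %s" % (index, relation, term(value))
        for index, relation, value in predicates
    )


def matches(record, predicates, combine="AND"):
    """Evaluate the predicates against one metadata record (a dict)."""
    results = [
        index in record and RELATIONS[relation](record[index], value)
        for index, relation, value in predicates
    ]
    return all(results) if combine == "AND" else any(results)


predicates = [("exif.imageWidth", ">=", 300), ("exif.imageHeight", ">=", 300)]
print(to_cql(predicates))
# exif.imageWidth >= 300 AND exif.imageHeight >= 300
print(matches({"exif.imageWidth": 640, "exif.imageHeight": 480}, predicates))  # True
print(matches({"exif.imageWidth": 640, "exif.imageHeight": 200}, predicates))  # False
```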
The use of CQL queries can be specified using the existing RegRep Adhoc Query protocol with a new CQL query language. All this is proposed to be defined as an extension profile specification with the proposed title "ebXML RegRep Profile for Contextual Query Language (CQL)". No protocol changes or changes to RegRep core would be needed to support CQL as an extension profile.

The outlines for both proposed extension specs are available in our wiki here for your review:
I would like to propose that we meet at our next TC meeting on August 17, 2012 at 12PM ET to discuss these two proposals.

Please let me know (off list) if there is any chance that you will be unable to attend and please share your thoughts on this email thread until our meeting. Thank you.

-- 
Regards,
Farrukh Najmi

Web: http://www.wellfleetsoftware.com



