

Subject: Re: [regrep] Proposal for CQL Profile and Image Profile



Please see inline below...

On 08/13/2012 12:34 PM, Reagan Moore wrote:
Yes.  The set of operations that are performed on the remote object will be much more sophisticated than simply create / read.  When communities are managing files that are terabytes in size, they will want to perform data subsetting operations at the remote storage location, and avoid moving the entire file whenever possible.

Thus we see the need to support:
  • Partial I/O.  We need to seek and read from an offset

Partial I/O for RegistryObjects or RepositoryItems is not currently supported. We can file Enhancement Requests in JIRA for these. I can see adding query or request parameters to specify control over partial I/O. This is quite independent of the CQL or Image profiles. May I suggest that you start a separate thread with the title "Need Partial IO support in RegRep protocols" and give some simple examples in that thread? We can work together to refine it further.
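
For reference, the "seek and read from an offset" operation requested above can be sketched at the plain file level as follows. This is only an illustration on a local file: the function name `read_range` and the sample data are hypothetical, and nothing here is part of the RegRep protocols, which do not currently define partial I/O.

```python
import os
import tempfile


def read_range(path, offset, length):
    """Return up to `length` bytes of the file at `path`, starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)          # jump to the requested byte offset
        return f.read(length)   # read only the requested window


# Usage: create a small stand-in file and read a 7-byte slice from offset 7.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header-PAYLOAD-trailer")
    sample_path = tmp.name

chunk = read_range(sample_path, 7, 7)
os.remove(sample_path)
```

A protocol-level equivalent would presumably carry the offset and length as request parameters, so that only the requested window crosses the network.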

  • Metadata extraction.  We need to read metadata from structure header.

This is essentially what Cataloger Plugins are expected to do:

Please study that and tell me if we need some Enhancement Requests for that capability. Please keep that discussion on a separate thread titled "Metadata Extraction" or some such.

Let's keep all spec discussions on the TC mailing list. Let's keep implementation discussions off list.

The NetCDF and HDF5 libraries support multiple simultaneous operations, minimizing the number of control messages sent over the network.

Reagan

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Monday, August 13, 2012 12:14 PM
To: Reagan Moore <rwmoore@renci.org>
Cc: ebXML Regrep <regrep@lists.oasis-open.org>
Subject: Re: [regrep] Proposal for CQL Profile and Image Profile


The RegRep Core spec defines a fixed set of operations (CREATE/READ/UPDATE/DELETE) on an unbounded universe of resource types.
RegRep profile specs tend to be in one of the following categories (non-exhaustive):
  • Profiles based on specific resource types (e.g. Image, WSDL, XML Schema, Web Ontology / OWL, ...)
  • Profiles that extend the protocol (e.g. a REST profile extension that adds CREATE/UPDATE/DELETE to the READ-only REST protocol defined in ebRS today; BTW, this is another one I plan to propose soon)
  • Profiles that specify an extension other than a protocol extension (e.g. the CQL Profile defines how to use a specific query expression language)
You seem to be suggesting that there is a need for profiles based on operations. This is not the design center of RegRep, though a protocol extension comes closest to it. Can you give an example to illustrate the need?

Based on what you have shared thus far I can see the following repository specific profiles as possible examples of what you had in mind:
  • RegRep repository profile for NetCDF
  • RegRep repository profile for HDF5
  • RegRep repository profile for OGC WFS
  • RegRep repository profile for OGC WMS
  • RegRep repository profile for OGC WCS

Am I getting close to what you are trying to communicate as a need?

On 08/13/2012 11:53 AM, Reagan Moore wrote:
The question is how do I go from "file type" specific queries to "functional type" specific queries.  Instead of choosing what to do based on the file type, choose what to do based on the operation that is being performed.

The digital library world tried to specify data management as a function of the type of data object.  They went through the process of managing:
  • Books
  • Chapters in books
  • Arbitrary collections of digital objects
They had to revise their organizing schema at each transition, and now have explicit structure metadata in METS.

A similar transition can occur when querying objects:
  • Query Dublin core metadata
  • Query against defined schema
  • Query for data type properties (images, metadata may be encapsulated in the image)
  • Query for metadata associated with an object (HDF5, NetCDF, FITS, Dicom).  This requires extracting metadata from an object different from an image.
Will you generate different profiles for each type of object?
Or will you generate profiles for each type of operation?

Reagan

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Monday, August 13, 2012 11:12 AM
To: "regrep@lists.oasis-open.org" <regrep@lists.oasis-open.org>
Subject: Re: [regrep] Proposal for CQL Profile and Image Profile


Reagan,

The two proposed specs do not define any new protocols or messages. The CQL profile specifies how the existing CQL specification is used within the existing RegRep protocols. The Image Profile defines how to use existing RegRep features for publishing, cataloging and querying data when said data is image data, and how to use the proposed CQL profile to query images.

At first glance, the questions below seem to be in the implementation space rather than the specification space (which is the purview of the RegRep TC).

From the implementor's perspective, the main impact in the implementation space would be in how one implements the communication between the registry and the remote repositories it manages, whether they be HDF5, NetCDF or OGC services.

On 08/13/2012 10:53 AM, Reagan Moore wrote:
Farrukh:
Classes of operations we support when accessing remote files include:
  • Posix I/O (open, close, read, write, seek, stat, ….)
  • HDF5 libraries for subsetting files and accessing metadata
  • NetCDF libraries for subsetting files and accessing metadata
  • Open Geospatial Consortium protocol
These libraries are applied at the storage location where the file resides.  The goal is to minimize the amount of data sent over a network.  We have user communities that manipulate files that are up to 2.5 Terabytes in size.

This implies there are a wide variety of operations that may need to be applied when accessing data.  Will there be a way to translate from your protocol specification for operations to the above examples?

Reagan Moore

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Saturday, August 11, 2012 2:00 PM
To: Reagan Moore <rwmoore@renci.org>
Cc: ebXML Regrep <regrep@lists.oasis-open.org>
Subject: Re: [regrep] Proposal for CQL Profile and Image Profile

Reagan,

Thank you for getting the technical discussion started on the proposed new extension specs. I am copying the list on my response.

On 08/10/2012 04:31 PM, Reagan Moore wrote:
I have a full day faculty meeting on August 17.

In that case I suggest we defer our meeting for another two weeks and meet on Friday August 31, 2012 at 12pm ET, and discuss the proposal over email in the interim as you have done below. Does that work for you and other colleagues?



We find the need for multiple types of queries when dealing with scientific data:
  • spatial queries, find an item within a bounding box

This is planned to be supported. See the following query example in the ImageProfile wiki page:

  • Find images by GeoLocation:

    exif.geoLocation WITHIN POLYGON((59 22, 78 22,78 38, 59 38, 59 22))
The spec will support all spatial relations defined by Egenhofer Spatial Relations (not just WITHIN) and all geometry types (not just POLYGON).
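
As a rough illustration of what a WITHIN POLYGON predicate has to decide, here is a minimal sketch of the simplest case: testing whether a single point falls inside the polygon from the example query. It uses the standard ray-casting technique. The function name `point_within_polygon` is hypothetical, the meaning of the two coordinates (which is longitude and which is latitude) is not stated in the example and is left open here, and points exactly on the boundary are not handled specially. A real implementation would cover general geometries and all of the spatial relations.

```python
def point_within_polygon(point, ring):
    """Ray-casting test: True if `point` lies inside the closed `ring`."""
    x, y = point
    inside = False
    # Walk each edge of the ring; the ring repeats its first vertex at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # each crossing flips inside/outside
    return inside


# The polygon from the example query, as (first, second) coordinate pairs.
ring = [(59, 22), (78, 22), (78, 38), (59, 38), (59, 22)]

inside_example = point_within_polygon((65, 30), ring)
outside_example = point_within_polygon((80, 30), ring)
```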

  • Temporal queries, find an event within a time period

This is planned to be supported. See the following query example in the ImageProfile wiki page:

  • Find by creation date:

    exif.dateTime >= "2008-07-13T21:05:34"

  • Logical queries ("and" and "or")
Will the approach be extensible to allow operations beyond =, <, > ?

CQL defines a decent core set of relations as part of the CQL context and allows new relations to be defined by additional contexts, so the approach is completely extensible. We will also define the use of RegRep query functions within a CQL query, as in any other query, which adds extensibility above and beyond what CQL itself offers.

I will update the wiki pages sometime to make sure above responses are clearly called out.

So what do you and other colleagues think about these two profiles so far?


Reagan

From: Farrukh Najmi <farrukh@wellfleetsoftware.com>
Date: Friday, August 10, 2012 1:04 PM
To: ebXML Regrep <regrep@lists.oasis-open.org>
Subject: [regrep] Proposal for CQL Profile and Image Profile


Dear Colleagues,

I have been working on supporting the publishing, management and discovery of image content (such as files in JPEG, PNG, GIF and similar formats) in an ebXML RegRep.
I would like to discuss ideas and initial thoughts on an extension profile specification with the proposed title "ebXML RegRep Profile for Image Resources".

During the course of this work I found that unlike other profiles, the Image Profile requires a much larger number of metadata attributes by which to allow searching for image resources. These metadata attributes are defined by specifications such as [EXIF-2.2] and [IPTC-2008]. I also found that many of these attributes are numeric and search needs to support finding matches to numeric attributes such as imageWidth and imageLength to be within a certain min/max range. These unique differences made it rather unwieldy to define a canonical parameterized query to discover image resources by. There would be just too many parameters in the traditional approach.

The existing canonical Adhoc Query would be the solution to this problem. However, RegRep core does not define a standard query language syntax that would allow using the Adhoc Query in an interoperable manner across registry implementations. One registry could support SQL queries while another could support XQuery, and the query schema could be quite different even for the same query language.

This led me to consider CQL [SearchRetrievePt5] as an implementation-neutral query language syntax for querying image profile data as well as any other profile's data and core ebRIM RegistryObject metadata. The basic idea is to define a CQL context set for each such profile. The context set defines a set of indexes that can be used in searching data for that context. For example, in the Image Profile you can have exif.imageWidth and exif.imageHeight indexes that allow searching for images by their height and width in pixels. The index definition would also specify how the index relates to RegistryObjects for that type of data (e.g. images). Information would include the data type for each index as well as the set of relations that could be used with it (e.g. "=", "<", ">" etc.).
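
To make the context set idea concrete, here is a minimal sketch of how index definitions might be modelled and used to validate a predicate. Everything in it is an assumption made for illustration: the names `IndexDef`, `IMAGE_CONTEXT_SET` and `check_predicate`, the Python types standing in for index data types, and the exact relation sets are not taken from the proposed spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    name: str             # index name as used in queries, e.g. "exif.imageWidth"
    data_type: type       # Python stand-in for the index's data type
    relations: frozenset  # relations permitted on this index


NUMERIC_RELATIONS = frozenset({"=", "<", ">", "<=", ">="})

# Hypothetical fragment of an Image Profile context set, keyed by index name.
IMAGE_CONTEXT_SET = {
    d.name: d
    for d in (
        IndexDef("exif.imageWidth", int, NUMERIC_RELATIONS),
        IndexDef("exif.imageHeight", int, NUMERIC_RELATIONS),
        IndexDef("iptc.intellectualGenre", str, frozenset({"="})),
    )
}


def check_predicate(index, relation, value):
    """True if the index is known, the relation is allowed, and the value type matches."""
    definition = IMAGE_CONTEXT_SET.get(index)
    return (
        definition is not None
        and relation in definition.relations
        and isinstance(value, definition.data_type)
    )
```

With these definitions, `check_predicate("exif.imageWidth", ">=", 300)` is accepted, while a "<" comparison on `iptc.intellectualGenre` or any predicate on an undefined index is rejected before a query reaches the registry.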

Here are some examples of CQL queries that could be used to search for image content. A query could contain any number of predicates combined using boolean operators such as AND and OR.
  • Find by creation date: exif.dateTime >= "2008-07-13T21:05:34"
  • Find images where width and height are both >= 300 pixels: exif.imageWidth >= 300 AND exif.imageHeight >= 300
  • Find by f-number: exif.fNumber >= 2.8
  • Find images by GeoLocation: exif.geoLocation WITHIN POLYGON((59 22, 78 22,78 38, 59 38, 59 22))
  • Find by creator: iptc.creator.name = "*farrukh*najmi*"
  • Find by genre: iptc.intellectualGenre = "wildlife"
The use of CQL queries can be specified using the existing RegRep Adhoc Query protocol with a new CQL query language. All this is proposed to be defined in an extension profile specification with the proposed title "ebXML RegRep Profile for Contextual Query Language (CQL)". No protocol changes or changes to RegRep core would be needed to support CQL as an extension profile.
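
As a small client-side illustration, the sketch below assembles a query string of the kind listed above from (index, relation, value) triples. The helper names `format_term` and `build_query` are hypothetical, and the quoting rule (strings in double quotes with embedded quotes backslash-escaped, numbers left bare) is an assumption modelled on the examples rather than something defined by the proposed profile. How the resulting string is carried in an Adhoc Query request is not shown.

```python
def format_term(value):
    """Quote string terms; leave numbers bare, as in the example queries."""
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)


def build_query(predicates, operator="AND"):
    """Join (index, relation, value) triples into one CQL-style query string."""
    clauses = [
        f"{index} {relation} {format_term(value)}"
        for index, relation, value in predicates
    ]
    return f" {operator} ".join(clauses)


query = build_query([
    ("exif.imageWidth", ">=", 300),
    ("exif.imageHeight", ">=", 300),
    ("iptc.intellectualGenre", "=", "wildlife"),
])
```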

The outlines for both proposed extension specs are available in our wiki here for your review:
I would like to propose that we meet at our next TC meeting on August 17, 2012 at 12PM ET to discuss these two proposals.

Please let me know (off list) if there is any chance that you will be unable to attend and please share your thoughts on this email thread until our meeting. Thank you.


-- 
Regards,
Farrukh Najmi

Web: http://www.wellfleetsoftware.com


