The RegRep Core spec defines a fixed set of operations
(CREATE/READ/UPDATE/DELETE) on an unbounded universe of resource
types.
RegRep profile specs tend to be in one of the following categories
(non-exhaustive):
- Profiles based on specific resource types (e.g. Image, WSDL,
XML Schema, Web Ontology / OWL, ...)
- Profiles that extend the protocol (e.g. a REST profile
extension that adds CREATE/UPDATE/DELETE to the read-only REST
protocol defined in ebRS today; BTW, this is another one I
plan to propose soon)
- Profiles that specify an extension other than a protocol
extension (e.g. the CQL Profile defines how to use a specific
query expression language)
You seem to be suggesting that there is a need for profiles based
on operations. This is not the design center of RegRep, though a
protocol extension comes closest to it. Can you give an example to
illustrate the need?
Based on what you have shared thus far I can see the following
repository specific profiles as possible examples of what you had
in mind:
- RegRep repository profile for NetCDF
- RegRep repository profile for HDF5
- RegRep repository profile for OGC WFS
- RegRep repository profile for OGC WMS
- RegRep repository profile for OGC WCS
Am I getting close to what you are trying to communicate as a
need?
On 08/13/2012 11:53 AM, Reagan Moore wrote:
The question is how do I go from "file type" specific queries
to "functional type" specific queries. Instead of choosing what
to do based on the file type, choose what to do based on the
operation that is being performed.
The digital library world tried to specify data management as
a function of the type of data object. They went through the
process of managing:
- Books
- Chapters in books
- Arbitrary collections of digital objects
They had to revise their organizing schema at each
transition, and now have explicit structure metadata in METS.
A similar transition can occur when querying objects:
- Query Dublin core metadata
- Query against defined schema
- Query for data type properties (for images, metadata may be
encapsulated in the image itself)
- Query for metadata associated with an object (HDF5, NetCDF,
FITS, DICOM). This requires extracting metadata from
objects other than images.
Will you generate different profiles for each type of
object?
Or will you generate profiles for each type of operation?
Reagan
Reagan,
The two proposed specs do not define any new protocols or
messages. The CQL profile specifies how the existing CQL
specification is used within the existing RegRep
protocols. The Image Profile defines how to use existing
RegRep features for publishing, cataloging and querying
data when that data is image data, and how to use the
proposed CQL profile to query images.
At first glance, the questions below seem to be in the
implementation space rather than the specification space
(which is the purview of the RegRep TC).
From the implementor's perspective, the main impact
would be on how one implements the
communication between the registry and the remote
repositories it manages, whether they be HDF5, NetCDF or
OGC services.
On 08/13/2012 10:53 AM, Reagan Moore wrote:
Farrukh:
Classes of operations we support when accessing
remote files include:
- POSIX I/O (open, close, read, write, seek, stat, ...)
- HDF5 libraries for subsetting files and accessing
metadata
- NetCDF libraries for subsetting files and accessing
metadata
- Open Geospatial Consortium protocol
These libraries are applied at the storage location
where the file resides. The goal is to minimize the
amount of data sent over a network. We have user
communities that manipulate files that are up to 2.5
terabytes in size.
This implies there is a wide variety of operations
that may need to be applied when accessing data. Will
there be a way to translate from your protocol
specification for operations to the above examples?
Reagan Moore
Reagan,
Thank you for getting the technical discussion
started on the proposed new extension specs. I am
copying the list on my response.
On 08/10/2012 04:31 PM, Reagan Moore wrote:
I have a full day faculty meeting on August
17.
In that case I suggest we defer our meeting by
another two weeks and meet on Friday August 31, 2012
at 12pm ET, and discuss the proposal over email in
the interim as you have done below. Does that work for
you and other colleagues?
We find the need for multiple types of
queries when dealing with scientific data:
- Spatial queries: find an item within a
bounding box
This is planned to be supported. See the following
query example in the ImageProfile wiki page:
- Find images by GeoLocation:
exif.geoLocation WITHIN POLYGON((59 22, 78 22, 78 38, 59 38, 59 22))
The spec will support all spatial relations defined by
the Egenhofer Spatial Relations model (not just WITHIN)
and all geometry types (not just POLYGON).
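As a rough illustration of what the WITHIN example above computes when the indexed value is a single point, here is a minimal ray-casting sketch in Python. It assumes exif.geoLocation is stored as a planar (x, y) point and handles only a single-ring POLYGON; it is not how the profile will define WITHIN, which is meant to cover the full set of Egenhofer relations and other geometry types.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def parse_polygon(wkt: str) -> List[Point]:
    """Parse a single-ring WKT polygon like 'POLYGON((59 22, 78 22, ...))'."""
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is handled in this sketch")
    inner = body[body.index("((") + 2 : body.rindex("))")]
    ring = []
    for pair in inner.split(","):
        x, y = pair.split()  # coordinates treated as plain planar x y
        ring.append((float(x), float(y)))
    return ring

def point_within(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how often a ray from the point toward +x crosses
    the ring's edges; an odd count means the point is inside. Points exactly
    on the boundary may go either way in this sketch."""
    x, y = point
    inside = False
    for i in range(len(ring) - 1):  # ring is closed: last vertex repeats the first
        (x1, y1), (x2, y2) = ring[i], ring[i + 1]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

ring = parse_polygon("POLYGON((59 22, 78 22,78 38, 59 38, 59 22))")
print(point_within((65.0, 30.0), ring))  # True
print(point_within((80.0, 30.0), ring))  # False
```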
- Temporal queries: find an event within a
time period
This is planned to be supported. See the following
query example in the ImageProfile wiki page:
- Find by creation date:
exif.dateTime >= "2008-07-13T21:05:34"
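One detail the creation-date example hides: EXIF 2.2 stores DateTime as "YYYY:MM:DD HH:MM:SS", while the query literal above is ISO 8601, so a plain string comparison would not be reliable. A minimal Python sketch, assuming the registry keeps the raw EXIF string (the thread does not say whether the profile normalizes it at cataloging time), compares both as datetime values:

```python
from datetime import datetime

def parse_exif_datetime(value: str) -> datetime:
    # EXIF 2.2 DateTime format: "YYYY:MM:DD HH:MM:SS"
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")

def created_on_or_after(exif_value: str, query_value: str) -> bool:
    # The query literal is ISO 8601, e.g. "2008-07-13T21:05:34"
    return parse_exif_datetime(exif_value) >= datetime.fromisoformat(query_value)

print(created_on_or_after("2009:01:02 10:00:00", "2008-07-13T21:05:34"))  # True
print(created_on_or_after("2008:07:13 21:05:33", "2008-07-13T21:05:34"))  # False
```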
- Logical queries ("and" and "or")
Will the approach be extensible to allow
operations beyond =, <, > ?
CQL defines a decent core set of relations as part
of the CQL context set and allows new relations to
be defined by additional context sets, so the approach is
completely extensible. Also, we will define the use
of RegRep query functions within a CQL query like
any other query, so this would be above and beyond
the extensibility offered by CQL itself.
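The extensibility point above amounts to keeping relations in an open table rather than a fixed set. A minimal Python sketch of that idea follows; the "ext" context set and its "ext.between" relation are hypothetical examples, and the core table is only a subset of the relations CQL defines.

```python
import operator
from typing import Any, Callable, Dict

Relation = Callable[[Any, Any], bool]

# Core comparison relations (a subset of those CQL defines).
RELATIONS: Dict[str, Relation] = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def register_relation(name: str, fn: Relation) -> None:
    """Add a relation contributed by another context set under a prefixed name."""
    if name in RELATIONS:
        raise ValueError(f"relation {name!r} is already defined")
    RELATIONS[name] = fn

def apply_relation(name: str, indexed_value: Any, query_value: Any) -> bool:
    try:
        fn = RELATIONS[name]
    except KeyError:
        raise ValueError(f"unsupported relation {name!r}") from None
    return fn(indexed_value, query_value)

# Hypothetical relation from a hypothetical "ext" context set:
# true when the indexed value lies in the inclusive range (lo, hi).
register_relation("ext.between", lambda v, rng: rng[0] <= v <= rng[1])

print(apply_relation(">=", 2.8, 2.8))                   # True
print(apply_relation("ext.between", 640, (300, 1024)))  # True
```

New relations plug in without touching the evaluator, which mirrors how additional CQL context sets can contribute relations.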
I will update the wiki pages to make sure the
above responses are clearly called out.
So what do you and other colleagues think about
these two profiles so far?
Reagan
Dear Colleagues,
I have been working on supporting the
publishing, management and discovery of image
content (such as JPEG, PNG and GIF files)
in an ebXML RegRep.
I would like to discuss ideas and initial
thoughts on an extension profile
specification with the proposed title "ebXML
RegRep Profile for Image Resources".
During the course of this work I found that
unlike other profiles, the Image Profile
requires a much larger number of metadata
attributes by which to allow searching for
image resources. These metadata attributes
are defined by specifications such as [EXIF-2.2]
and [IPTC-2008].
I also found that many of these attributes
are numeric and that search needs to support
finding matches where numeric attributes such
as imageWidth and imageLength fall within a
certain min/max range. These differences
made it rather unwieldy to define a canonical
parameterized query for discovering image
resources: the traditional approach would
require just too many parameters.
The existing canonical
Adhoc Query would be the solution to
this problem. However, RegRep core does not
define a standard query language syntax that
would allow the Adhoc Query to be used in an
interoperable manner across registry
implementations. One registry could support
SQL queries while another could support XQuery,
and the query schema could be quite
different even for the same query language.
This led me to consider CQL [SearchRetrievePt5]
as an implementation-neutral query language
syntax for querying image profile data as
well as any other profile's data and core
ebRIM RegistryObject metadata. The basic
idea is to define a CQL context set for each
such profile. The context set defines a set
of indexes that can be used in searching
data for that context. For example, the Image
Profile could have exif.imageWidth and
exif.imageHeight indexes that allow
searching for images by their pixel width
and height. The index definition would also
specify how the index relates to
RegistryObjects for that type of data (e.g.
images). This information would include the
data type of each index as well as the set of
relations that could be used with it (e.g.
"=", "<", ">" etc.).
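The context-set idea above can be pictured as a small table of index definitions. The following Python sketch is a hypothetical model, not the profile's actual schema: each index carries a data type, the relations allowed on it, and the RegistryObject slot it maps to (the slot URNs are made-up placeholders). Validating a predicate then means checking the relation and coercing the literal to the index's type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet

@dataclass(frozen=True)
class IndexDef:
    name: str                  # e.g. "exif.imageWidth"
    data_type: type            # Python type used to coerce query literals
    relations: FrozenSet[str]  # relations allowed on this index
    slot: str                  # hypothetical RegistryObject slot name

@dataclass
class ContextSet:
    prefix: str
    indexes: Dict[str, IndexDef] = field(default_factory=dict)

    def add(self, index: IndexDef) -> None:
        if not index.name.startswith(self.prefix + "."):
            raise ValueError(f"{index.name} is not in context set {self.prefix}")
        self.indexes[index.name] = index

    def validate(self, index_name: str, relation: str, literal: str) -> Any:
        """Check one predicate and return the literal coerced to the index's type."""
        index = self.indexes.get(index_name)
        if index is None:
            raise ValueError(f"unknown index {index_name}")
        if relation not in index.relations:
            raise ValueError(f"relation {relation} not allowed on {index_name}")
        return index.data_type(literal)

NUMERIC = frozenset({"=", "<", ">", "<=", ">="})
exif = ContextSet("exif")
exif.add(IndexDef("exif.imageWidth", int, NUMERIC, "urn:example:slot:imageWidth"))
exif.add(IndexDef("exif.imageHeight", int, NUMERIC, "urn:example:slot:imageHeight"))
exif.add(IndexDef("exif.fNumber", float, NUMERIC, "urn:example:slot:fNumber"))

print(exif.validate("exif.imageWidth", ">=", "300"))  # 300
```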
Here are some examples of CQL queries that
could be used to search for image content. A
query could contain any number of predicates
combined using boolean operators like AND,
OR.
- Find by creation date: exif.dateTime >= "2008-07-13T21:05:34"
- Find images where width and height are both >= 300 pixels: exif.imageWidth >= 300 AND exif.imageHeight >= 300
- Find by f-number: exif.fNumber >= 2.8
- Find images by GeoLocation: exif.geoLocation WITHIN POLYGON((59 22, 78 22, 78 38, 59 38, 59 22))
- Find by creator: iptc.creator.name = "*farrukh*najmi*"
- Find by genre: iptc.intellectualGenre = "wildlife"
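To make the query examples above concrete, here is a minimal Python sketch that evaluates a flat CQL-like query against one image's metadata held in a dict. It is not a CQL parser: it assumes no parentheses, NOT, PROX or spatial relations, supports only = and <> on strings, and evaluates AND/OR strictly left to right (as I read the CQL spec, boolean operators share one precedence level). The record layout and case-insensitive "*" matching are assumptions.

```python
import operator
import re
from typing import Any, Dict

# index, relation, then either a double-quoted string or a bare number
PREDICATE = re.compile(r'\s*([\w.]+)\s*(>=|<=|<>|=|<|>)\s*(?:"([^"]*)"|([-\d.]+))\s*')
BOOLEAN = re.compile(r'(AND|OR)\s+', re.IGNORECASE)

NUMERIC_OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
               ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def match_string(value: str, pattern: str) -> bool:
    # '*' masks zero or more characters; case-insensitivity is an assumption.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None

def eval_predicate(record: Dict[str, Any], index: str, rel: str, literal: Any) -> bool:
    value = record.get(index)
    if value is None:  # a record without the index never matches
        return False
    if isinstance(literal, str):
        if rel == "=":
            return match_string(str(value), literal)
        if rel == "<>":
            return not match_string(str(value), literal)
        raise ValueError("this sketch only supports = and <> on strings")
    return NUMERIC_OPS[rel](value, literal)

def evaluate(query: str, record: Dict[str, Any]) -> bool:
    """Evaluate predicates joined by AND/OR strictly left to right."""
    pos, result, pending = 0, False, None
    while True:
        m = PREDICATE.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse predicate at {query[pos:]!r}")
        index, rel, quoted, number = m.groups()
        literal = quoted if quoted is not None else float(number)
        value = eval_predicate(record, index, rel, literal)
        if pending is None:
            result = value
        elif pending == "AND":
            result = result and value
        else:
            result = result or value
        pos = m.end()
        if pos >= len(query):
            return result
        b = BOOLEAN.match(query, pos)
        if not b:
            raise ValueError(f"expected AND or OR at {query[pos:]!r}")
        pending = b.group(1).upper()
        pos = b.end()

record = {"exif.imageWidth": 640, "exif.imageHeight": 480, "exif.fNumber": 2.8,
          "iptc.creator.name": "Farrukh Najmi", "iptc.intellectualGenre": "wildlife"}

print(evaluate("exif.imageWidth >= 300 AND exif.imageHeight >= 300", record))  # True
print(evaluate('iptc.creator.name = "*farrukh*najmi*"', record))  # True
print(evaluate('iptc.intellectualGenre = "portrait" OR exif.imageWidth > 1000', record))  # False
```

In a registry the same predicates would be translated into the implementation's own query language rather than evaluated record by record; the sketch only shows the intended semantics.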
The use of CQL queries can be specified
using the existing RegRep Adhoc Query
protocol with a new CQL query language. All
of this is proposed to be defined in an extension
profile specification with the proposed
title "ebXML RegRep Profile for Contextual
Query Language (CQL)". No protocol changes
or changes to RegRep core would be needed to
support CQL as an extension profile.
The outlines for both proposed extension
specs are available in our wiki here for
your review:
I would like to propose that we meet at our
next TC meeting on August 17, 2012 at 12PM
ET to discuss these two proposals.
Please let me know (off list) if there is
any chance that you will be unable to attend
and please share your thoughts on this email
thread until our meeting. Thank you.
--
Regards,
Farrukh Najmi
Web: http://www.wellfleetsoftware.com