Please see inline below...
On 08/13/2012 12:34 PM, Reagan Moore wrote:
Yes. The set of operations that are performed on the remote
object will be much more sophisticated than simply create /
read. When communities are managing files that are terabytes in
size, they will want to perform data subsetting operations at
the remote storage location, and avoid moving the entire file
whenever possible.
Thus we see the need to support:
- Partial I/O. We need to seek and read from an offset
Partial IO for RegistryObjects or RepositoryItems is not currently
supported. We can file Enhancement Requests in JIRA for these. I can
see adding query or request parameters to specify control over
partial IO. This is quite independent from CQL or Image profiles.
May I suggest that you start a separate thread with title "Need
Partial IO support in RegRep protocols" and in that thread give some
simple examples. We can work to refine it further together.
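To seed that thread, here is a minimal sketch of what "seek and read from an offset" means at the file level. It uses only standard Python file I/O; the helper name `read_range` and its parameters are hypothetical and are not part of any RegRep protocol or proposal.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Return up to `length` bytes starting at byte `offset` of the file."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    with open(path, "rb") as f:
        f.seek(offset)         # jump to the requested offset
        return f.read(length)  # may return fewer bytes near end of file


if __name__ == "__main__":
    # Write a small stand-in file, then read 4 bytes starting at offset 6.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789abcdef")
    try:
        print(read_range(tmp.name, 6, 4))
    finally:
        os.remove(tmp.name)
```

A protocol-level version would presumably carry the offset and length as request parameters, so that only the requested bytes cross the network.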
- Metadata extraction. We need to read metadata from
structure header.
This is essentially what Cataloger Plugins are expected to do:
Please study that and tell me if we need some Enhancement Requests
in that capability. Please keep that discussion on a separate thread
titled "Metadata Extraction" or some such.
Let's keep all spec discussions on the TC mailing list. Let's keep
implementation discussions off list.
The NetCDF and HDF5 libraries support multiple simultaneous
operations, minimizing the number of control messages sent over
the network.
Reagan
The RegRep Core spec defines a fixed set of operations
(CREATE/READ/UPDATE/DELETE) on an unbounded universe of
resource types.
RegRep profile specs tend to be in one of the following
categories (non-exhaustive):
- Profiles based on specific resource types (e.g.
Image, WSDL, XML Schema, Web Ontology / OWL, ...)
- Profiles that extend the protocol (e.g. a REST
profile extension that adds CREATE/UPDATE/DELETE to
the READ-only REST protocol defined in ebRS today;
BTW, this is another one I plan to propose soon)
- Profiles that specify an extension other than
protocol extension (e.g. CQL Profile defines how to
use a specific query expression language)
You seem to be suggesting that there is a need for
profiles based on operations. This is not the design
center of RegRep though a protocol extension comes closest
to it. Can you give an example to illustrate the need?
Based on what you have shared thus far I can see the
following repository specific profiles as possible
examples of what you had in mind:
- RegRep repository profile for NetCDF
- RegRep repository profile for HDF5
- RegRep repository profile for OGC WFS
- RegRep repository profile for OGC WMS
- RegRep repository profile for OGC WCS
Am I getting close to what you are trying to communicate
as a need?
On 08/13/2012 11:53 AM, Reagan Moore wrote:
The question is how do I go from "file type" specific
queries to "functional type" specific queries. Instead
of choosing what to do based on the file type, choose
what to do based on the operation that is being
performed.
The digital library world tried to specify data
management as a function of the type of data object.
They went through the process of managing:
- Books
- Chapters in books
- Arbitrary collections of digital objects
They had to revise their organizing schema at each
transition, and now have explicit structure metadata in
METS.
A similar transition can occur when querying objects:
- Query Dublin core metadata
- Query against defined schema
- Query for data type properties (images, metadata may
be encapsulated in the image)
- Query for metadata associated with an object (HDF5,
NetCDF, FITS, Dicom). This requires extracting
metadata from an object different from an image.
Will you generate different profiles for each
type of object?
Or will you generate profiles for each type of
operation?
Reagan
Reagan,
The two proposed specs do not define any new
protocols or messages. The CQL profile specifies
how the existing CQL specification is used within
the existing RegRep protocols. The Image Profile
defines how to use existing RegRep features for
publishing, cataloging and querying data when said
data is image data and how to use the proposed CQL
profile to query images.
At first glance, the questions below seem to be in
the implementation space rather than the specification
space (which is the purview of the RegRep TC).
From the implementor's perspective, the main
impact would be on how one implements the
communication between the registry and the
remote repositories it manages, whether they
be HDF5, NetCDF or OGC services.
On 08/13/2012 10:53 AM, Reagan Moore wrote:
Farrukh:
Classes of operations we support when
accessing remote files include:
- Posix I/O (open, close, read, write, seek,
stat, ...)
- HDF5 libraries for subsetting files and
accessing metadata
- NetCDF libraries for subsetting files and
accessing metadata
- Open Geospatial Consortium protocol
These libraries are applied at the storage
location where the file resides. The goal is to
minimize the amount of data sent over a network.
We have user communities that manipulate files
that are up to 2.5 Terabytes in size.
This implies there are a wide variety of
operations that may need to be applied when
accessing data. Will there be a way to
translate from your protocol specification for
operations to the above examples?
Reagan Moore
Reagan,
Thank you for getting the technical
discussion started on the proposed new
extension specs. I am copying the list on
my response.
On 08/10/2012 04:31 PM, Reagan Moore
wrote:
I have a full day faculty meeting on
August 17.
In that case I suggest we defer our meeting
for another two weeks and meet on Friday
August 31, 2012 at 12pm ET and discuss the
proposal over email in the interim as you
have done below. Does that work for you and
other colleagues?
We find the need for multiple types
of queries when dealing with scientific
data:
- spatial queries, find an item within
a bounding box
This is planned to be supported. See the
following query example in the ImageProfile
wiki page:
- Find images by
GeoLocation:
exif.geoLocation WITHIN POLYGON((59 22,
78 22,78 38, 59 38, 59 22))
The spec will support all spatial relations
(not just WITHIN) and all geometry types (not
just POLYGON) as defined by Egenhofer Spatial
Relations.
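As a rough illustration of what a WITHIN POLYGON match means for a single point, here is a minimal ray-casting sketch. It treats the coordinates as plain x/y pairs, which is an assumption since the example does not state an axis order. The function name `point_within_polygon` is hypothetical, and this is not how any registry is known to evaluate the relation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_within_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: True if `point` lies inside the closed ring.

    Points exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    # Walk each edge; the ring repeats its first vertex as its last.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    # The polygon from the query example above.
    ring = [(59, 22), (78, 22), (78, 38), (59, 38), (59, 22)]
    print(point_within_polygon((65, 30), ring))
    print(point_within_polygon((80, 30), ring))
```

A real implementation would more likely delegate this to a spatial index or a geometry library than test points one at a time.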
- Temporal queries, find an event
within a time period
This is planned to be supported. See the
following query example in the ImageProfile
wiki page:
- Find by creation date:
exif.dateTime >=
"2008-07-13T21:05:34"
- Logical queries ("and" and "or")
Will the approach be extensible to
allow operations beyond =, <, > ?
CQL defines a decent core set of relations
as part of the CQL context and allows any
new relations to be defined by additional
contexts so the approach is completely
extensible. Also, we will define the use of
RegRep query functions within the CQL query
like any other query so this would be above
and beyond the extensibility offered by CQL
itself.
I will update the wiki pages sometime to
make sure above responses are clearly called
out.
So what do you and other colleagues think
about these two profiles so far?
Reagan
Dear Colleagues,
I have been working on supporting
the publishing, management and
discovery of image content (such as
files in JPEG, PNG, GIF and similar
formats) in an ebXML RegRep.
I would like to discuss ideas and
initial thoughts on an extension
profile specification with the
proposed title "ebXML RegRep Profile
for Image Resources".
During the course of this work I
found that unlike other profiles,
the Image Profile requires a much
larger number of metadata attributes
by which to allow searching for
image resources. These metadata
attributes are defined by
specifications such as [EXIF-2.2]
and [IPTC-2008].
I also found that many of these
attributes are numeric and search
needs to support finding matches to
numeric attributes such as
imageWidth and imageLength to be
within a certain min/max range.
These unique differences made it
rather unwieldy to define a
canonical parameterized query to
discover image resources by. There
would be just too many parameters in
the traditional approach.
The existing canonical
Adhoc Query would be the
solution to this problem. However,
RegRep core does not define a
standard query language syntax that
could allow using the Adhoc Query in
an interoperable manner across
registry implementations. One
registry could support SQL query
while another could support XQuery
and the query schema could be quite
different even for the same query
language.
This led me to consider CQL [SearchRetrievePt5]
as an implementation-neutral query
language syntax for querying image
profile data as well as any other
profile's data and core ebrim
RegistryObject metadata. The basic
idea is to define a CQL context set
for each such profile. The context
set defines a set of indexes that
can be used in searching data for
that context. For example, in Image
Profile you can have exif.imageWidth
and exif.imageHeight indexes that
allow searching for images by their
pixel height and width. The index
definition would also specify how
the index relates to RegistryObjects
for that type of data (e.g. images).
Information would include data type
for each index as well as a set of
relations that could be used (e.g.
"=", "<", ">" etc...).
Here are some examples for CQL
queries that could be used to search
for image content. A query could
contain any number of predicates
combined using boolean operators
like AND, OR.
- Find by creation date:
exif.dateTime >=
"2008-07-13T21:05:34"
- Find images where width and
height are both >= 300
pixels: exif.imageWidth >=
300 AND exif.imageHeight >=
300
- Find by f-number: exif.fNumber
>= 2.8
- Find images by GeoLocation:
exif.geoLocation WITHIN
POLYGON((59 22, 78 22,78 38, 59
38, 59 22))
- Find by creator:
iptc.creator.name =
"*farrukh*najmi*"
- Find by genre:
iptc.intellectualGenre =
"wildlife"
The use of CQL queries can be
specified using the existing RegRep
Adhoc Query protocol with a new CQL
query language. All this is proposed
to be defined in an extension profile
specification with the proposed
title "ebXML RegRep Profile for
Contextual Query Language (CQL)". No
protocol changes or changes to
RegRep core would be needed to
support CQL as an extension profile.
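To make the context set idea concrete, here is a minimal sketch of how index definitions (a data type plus a set of allowed relations) could be used to check and assemble the kind of CQL clauses shown in the examples earlier. The index names come from those examples. The data types, the relation sets, and the names `IndexDef`, `IMAGE_CONTEXT`, `clause` and `all_of` are all assumptions made for illustration, not anything defined by the proposed profiles.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

Value = Union[int, float, str]


@dataclass(frozen=True)
class IndexDef:
    """One index of a context set: accepted value types and relations."""
    data_types: Tuple[type, ...]
    relations: FrozenSet[str]


# Hypothetical slice of an image context set (types and relations assumed).
NUMERIC = frozenset({"=", "<", ">", "<=", ">="})
IMAGE_CONTEXT: Dict[str, IndexDef] = {
    "exif.imageWidth": IndexDef((int,), NUMERIC),
    "exif.imageHeight": IndexDef((int,), NUMERIC),
    "exif.fNumber": IndexDef((int, float), NUMERIC),
    "iptc.intellectualGenre": IndexDef((str,), frozenset({"="})),
}


def clause(index: str, relation: str, value: Value) -> str:
    """Build one 'index relation term' clause after checking the context."""
    definition = IMAGE_CONTEXT.get(index)
    if definition is None:
        raise ValueError(f"unknown index: {index}")
    if relation not in definition.relations:
        raise ValueError(f"relation {relation!r} not allowed for {index}")
    if not isinstance(value, definition.data_types):
        raise ValueError(f"wrong value type for {index}")
    if isinstance(value, str):
        # Quote string terms, escaping backslashes and embedded quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{index} {relation} "{escaped}"'
    return f"{index} {relation} {value}"


def all_of(*clauses: str) -> str:
    """Combine clauses with the boolean operator AND."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(all_of(clause("exif.imageWidth", ">=", 300),
                 clause("exif.imageHeight", ">=", 300)))
    print(clause("iptc.intellectualGenre", "=", "wildlife"))
```

The resulting string would then travel as the query expression of an ordinary Adhoc Query request. Mixing AND with OR would need parentheses, which this sketch leaves out.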
The outlines for both proposed
extension specs are available in our
wiki here for your review:
I would like to propose that we meet
at our next TC meeting on August 17,
2012 at 12PM ET to discuss these two
proposals.
Please let me know (off list) if
there is any chance that you will be
unable to attend and please share
your thoughts on this email thread
until our meeting. Thank you.
--
Regards,
Farrukh Najmi
Web: http://www.wellfleetsoftware.com