Dear Colleagues,
As you know, Reagan has joined the RegRep TC and engaged with us in
depth regarding the issues posted in his public comment. However, we
are obliged to give a formal response publicly on the regrep-comments
list. A proposed response is below. Please send your +1 or suggest
improvements (you too please, Reagan). Thank you.
<proposedResponse>
Dear Dr. Moore,
Thank you for bringing the requirements of Data Grids to the
attention of the RegRep TC. Thanks also for suggesting the
possibility of having ebXML RegRep serve as a client interface
to the iRODS Data Grid implementation.
We believe that the ebXML RegRep standard is very flexible and
extensible and can provide an open standards-based interface
and information model to clients of iRODS-based Data Grids. We
find the suggestion compelling due to the powerful combination
of the highly robust and scalable iRODS software at the back end
coupled with the simple, extensible and open ebXML RegRep
interface on the front end.
We also agree with your assessment that there are likely to be
some challenges in defining a mapping between ebXML RegRep and
iRODS due to some semantic differences and functionality gaps in
the ebXML RegRep specification.
More in-depth technical analysis and discussion need to take
place for us to identify the specific functionality gaps in
the ebXML RegRep specification that stand in the way of its
serving as a client interface to iRODS. We invite you to
formally join the ebXML RegRep TC as a member and help us
identify such issues in our specifications so that they can be
individually tracked in our issue tracker and addressed in a
future version of the specification.
Once again, thank you for your thoughtful comment and
suggestion. We look forward to engaging with you in depth to
identify and address specific issues so that the ebXML RegRep
specification may serve as the public client interface for
iRODS-based Data Grids.
</proposedResponse>
On 05/18/2012 11:01 AM, Reagan Moore wrote:
Data grids implement the ability to submit, query and
retrieve the contents of a registry and repository. An
example is the integrated Rule Oriented Data System, iRODS,
available as open source software at
The iRODS software has been under development since 2006 in
projects funded by the National Science Foundation and the
National Archives and Records Administration. It incorporates
registry and repository management functions that were first
implemented in the Storage Resource Broker that was developed
between 1996 and 2005.
Re: [regrep-comment] Data Grids
The iRODS software is used to support data sharing
environments, digital libraries, archives, and repositories.
Examples include the French National Library, Australian Research
Collaboration Service (national data grid), CyberSKA radio
astronomy data, National Optical Astronomy Observatory data
grid, genomics data grids (Wellcome Trust Sanger Institute,
Broad Institute), satellite data (NASA Center for Climate
Simulations), Ocean Observatories Initiative sensor data,
EUDAT data replication, etc.
Some of the challenges that are faced when managing
petabytes of internationally distributed data containing
hundreds of millions of files include:
- managing interactions with heterogeneous storage systems
(Windows, Mac, Unix file systems, tape archives, web sites,
databases)
- enforcing assertions about collection properties (policy
enforcement through a distributed rule engine)
- automating administrative functions (migration,
replication, integrity checking, metadata loading)
- providing efficient data transport mechanisms
- supporting the wide variety of clients requested by user
communities (web browsers, web services, load libraries, I/O
libraries, file system interfaces, workflows, Dropbox-style
synchronization, digital libraries, portals, WebDAV, grid
tools, Unix tools, etc.)
The capabilities supported by iRODS include:
- submission of files into a repository
- management of descriptive metadata, system metadata, and
provenance metadata for files, users, and storage systems
- queries on metadata, browsing on files
- registration of files from remote systems, web sites,
archives
- data management functions such as replication,
aggregation, distribution, caching
- policy enforcement for domain-specific requirements
(access controls, derived data product generation, automated
metadata extraction, data processing, etc.)
Given a well-defined API, it is possible to port the ebXML
access mechanisms on top of the iRODS data grid. The major
concern is that the ebXML protocol covers only a constrained
subset of the operations required by the projects listed above.
Reagan Moore
DICE Center
UNC-CH