Dear Dr. Moore,
Thank you for bringing the requirements of Data Grids to the
attention of the RegRep TC. Thanks also for suggesting the
possibility of having ebXML RegRep serve as a client interface to
the iRODS Data Grid implementation.
We believe that the ebXML RegRep standard is very flexible and
extensible and can provide an open standards-based interface and
information model to clients of iRODS-based Data Grids. We find
the suggestion compelling due to the powerful combination of the
highly robust and scalable iRODS software at the back end coupled
with the simple, extensible and open ebXML RegRep interface on the
front end.
We also agree with your assessment that there are likely to be
some challenges in defining a mapping between ebXML RegRep and
iRODS due to some semantic differences and functionality gaps in
the ebXML RegRep specification.
More in-depth technical analysis and discussion need to take
place in order for us to identify specific functionality gaps in
the ebXML RegRep specification that stand in the way of meeting
the requirements for serving as a client interface to iRODS. We
invite you to come and formally join the ebXML RegRep TC as a
member and help us identify such potential issues in our
specifications so that they can be individually tracked in our
issue tracker and addressed in a future version of the specification.
Once again, thank you for your thoughtful comment and suggestion.
We look forward to engaging with you in depth to identify and
address specific issues so that the ebXML RegRep specification may
serve as the public client interface for iRODS-based Data Grids.
On 05/18/2012 11:01 AM, Reagan Moore wrote:
Data grids implement the ability to submit, query and
retrieve the contents of a registry and repository. An example
is the integrated Rule Oriented Data System, iRODS, available as
open source software at
The iRODS software has been under development since 2006 in
projects funded by the National Science Foundation and the
National Archives and Records Administration. It incorporates
registry and repository management functions that were first
implemented in the Storage Resource Broker that was developed
between 1996 and 2005.
The iRODS software is used to support data sharing
environments, digital libraries, archives, and repositories.
Examples include the French National Library, Australian Research
Collaboration Service (national data grid), CyberSKA radio
astronomy data, National Optical Astronomy Observatory data
grid, genomics data grids (Wellcome Trust Sanger Institute,
Broad Institute), satellite data (NASA Center for Climate
Simulations), Ocean Observatories Initiative sensor data, EUDAT
data replication, etc.
Some of the challenges that are faced when managing petabytes
of internationally distributed data containing hundreds of
millions of files include:
- managing interactions with heterogeneous storage systems
(Windows, Mac, Unix file systems, tape archives, web sites,
databases)
- enforcing assertions about collection properties (policy
enforcement through a distributed rule engine)
- automating administrative functions (migration,
replication, integrity checking, metadata loading)
- providing efficient data transport mechanisms
- supporting the wide variety of clients requested by user
communities (web browsers, web services, load libraries, I/O
libraries, file system interfaces, workflows, Dropbox-style
synchronization, digital libraries, portals, WebDAV, grid tools,
Unix tools, etc.)
The capabilities supported by iRODS include:
- submission of files into a repository
- management of descriptive metadata, system metadata, and
provenance metadata for files, users, and storage systems
- queries on metadata, browsing on files
- registration of files from remote systems, web sites,
archives
- data management functions such as replication, aggregation,
distribution, caching
- policy enforcement for domain-specific requirements (access
controls, derived data product generation, automated metadata
extraction, data processing, etc.)
Given a well-defined API, it is possible to port the ebXML
access mechanisms on top of the iRODS data grid. The major
concern is that the ebXML protocol supports only a constrained subset
of the operations required by the projects listed above.
Reagan Moore
DICE Center
UNC-CH