uima message
Subject: UIMA TC Status Update DRAFT
- From: David Ferrucci <ferrucci@us.ibm.com>
- To: uima@lists.oasis-open.org
- Date: Thu, 2 Aug 2007 21:35:13 -0400
Hi All,
Here's a draft status update on the
activities of the UIMA Technical Committee.
Please review for our call tomorrow.
Usual CALL-IN Information:
US Toll free: 1-866-398-2885
US Toll: 1-719-457-6209
Passcode: 125054
Local Numbers Outside US:
Germany, Frankfurt: +49 (0) 69 36507 2086
UK, London: +44 (0) 20 7663 2216
Japan, Tokyo: +81 (0) 3 4455 1256
France, Paris: +33 (0) 1 73 04 14 06
Singapore: +65 6419 3726
Status Update
============
Review
------------
We set out to consider a proposal, drafted
by IBM, for a platform-independent specification for text and multi-modal
analytics based on UIMA.
Our charter was to review, elaborate
and refine these elements of the specification:
1. CAS Specification: An object-oriented
representation for a stand-off model of annotations over multi-modal (text,
audio, video) data that admits multiple representations of the same artifact
(e.g., different translations of a document). We view a CAS as an
object graph, where objects may "annotate" regions of an artifact
representation (e.g., text document). We chose XMI as a standard XML-based
representation for a CAS since XMI is a standard mechanism for representing
object graphs in XML. The CAS is used to exchange annotation data between
applications of analytics.
2. Type-System Language: A representation
language for defining annotations or other meta-data types whose instances
would be represented in a CAS. Essentially a language for defining CAS
schemas. We aligned this language with Ecore, which is a variant of the OMG
EMOF standard and is used to define object models.
3. Type-System Base Model: A set of
basic, domain-independent types that would be considered standard
across all domain- or application-specific type-systems.
4. Abstract Interfaces: Platform-Independent
description of a base set of analytic interfaces for building unstructured
information analysis applications. These interfaces would ingest and emit
CASes to perform basic functions like the control, annotation, segmentation
or merging of artifact and artifact metadata.
5. Behavioral Meta-Data Specification:
A feature-based representation for specifying an analytic's preconditions
and post-conditions over a CAS type-system. These may be used by applications
or frameworks to validate the input and output of analytics or to discover
analytics based on declarative statements describing their behavior.
6. WSDL Service Descriptions: Web
Service Descriptions of the Abstract Interfaces defined above. This is
used for implementing the abstract interfaces as web-based services
(for example using SOAP). We are currently reviewing this section.
7. Processing Element Meta-data Specification:
An XML-based representation for specifying meta-data about an analytic (e.g.,
identification and configuration parameters). To Review.
8. Aggregate Analytic Descriptor
Specification: A declarative XML-based representation for specifying
the composition of analytics to produce aggregate analytics. To Review.
9. SOAP Bindings: Bind the WSDL
service descriptions for the Abstract Interfaces to the SOAP protocol.
To Review.
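To make elements 1 and 5 above more concrete, here is a minimal Python
sketch of a stand-off annotation model over two representations of the same
artifact, plus a toy precondition check. All class, function, and type names
(Cas, Annotation, satisfies, "Person", "Organization") are hypothetical
illustrations and are not taken from the proposed specification or from any
UIMA API. The precondition check is simplified to "required annotation types
are present" and is not the feature-based representation itself.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: it points at a region instead of embedding markup."""
    type_name: str   # hypothetical type name, e.g. "Person"
    view: str        # which representation of the artifact is annotated
    begin: int       # start offset into that representation
    end: int         # end offset (exclusive)


@dataclass
class Cas:
    """Toy CAS: named artifact representations plus a flat list of annotations."""
    views: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def annotate(self, type_name, view, begin, end):
        ann = Annotation(type_name, view, begin, end)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        # The artifact itself is never modified; the text is recovered by offset.
        return self.views[ann.view][ann.begin:ann.end]

    def types_present(self):
        return {a.type_name for a in self.annotations}


def satisfies(cas, required_types):
    """Toy precondition check: are all required annotation types in the CAS?"""
    return set(required_types) <= cas.types_present()


# Two representations (an original and a translation) of the same artifact.
cas = Cas(views={
    "en": "Dave works at IBM.",
    "de": "Dave arbeitet bei IBM.",
})
person = cas.annotate("Person", "en", 0, 4)
org = cas.annotate("Organization", "de", 18, 21)

print(cas.covered_text(person))
print(cas.covered_text(org))
print(satisfies(cas, ["Person", "Organization"]))
print(satisfies(cas, ["Person", "Location"]))
```

In this sketch an analytic that requires "Person" and "Organization"
annotations could be run on the CAS, while one that requires "Location"
could be rejected before it is invoked.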
Status
---------
We have identified and resolved open
issues and proposed modifications and refinements in the form of "modification
documents" to specification elements 1-5 above.
We will publish a "Preliminary
Specification Summary" within the next 3 months, by October 2007.
This draft will focus on the core specification
elements mentioned above and integrate the results from all TC meetings
and modification documents up to the end of August 2007. It will not include
use-cases, Apache UIMA notes, or interoperability case studies. It will
focus solely on describing the specification as it has been defined by
the end of August. It will note remaining open issues.
We plan to complete modification documents
for all remaining sections by Dec 2007.
We plan to complete a full draft of
the specification by end of February 2008 and publish a final specification
by 2Q 2008.
-Dave
------------------------------------------------------------------------
David A. Ferrucci, PhD
Senior Manager, Semantic Analysis & Integration
Chief Architect, UIMA
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
Tel: 914-784-7847, 8/863-7847
ferrucci@us.ibm.com
------------------------------------------------------------------------
http://www.ibm.com/research/uima