
uima message


Subject: UIMA TC Status Update DRAFT

Hi All,

Here's a draft status update on the activities of the UIMA Technical Committee.

Please review for our call tomorrow.

Usual CALL-IN Information:

US toll-free: 1-866-398-2885
US toll: 1-719-457-6209
Passcode: 125054
Local numbers outside the US:
  Germany, Frankfurt: +49 (0) 69 36507 2086
  UK, London: +44 (0) 20 7663 2216
  Japan, Tokyo: +81 (0) 3 4455 1256
  France, Paris: +33 (0) 1 73 04 14 06
  Singapore: +65 6419 3726

Status Update
We set out to consider a proposal, drafted by IBM, for a platform-independent specification for text and multi-modal analytics based on UIMA.
Our charter was to review, elaborate, and refine the following elements of the specification:

1. CAS (Common Analysis Structure) Specification: An object-oriented representation for a stand-off model of annotations over multi-modal (text, audio, video) data that admits multiple representations of the same artifact (e.g., different translations of a document). We view a CAS as an object graph, where objects may "annotate" regions of an artifact representation (e.g., a text document). We chose XMI as the standard XML-based representation for a CAS, since XMI is a standard mechanism for representing object graphs in XML. The CAS is used to exchange annotation data between analytics applications.

2. Type-System Language: A representation language for defining annotation or other metadata types whose instances would be represented in a CAS; essentially, a language for defining CAS schemas. We aligned this language with Ecore, a variant of the OMG EMOF standard that is used to define object models.

3. Type-System Base Model: A basic set of domain-independent types that would be considered standard across all domain- or application-specific type systems.

4. Abstract Interfaces: A platform-independent description of a base set of analytic interfaces for building unstructured information analysis applications. These interfaces would ingest and emit CASes to perform basic functions such as the control, annotation, segmentation, or merging of artifacts and artifact metadata.

5. Behavioral Meta-Data Specification: A feature-based representation for specifying an analytic's preconditions and postconditions over a CAS type system. These may be used by applications or frameworks to validate the input and output of analytics, or to discover analytics based on declarative statements describing their behavior.

6. WSDL Service Descriptions: Web Service descriptions of the Abstract Interfaces defined above. These are used for implementing the abstract interfaces as web-based services (for example, using SOAP). We are currently reviewing this section.

7. Processing Element Meta-Data Specification: An XML-based representation for specifying metadata about an analytic (e.g., identification and configuration parameters). To review.

8. Aggregate Analytic Descriptor Specification: A declarative XML-based representation for specifying the composition of analytics to produce aggregate analytics. To review.

9. SOAP Bindings: Bind the WSDL service descriptions for the Abstract Interfaces to the SOAP protocol. To review.
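
To make the stand-off annotation model (element 1) and the behavioral precondition/postcondition checks (element 5) more concrete, here is a minimal Python sketch. It is illustrative only: every class and function name is hypothetical, it is not the proposed specification's interfaces or the Apache UIMA API, and it uses plain in-memory objects rather than the XMI serialization. The key idea is that annotations point into the artifact by offsets instead of embedding markup in it.

```python
from dataclasses import dataclass, field

# Hypothetical names for illustration only; not the UIMA specification or Apache UIMA API.

@dataclass
class Annotation:
    type_name: str  # a type assumed to be defined in the type system, e.g. "Token"
    begin: int      # character offset where the annotated region starts
    end: int        # character offset where the region ends (exclusive)

@dataclass
class Cas:
    sofa: str  # the artifact being analyzed (here, plain text)
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        # Stand-off: the annotation refers to a region of the artifact by offsets.
        return self.sofa[ann.begin:ann.end]

    def types_present(self):
        return {a.type_name for a in self.annotations}

def check_conditions(cas, required, produced, analytic):
    """Check declared input types before running an analytic and output types after."""
    missing = set(required) - cas.types_present()
    if missing:
        raise ValueError(f"precondition failed, missing input types: {sorted(missing)}")
    analytic(cas)
    absent = set(produced) - cas.types_present()
    if absent:
        raise ValueError(f"postcondition failed, missing output types: {sorted(absent)}")
    return cas

def tokenizer(cas):
    # Toy analytic: adds a Token annotation for each whitespace-separated word.
    pos = 0
    for word in cas.sofa.split():
        start = cas.sofa.index(word, pos)
        cas.annotations.append(Annotation("Token", start, start + len(word)))
        pos = start + len(word)

cas = Cas("UIMA analyzes text")
check_conditions(cas, required=[], produced=["Token"], analytic=tokenizer)
print([cas.covered_text(a) for a in cas.annotations])  # ['UIMA', 'analyzes', 'text']
```

In this sketch the declared type lists play the role of behavioral metadata: a framework could use them to reject a CAS that lacks required input types, or to find analytics that produce a needed type.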

For specification elements 1-5 above, we have identified and resolved open issues and proposed modifications and refinements in the form of "modification documents".

We will publish a "Preliminary Specification Summary" within the next 3 months, by October 2007.
This draft will focus on the core specification elements listed above and will integrate the results of all TC meetings and modification documents up to the end of August 2007. It will not include use cases, Apache UIMA notes, or interoperability case studies; it will describe only the specification as defined by the end of August, and it will note the remaining open issues.

We plan to complete modification documents for all remaining sections by December 2007.

We plan to complete a full draft of the specification by the end of February 2008 and to publish a final specification by 2Q 2008.


David A. Ferrucci, PhD
Senior Manager, Semantic Analysis & Integration
Chief Architect, UIMA
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
Tel: 914-784-7847, 8/863-7847
