Subject: Re: [uima] "Assignment Style" vs. "Functional Style"

From: Thilo W Goetz <TGOETZ@de.ibm.com>
To: Adam Lally <alally@us.ibm.com>
Date: Wed, 9 May 2007 19:10:48 +0200

To my mind, the data in the CAS is just that: data. That data is much more like the data in a database than like objects in a programming language. So I wouldn't even call this approach "functional style"; it's more like "data in, data out", or passing a reference to a database. If I have a sentence annotation over a stretch of text, for example, I have no notion of that annotation as an object; rather, there is a piece of information that tells me that there is a sentence from x to y in my input text.

I would not like to change this model for the following reasons:

Functionality: there may very well be service implementations that cannot guarantee procedural behavior, for whatever reason. I don't think we should rule out such services a priori in the spec.

Performance: I would not like to see applications that don't need this behavior having to pay the price for it. We would need to create a mapping from CAS-internal object IDs to XMI IDs on serialization and keep it around. The service would then have to create a similar mapping on deserialization and keep that around as well. When the service serializes the CAS again, it needs to respect its mapping, and the application that receives the results from the service will in turn need to respect its own mapping when deserializing.
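
The bookkeeping described in the performance argument can be sketched roughly as follows. This is only an illustration under stated assumptions: the class IdMappingSketch and its methods are hypothetical and are not part of the UIMA API, and internal object IDs are modeled as plain integers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from UIMA.
// One instance would have to live on each side (application and service)
// for as long as IDs must stay stable across calls.
final class IdMappingSketch {
    private final Map<Integer, Integer> internalToXmi = new HashMap<>();
    private final Map<Integer, Integer> xmiToInternal = new HashMap<>();
    private int nextXmiId = 1;

    // Serialization: reuse the XMI ID already assigned to this internal
    // address, or allocate a fresh one for a newly created object.
    int xmiIdFor(int internalAddr) {
        Integer existing = internalToXmi.get(internalAddr);
        if (existing != null) {
            return existing;
        }
        int id = nextXmiId++;
        internalToXmi.put(internalAddr, id);
        xmiToInternal.put(id, internalAddr);
        return id;
    }

    // Deserialization: remember which local address an incoming XMI ID
    // was placed at, and never hand out that ID for a new object.
    void recordIncoming(int xmiId, int internalAddr) {
        internalToXmi.put(internalAddr, xmiId);
        xmiToInternal.put(xmiId, internalAddr);
        nextXmiId = Math.max(nextXmiId, xmiId + 1);
    }

    // Resolve an XMI ID back to the local address, or null if unknown.
    Integer internalFor(int xmiId) {
        return xmiToInternal.get(xmiId);
    }
}
```

In this sketch both maps have to be retained between calls on both sides, which is the cost the paragraph above refers to.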

Not enforceable: all a service needs to do to be compliant is to not reuse any input IDs (unless I missed something). I'm assuming that we're only talking about FSs that are indexed, since we're freely dropping non-indexed FSs on serialization, anyway.

Implementation: the weakest argument of all, but I would like to keep the ability to, for example, compact the CAS heap between calls to processors. If I need to keep references into the heap intact (without knowing who's holding them), I can't do anything like that. No garbage collection, no paring down of the CAS because only a small subset of the data is needed for further processing etc. We haven't really implemented any of that anyway, but if we wanted to implement features like that, requiring referential integrity from the outside, so to speak, would make this very hard or impossible.
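
To make the compaction concern concrete, here is a minimal sketch, assuming a toy heap in which an object's address is simply its index in a list. CompactingHeapSketch is a hypothetical name and does not reflect how the CAS heap is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy heap; an "address" is an index into the cell list.
final class CompactingHeapSketch {
    private List<String> cells = new ArrayList<>();

    // Store a value and return its address.
    int add(String data) {
        cells.add(data);
        return cells.size() - 1;
    }

    String get(int addr) {
        return cells.get(addr);
    }

    int size() {
        return cells.size();
    }

    // Keep only the cells whose addresses are in 'live', sliding them
    // together. Returns the old-to-new relocation table. Anyone holding
    // an old address without access to this table now has a stale reference.
    Map<Integer, Integer> compact(Set<Integer> live) {
        List<String> kept = new ArrayList<>();
        Map<Integer, Integer> moved = new HashMap<>();
        for (int old = 0; old < cells.size(); old++) {
            if (live.contains(old)) {
                moved.put(old, kept.size());
                kept.add(cells.get(old));
            }
        }
        cells = kept;
        return moved;
    }
}
```

If outside parties may hold addresses, the heap owner would have to either publish the relocation table to all of them or give up compaction, which is the trade-off the paragraph above describes.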

Best regards

Thilo Goetz
OmniFind & UIMA development
Information Management Division
IBM Germany
+49-7031-16-1758

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294

Adam Lally <alally@us.ibm.com>

04/30/07 15:57

To	carl.madson@sri.com, j.tsujii@manchester.ac.uk, Sophia.ananiadou@manchester.ac.uk, "uima@lists.oasis-open.org" <uima@lists.oasis-open.org>
cc
Subject	[uima] "Assignment Style" vs. "Functional Style"

Hi everyone,

I tried to capture the issues we discussed on the last telecon regarding whether UIMA's component/service interfaces should use an assignment style (take a CAS and update it) or a functional style (take a CAS and return a new CAS). See the attached document.
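
As a rough illustration of the two styles named above, the following sketch contrasts the interface shapes. All names here (SketchCas, AssignmentStyleAnnotator, FunctionalStyleAnnotator, and the two implementations) are hypothetical and are not taken from the UIMA API or from the attached document; the CAS is reduced to a text plus a list of sentence spans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS: the text plus sentence spans,
// each stored as a {begin, end} pair.
final class SketchCas {
    final String text;
    final List<int[]> sentences = new ArrayList<>();

    SketchCas(String text) {
        this.text = text;
    }
}

// Assignment style: take a CAS and update it in place.
interface AssignmentStyleAnnotator {
    void process(SketchCas cas);
}

// Functional style: take a CAS and return a new CAS.
interface FunctionalStyleAnnotator {
    SketchCas process(SketchCas input);
}

// Marks the whole text as one sentence by modifying the input.
final class WholeTextSentenceInPlace implements AssignmentStyleAnnotator {
    @Override
    public void process(SketchCas cas) {
        cas.sentences.add(new int[] {0, cas.text.length()});
    }
}

// Marks the whole text as one sentence in a copy; the input is untouched.
final class WholeTextSentenceCopying implements FunctionalStyleAnnotator {
    @Override
    public SketchCas process(SketchCas input) {
        SketchCas output = new SketchCas(input.text);
        for (int[] span : input.sentences) {
            output.sentences.add(span.clone());
        }
        output.sentences.add(new int[] {0, input.text.length()});
        return output;
    }
}
```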

Thilo and/or Thomas probably have more that they would like to add to this discussion. Anyone else is of course free to jump in as well.

Regards,
-Adam
_____________________________
Adam Lally
Advisory Software Engineer
UIMA Framework Lead Developer
IBM T.J. Watson Research Center
Hawthorne, NY, 10532
Tel: 914-784-7706, T/L: 863-7706
alally@us.ibm.com

[attachment "AssignmentVsFunctionalStyle.doc" deleted by Thilo W Goetz/Germany/IBM]
