Subject: Re: [uima] "Assignment Style" vs. "Functional Style"

From: Adam Lally <alally@us.ibm.com>
To: Thilo W Goetz <TGOETZ@de.ibm.com>
Date: Wed, 9 May 2007 16:43:50 -0400

I ran a quick performance test with Apache UIMA's XMI serialization, which optionally supports maintaining xmi:id consistency across serializations. I don't see any significant performance difference. The serializer already has to create a mapping from CAS internal object IDs to XMI IDs during serialization; otherwise it could not serialize references to objects. The same is true of the deserializer. So the only thing that had to be implemented for this was a way to share this mapping.
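As a rough illustration of what sharing the mapping means, here is a minimal sketch. It is not the Apache UIMA API: `SharedIdMap`, `serialize`, `deserialize`, and the dict-based "heap" are made-up names and structures used only to show the idea.

```python
class SharedIdMap:
    """Hypothetical shared state: internal address <-> xmi:id."""

    def __init__(self):
        self.addr_to_xmi = {}
        self.xmi_to_addr = {}
        self._next = 1  # next unused xmi:id

    def bind(self, addr, xmi_id):
        self.addr_to_xmi[addr] = xmi_id
        self.xmi_to_addr[xmi_id] = addr
        self._next = max(self._next, xmi_id + 1)

    def xmi_id_for(self, addr):
        # Reuse a known xmi:id; otherwise hand out a fresh one.
        if addr not in self.addr_to_xmi:
            self.bind(addr, self._next)
        return self.addr_to_xmi[addr]


def deserialize(doc, shared=None, first_addr=100):
    """doc maps xmi:id -> features; a reference is the tuple ('ref', xmi_id)."""
    ids = shared if shared is not None else SharedIdMap()
    for offset, xmi_id in enumerate(sorted(doc)):
        ids.bind(first_addr + offset, xmi_id)  # allocate an internal address
    heap = {}
    for xmi_id, record in doc.items():
        fs = {}
        for name, value in record.items():
            if isinstance(value, tuple) and value[0] == "ref":
                value = ("ref", ids.xmi_to_addr[value[1]])
            fs[name] = value
        heap[ids.xmi_to_addr[xmi_id]] = fs
    return heap


def serialize(heap, shared=None):
    """Needs an address -> xmi:id map in any case, to write out references."""
    ids = shared if shared is not None else SharedIdMap()
    out = {}
    for addr in sorted(heap):
        record = {}
        for name, value in heap[addr].items():
            if isinstance(value, tuple) and value[0] == "ref":
                value = ("ref", ids.xmi_id_for(value[1]))
            record[name] = value
        out[ids.xmi_id_for(addr)] = record
    return out


incoming = {
    7: {"type": "Sentence", "begin": 0, "end": 12},
    9: {"type": "Token", "begin": 0, "end": 5, "sentence": ("ref", 7)},
}
shared = SharedIdMap()
heap = deserialize(incoming, shared)
# The service adds one new Token that refers to the existing Sentence.
heap[102] = {"type": "Token", "begin": 6, "end": 12, "sentence": ("ref", 100)}
outgoing = serialize(heap, shared)   # input xmi:ids 7 and 9 are preserved
renumbered = serialize(heap)         # no shared map: every id is assigned afresh
```

Both calls to `serialize` build the same kind of map; the only difference in the first one is that the map was seeded by the deserializer.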

On the "non-enforceable" point, a service that doesn't reuse any input IDs would be considered to have deleted everything in the CAS and created a bunch of new stuff. In itself that doesn't make it non-compliant with UIMA, but it would need to declare this behavior in its metadata. A service which declares that it "modifies instances of type Foo" would not be allowed to change the xmi:ids on those instances, or it would be considered non-compliant with its own behavioral metadata.
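To make that check concrete, here is a small sketch. The shape of the behavioral metadata (a dict from type name to a set of "creates" / "modifies" / "deletes") and the function name `undeclared_changes` are assumptions for illustration only; nothing like this is defined in the spec yet.

```python
def undeclared_changes(input_fs, output_fs, declared):
    """Compare the xmi:ids going in and coming out against declared behavior.

    input_fs / output_fs: dict mapping xmi:id -> type name.
    declared: hypothetical metadata, dict mapping type name -> set of
              allowed operations ("creates", "modifies", "deletes").
    Returns a list of (operation, type name, xmi:id) that were not declared.
    """
    problems = []
    # An input id missing from the output looks like a deletion.
    for xmi_id, type_name in sorted(input_fs.items()):
        if xmi_id not in output_fs and "deletes" not in declared.get(type_name, set()):
            problems.append(("deleted", type_name, xmi_id))
    # An output id that was not in the input looks like a creation.
    for xmi_id, type_name in sorted(output_fs.items()):
        if xmi_id not in input_fs and "creates" not in declared.get(type_name, set()):
            problems.append(("created", type_name, xmi_id))
    return problems


cas_in = {1: "Foo", 2: "Foo"}
ids_kept = {1: "Foo", 2: "Foo"}          # service reused the input ids
ids_changed = {11: "Foo", 12: "Foo"}     # service renumbered everything

only_modifies = {"Foo": {"modifies"}}
deletes_and_creates = {"Foo": {"deletes", "creates"}}

ok = undeclared_changes(cas_in, ids_kept, only_modifies)
bad = undeclared_changes(cas_in, ids_changed, only_modifies)
also_ok = undeclared_changes(cas_in, ids_changed, deletes_and_creates)
```

The renumbering service only fails the check when it claims to modify Foo instances; with a "deletes and creates" declaration the same output passes.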

Perhaps, however, this allows a service that cannot guarantee procedural behavior to still be UIMA-compliant: it just has to declare its behavior appropriately. (Another possibility is that such a service is a CAS Multiplier. CAS Multipliers are expected to create completely new CASes and so might be a natural fit for this kind of service.)

Also I think if we say that there is no notion that an annotation is an object, then the TC needs to go back and revisit the earlier sections of the whitepaper which explicitly say that the CAS is an object graph, and revisit our decisions to use OMG standards which are fundamentally object-based.

Regards,
-Adam
_____________________________
Adam Lally
Advisory Software Engineer
UIMA Framework Lead Developer
IBM T.J. Watson Research Center
Hawthorne, NY, 10532
Tel: 914-784-7706, T/L: 863-7706
alally@us.ibm.com

From: Thilo W Goetz <TGOETZ@de.ibm.com>
Date: 05/09/2007 01:10 PM
To: Adam Lally/Watson/IBM@IBMUS
Cc: carl.madson@sri.com, j.tsujii@manchester.ac.uk, Sophia.ananiadou@manchester.ac.uk, "uima@lists.oasis-open.org" <uima@lists.oasis-open.org>
Subject: Re: [uima] "Assignment Style" vs. "Functional Style"

To my mind, the data in the CAS is just that: data. That data is much more like the data in a database than objects in a programming language. So I wouldn't even call this approach "functional style", it's more like "data in, data out", or passing a reference to a database. If I have a sentence annotation, for example, over a stretch of text, I have no notion of that annotation as an object; rather, there is a piece of information that tells me that there is a sentence from x to y in my input text.

I would not like to change this model for the following reasons:

Functionality: there may very well be service implementations that cannot guarantee procedural behavior, for whatever reason. I think we shouldn't disallow such services a priori in the spec.

Performance: I would not like to see applications that don't need this behavior having to pay the price for it. We would need to create a mapping from CAS internal object IDs to XMI IDs on serialization and keep it around, then create a similar mapping on the service side on deserialization and keep that around as well. When the service serializes the CAS again it needs to respect the mapping, and the application that receives the results from the service will in turn need to respect its mapping when deserializing.

Not enforceable: all a service needs to do to be compliant is to not reuse any input IDs (unless I missed something). I'm assuming that we're only talking about FSs that are indexed, since we're freely dropping non-indexed FSs on serialization, anyway.

Implementation: the weakest argument of all, but I would like to keep the ability to, for example, compact the CAS heap between calls to processors. If I need to keep references into the heap intact (without knowing who's holding them), I can't do anything like that. No garbage collection, no paring down of the CAS because only a small subset of the data is needed for further processing etc. We haven't really implemented any of that anyway, but if we wanted to implement features like that, requiring referential integrity from the outside, so to speak, would make this very hard or impossible.
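A toy sketch of the compaction concern, under heavy assumptions: the "heap" here is just a Python list whose indexes stand in for CAS addresses, and `compact` is a made-up helper, not anything in the current implementation.

```python
def compact(heap, live):
    """Drop dead entries and slide the live ones down.

    heap: list of feature-structure records (list index = internal address).
    live: set of addresses still needed for further processing.
    Returns (new_heap, relocation), where relocation maps old -> new address.
    """
    relocation = {}
    new_heap = []
    for addr, fs in enumerate(heap):
        if addr in live:
            relocation[addr] = len(new_heap)
            new_heap.append(fs)
    return new_heap, relocation


heap = ["Sentence[0,12]", "scratch data", "Token[0,5]"]
held_outside = 2  # an address someone outside the CAS is holding on to

new_heap, relocation = compact(heap, live={0, 2})

# After compaction the old address no longer points at the Token;
# only a holder that sees the relocation table can follow the move.
stale = held_outside >= len(new_heap)
moved_to = relocation[held_outside]
```

Inside the CAS the relocation table is available and references can be patched; the difficulty described above is with holders the CAS doesn't know about.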

Best regards

Thilo Goetz
OmniFind & UIMA development
Information Management Division
IBM Germany
+49-7031-16-1758

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Commercial register: Amtsgericht Stuttgart, HRB 243294

From: Adam Lally <alally@us.ibm.com>
Date: 04/30/07 15:57
To: carl.madson@sri.com, j.tsujii@manchester.ac.uk, Sophia.ananiadou@manchester.ac.uk, "uima@lists.oasis-open.org" <uima@lists.oasis-open.org>
Subject: [uima] "Assignment Style" vs. "Functional Style"

Hi everyone,

I tried to capture the issues we discussed on the last telecon regarding whether UIMA's component/service interfaces should use an assignment style (take a CAS and update it) or a functional style (take a CAS and return a new CAS). See the attached document.
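For anyone who missed the telecon, the difference between the two styles can be sketched roughly as follows. This is only an illustration: the dict-based "CAS" and both function names are made up and say nothing about how the interfaces would actually be specified.

```python
import copy


def add_sentence_assignment_style(cas, begin, end):
    """Assignment style: take a CAS and update it in place; nothing is returned."""
    cas["annotations"].append({"type": "Sentence", "begin": begin, "end": end})


def add_sentence_functional_style(cas, begin, end):
    """Functional style: leave the input alone and return a new CAS."""
    result = copy.deepcopy(cas)
    result["annotations"].append({"type": "Sentence", "begin": begin, "end": end})
    return result


original = {"text": "Hello world.", "annotations": []}

returned = add_sentence_functional_style(original, 0, 12)
untouched = len(original["annotations"]) == 0  # caller's CAS is unchanged

add_sentence_assignment_style(original, 0, 12)  # caller's CAS now has the Sentence
```

In the functional case the caller ends up with two CASes and has to work out how the contents of the returned one relate to the contents of the one it sent.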

Thilo and/or Thomas probably have more that they would like to add to this discussion. Anyone else is of course free to jump in as well.

Regards,
-Adam
[attachment "AssignmentVsFunctionalStyle.doc" deleted by Thilo W Goetz/Germany/IBM]
