uima message
Subject: Abstract Interfaces Open Issues
- From: Adam Lally <alally@us.ibm.com>
- To: uima@lists.oasis-open.org
- Date: Mon, 26 Mar 2007 16:19:04 -0400
Hi,
In our last telecon we agreed the Abstract
Interfaces open issues should undergo further discussion. Let's see
if we can get some discussion going before the next call. Here's
my summary of what we discussed last time:
1) Analyzer Interface: should it be
able to process multiple CASes in one call?
We discussed two reasons
why we might want to allow this. First, there is a performance argument:
for remote services in particular, it may be inefficient to send
each document as a separate request. Second, there is the argument
that an Analytic might need to see a set of related CASes
in order to decide how to annotate them.
I think we agreed that we
at least need to support sending multiple CASes for performance reasons.
Possibly this can be pushed down to the concrete (SOAP, Java) bindings.
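To make the "push it down to the bindings" option concrete, here is a minimal sketch, not a proposal for the spec. It assumes the abstract interface stays one-CAS-per-call and that a binding unpacks a batched request into individual calls. SimpleCas, Analyzer, and BatchingBinding are hypothetical names invented for illustration; none of them come from the whitepaper or from Apache UIMA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS: document text plus annotations.
final class SimpleCas {
    final String text;
    final List<String> annotations = new ArrayList<>();

    SimpleCas(String text) {
        this.text = text;
    }
}

// Assumed abstract interface: one CAS per process call.
interface Analyzer {
    void process(SimpleCas cas);
}

// Hypothetical binding-level wrapper: one request carries many CASes,
// but the Analytic still sees them one at a time.
final class BatchingBinding {
    private final Analyzer analyzer;
    private int requestCount = 0;

    BatchingBinding(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

    // Models a single remote request containing a batch of CASes.
    List<SimpleCas> processBatch(List<SimpleCas> batch) {
        requestCount++;
        for (SimpleCas cas : batch) {
            analyzer.process(cas);
        }
        return batch;
    }

    // Number of batched requests handled so far.
    int requestCount() {
        return requestCount;
    }
}
```

Under this assumption the Analytic never knows it was batched, so nothing would need to be declared in its Behavioral Metadata for the performance case.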
The idea of an Analytic operating on
a set of related CASes raises more questions. Do we then need a way
to declare this in the Analytic's Behavioral Metadata? This puts
a burden on the caller to figure out what a valid set of CASes is for
this Analytic; otherwise the Analytic will not function properly. Also, this
approach does not scale well: if the number of CASes in this logical set
is large, we may not be able to actually send them all in one call.
We noted that "CAS Consumer"
Analytics, which consider a set of CASes in order to update some aggregate
data structure, do not need to have all of the CASes passed to them in
one call. They can see them one at a time and keep state across process
calls. So a logical set of CASes needs to be passed only when the
results of the analysis are written back to those same CASes. Even
this case could be addressed with a two-pass flow: the FlowController
could send each CAS through the Analytic once, allowing it to compile aggregate
statistics, and then send each CAS through again to allow the Analytic
to add annotations.
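The two-pass flow described above can be sketched roughly as follows. This is only an illustration under assumptions: DocCas, SharedTermAnalytic, and TwoPassFlow are hypothetical names, the "aggregate statistic" is a simple count of how many CASes contain each term, and TwoPassFlow.run stands in for whatever the FlowController would actually do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a CAS: document text plus annotations.
final class DocCas {
    final String text;
    final List<String> annotations = new ArrayList<>();

    DocCas(String text) {
        this.text = text;
    }
}

// Hypothetical Analytic that keeps state across process calls.
final class SharedTermAnalytic {
    // Number of CASes in which each term has been seen.
    private final Map<String, Integer> docFrequency = new HashMap<>();

    // Distinct lower-cased whitespace-separated terms, in order of appearance.
    private static Set<String> terms(DocCas cas) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : cas.text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Pass 1: update aggregate statistics; the CAS is not modified.
    void collect(DocCas cas) {
        for (String term : terms(cas)) {
            docFrequency.merge(term, 1, Integer::sum);
        }
    }

    // Pass 2: annotate terms that occur in more than one CAS of the set.
    void annotate(DocCas cas) {
        for (String term : terms(cas)) {
            if (docFrequency.getOrDefault(term, 0) > 1) {
                cas.annotations.add("shared:" + term);
            }
        }
    }
}

// Stand-in for the FlowController: routes every CAS through the Analytic twice.
final class TwoPassFlow {
    static void run(List<DocCas> casSet, SharedTermAnalytic analytic) {
        for (DocCas cas : casSet) {
            analytic.collect(cas);
        }
        for (DocCas cas : casSet) {
            analytic.annotate(cas);
        }
    }
}
```

Each call handles a single CAS, so the whole set never has to travel in one request. The sketch does still assume that something upstream can replay the set for the second pass.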
Below are the other issues in my summary
that we did not get a chance to discuss on the call. Comments appreciated.
2) [Box on pg. 62] Does the CAS Multiplier
interface need any/all of the following capabilities:
a) Return more than one CAS at a time.
b) Return an indication that no more CASes are available now, but that the caller
should try back later. (The caller may specify the amount of time to wait
before returning.)
c) Return an estimate of how many CASes have not yet been retrieved by the caller.
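For discussion purposes, here is one hedged sketch of what an interface with all three capabilities might look like. Everything in it is an assumption: CasMultiplierSketch, NextResult, and QueueMultiplier are made-up names, not the whitepaper's or Apache UIMA's interfaces, and the in-memory implementation never actually blocks, so the wait parameter is only carried through the signature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical result type covering capabilities (a), (b), and (c).
final class NextResult<C> {
    enum Status { CASES, TRY_LATER, DONE }

    final Status status;          // (b) TRY_LATER means "nothing now, ask again"
    final List<C> cases;          // (a) zero or more CASes per call
    final int estimatedRemaining; // (c) estimate of CASes not yet retrieved

    NextResult(Status status, List<C> cases, int estimatedRemaining) {
        this.status = status;
        this.cases = cases;
        this.estimatedRemaining = estimatedRemaining;
    }
}

// Hypothetical abstract interface; the caller bounds the batch size and the wait.
interface CasMultiplierSketch<C> {
    NextResult<C> next(int maxCases, long maxWaitMillis);
}

// Minimal in-memory implementation used only to exercise the interface.
final class QueueMultiplier<C> implements CasMultiplierSketch<C> {
    private final Deque<C> pending = new ArrayDeque<>();
    private boolean closed = false;

    // Producer side: make another CAS available to the caller.
    void offer(C cas) {
        pending.addLast(cas);
    }

    // Producer side: signal that no further CASes will ever be produced.
    void close() {
        closed = true;
    }

    @Override
    public NextResult<C> next(int maxCases, long maxWaitMillis) {
        // This sketch never blocks, so maxWaitMillis is ignored here.
        List<C> batch = new ArrayList<>();
        while (batch.size() < maxCases && !pending.isEmpty()) {
            batch.add(pending.removeFirst());
        }
        if (!batch.isEmpty()) {
            return new NextResult<>(NextResult.Status.CASES, batch, pending.size());
        }
        NextResult.Status status = closed ? NextResult.Status.DONE : NextResult.Status.TRY_LATER;
        return new NextResult<>(status, batch, 0);
    }
}
```

A caller would loop on next until it sees DONE, backing off whenever it gets TRY_LATER. Whether all three fields belong in the abstract interface or only in specific bindings is exactly the open question.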
3) [Box on pg. 64] Flow Controller Interface:
a) Should it be allowed to modify the CAS? (Currently the whitepaper doesn't allow
it, but the Apache UIMA implementation does.)
b) Should the FlowController interface be kept simple (as in the UML diagram in figure
12), be more like the Apache UIMA interface, or fall somewhere in between?
At the meta-level, to what degree do
these need to be specified in the Abstract Interfaces section, and how
much flexibility do we leave to specific bindings (concrete interfaces)?
This gets to the core question of what exactly it means for an
implementation to comply with the Abstract Interfaces section.
Regards,
-Adam
_____________________________
Adam Lally
Advisory Software Engineer
UIMA Framework Lead Developer
IBM T.J. Watson Research Center
Hawthorne, NY, 10532
Tel: 914-784-7706, T/L: 863-7706
alally@us.ibm.com