

Subject: Re: [uima] UIMA Abstract Interface Styles - Constant IDs or Non-constant IDs



Thanks Adam.

Allow me to add to Adam's comment on the object database analogy.

The spirit of adopting object-oriented representations and UML- and XMI-related standards for representing a CAS type system aligns strongly and explicitly with the object-oriented paradigm, in which object identity is a cornerstone concept.

I am at a loss at this point to appreciate the value of losing the object ID information across a service's interface. At one point there was an efficiency argument, but I believe that was debunked for all practical purposes by Adam's experiment (in a previous email).

I also want to emphasize the "debugging" scenario Adam noted below. This has come up in other discussions as well, and I believe it deserves serious consideration. Consider building tools for tracing and debugging the operations on CASes. I think this would be a lot harder without the notion of constant object identity.

Finally, I think delta-out interfaces are also a very important consideration, given the efficiency they can afford. I believe we all agreed on this. Supporting them appears to essentially entail the notion of "constant IDs".

Comments?

I would like to get some feedback or more detailed arguments from the team on the specific points raised so that we feel confident in developing an informed consensus on this issue and move on.


-Dave
------------------------------------------------------------------------
David A. Ferrucci, PhD
Senior Manager, Semantic Analysis & Integration
Chief Architect,  UIMA
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
Tel: 914-784-7847, 8/863-7847
ferrucci@us.ibm.com
------------------------------------------------------------------------
http://www.ibm.com/research/uima  



Adam Lally/Watson/IBM@IBMUS

05/17/2007 09:54 AM

To
<uima@lists.oasis-open.org>, <carl.madson@sri.com>, <j.tsujii@manchester.ac.uk>, <Sophia.ananiadou@manchester.ac.uk>
cc
Subject
[uima] UIMA Abstract Interface Styles - Constant IDs or Non-constant IDs






Hi everyone,


It seems to me that there are three basic styles of interface we could have to a UIMA Analytic Service:


1) CAS-in, delta out

The service doesn't respond with a CAS, but instead a set of instructions for updating the input CAS.  So if an input CAS  contained two Person objects with xmi:ids 1 and 2, such as:

<xmi:XMI>

 <example:Person xmi:id="1" firstName="Joe"/>

 <example:Person xmi:id="2" firstName="Jane"/>

</xmi:XMI>


The service might respond with an instruction (syntax TBD) saying: "Update the object with xmi:id 1 by setting firstName = 'Bob'".


2) CAS-in, CAS-out, with constant IDs

The service responds with an entire CAS.  If objects in both the input and output CASes have the same xmi:id, they are considered to be the same object.  Objects in the output CAS that have new IDs are considered new objects.  So in the above example the service would respond with a CAS containing something like:

<xmi:XMI>

 <example:Person xmi:id="1" firstName="Bob"/>

 <example:Person xmi:id="2" firstName="Jane"/>

</xmi:XMI>


3) CAS-in, CAS-out, without constant IDs

The service responds with an entire CAS, but the xmi:ids in this CAS bear no relation to the ids in the input CAS, so in the simple example the following would be an acceptable response:

<xmi:XMI>

 <example:Person xmi:id="1" firstName="Jane"/>

 <example:Person xmi:id="2" firstName="Bob"/>

</xmi:XMI>
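
As a rough illustration of style 1, here is a minimal Python sketch of a client applying a delta to the input CAS. The delta syntax is still TBD, so the (xmi_id, feature, value) tuple used here is only an assumption, as are the `apply_delta` helper and the `example` namespace URI.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
EXAMPLE_NS = "http://example.org/example"  # hypothetical namespace URI

INPUT_CAS = f"""<xmi:XMI xmlns:xmi="{XMI_NS}" xmlns:example="{EXAMPLE_NS}">
 <example:Person xmi:id="1" firstName="Joe"/>
 <example:Person xmi:id="2" firstName="Jane"/>
</xmi:XMI>"""


def apply_delta(root, delta):
    """Apply (xmi_id, feature, new_value) updates to the CAS in place.

    The tuple form is an assumed stand-in for the real delta syntax.
    """
    # Index the top-level objects by their xmi:id attribute.
    by_id = {el.get(f"{{{XMI_NS}}}id"): el for el in root}
    for xmi_id, feature, value in delta:
        by_id[xmi_id].set(feature, value)
    return root


cas = ET.fromstring(INPUT_CAS)
# "Update the object with xmi:id 1 by setting firstName = 'Bob'"
apply_delta(cas, [("1", "firstName", "Bob")])
names = {el.get(f"{{{XMI_NS}}}id"): el.get("firstName") for el in cas}
```

After the call, `names` maps each xmi:id to its current firstName, which matches the CAS a style 2 service would have returned.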




I believe that 1 and 2 are functionally equivalent.  In case #1 you could apply the deltas to the original input CAS and end up with the same CAS as would have been returned by the service in case #2. Likewise, it is easy to compare the input and output CASes from case #2 and recover the delta.  However, in case #3 there is no unique delta (there could have been one modification, or two, or perhaps a delete and a create occurred).
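
To make the comparison concrete, here is a small Python sketch that recovers a delta by matching objects on xmi:id. It models a CAS as a plain dict from xmi:id to features, which is a simplification chosen for illustration; `diff_by_id` is a hypothetical helper, not part of any UIMA API.

```python
def diff_by_id(before, after):
    """Recover a list of operations, assuming equal xmi:ids mean the same object.

    Simplification: features dropped from an object are not reported.
    """
    ops = []
    for oid in sorted(before.keys() - after.keys()):
        ops.append(("delete", oid))
    for oid in sorted(after.keys() - before.keys()):
        ops.append(("create", oid, after[oid]))
    for oid in sorted(before.keys() & after.keys()):
        for feature, value in after[oid].items():
            if before[oid].get(feature) != value:
                ops.append(("update", oid, feature, value))
    return ops


cas_in = {"1": {"firstName": "Joe"}, "2": {"firstName": "Jane"}}
out_constant_ids = {"1": {"firstName": "Bob"}, "2": {"firstName": "Jane"}}      # case #2
out_nonconstant_ids = {"1": {"firstName": "Jane"}, "2": {"firstName": "Bob"}}  # case #3

delta = diff_by_id(cas_in, out_constant_ids)
guess = diff_by_id(cas_in, out_nonconstant_ids)
```

With constant IDs, `delta` is the single update to object 1. Running the same comparison on the case #3 output yields two updates in `guess`, but that is only one of several histories consistent with the output, since the IDs there carry no meaning.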


I realize that in this very simple example it may not seem to matter whether you can compute a unique delta.  However with more realistic, large CASes, without deltas or constant IDs it becomes much more difficult to recover the information about what operations a particular service performed on the CAS.


There are at least two use cases where it is useful to know what operations a service performed on a CAS.  One is parallel processing, where we'd like to invoke multiple services on identical copies of a CAS and then merge the results.  Another is debugging.  It is very useful to be able to compare the CAS before and after a service call, and discover what that service has changed in the CAS.  This is significantly easier to do with deltas or constant IDs than with non-constant IDs.
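
For the parallel processing case, a merge might look roughly like the Python sketch below. It assumes each service's result has already been reduced to a list of (xmi_id, feature, value) updates against the same input CAS; the `merge_deltas` helper, the conflict rule, and the `lastName` feature are all assumptions made for illustration.

```python
def merge_deltas(cas, *deltas):
    """Merge updates from several services that ran on identical CAS copies.

    Raises ValueError if two services set the same feature of the same
    object to different values (an assumed conflict policy).
    """
    merged = {oid: dict(features) for oid, features in cas.items()}
    written = {}
    for delta in deltas:
        for oid, feature, value in delta:
            key = (oid, feature)
            if key in written and written[key] != value:
                raise ValueError(f"conflicting updates to {key}")
            written[key] = value
            merged[oid][feature] = value
    return merged


cas = {"1": {"firstName": "Joe"}, "2": {"firstName": "Jane"}}
delta_a = [("1", "firstName", "Bob")]  # from service A
delta_b = [("2", "lastName", "Doe")]   # from service B (hypothetical feature)
merged = merge_deltas(cas, delta_a, delta_b)
```

The merge only works because both deltas name objects by IDs that mean the same thing in every copy of the CAS.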


-Adam



P.S.  I investigated the possible analogy of a CAS as an "object database", and I found this in the manual for the Versant object database (http://www.versant.com/developer/resources/objectdatabase/documentation/database_fund_man.pdf):


"One of the strongest concepts in object technology is object identity, because it makes possible such features as persistent references to other objects and the ability to migrate objects among distributed databases without having to change code that accesses the objects.  Versant assigns each persistent object a unique identifier called its logical object identifier or loid.  Logical object identifiers are composed of two parts: a database identifier and an object identifier."


So one thing this brings up is that a "CAS ID" may be important for us to have (analogous to a database identifier).  But my main point is about that first phrase - one of the strongest concepts in object technology is object identity.  I think if UIMA is at all considered "object technology" then we should have a clear notion of object identity.
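
By analogy with the two-part loid, a two-part UIMA object reference could be sketched in Python as follows. The `ObjectRef` class and its field names are hypothetical; nothing like this has been agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical two-part identifier: which CAS, and which object in it."""
    cas_id: str  # analogous to the database identifier
    xmi_id: str  # analogous to the object identifier


a = ObjectRef("cas-A", "1")
b = ObjectRef("cas-B", "1")
same_as_a = ObjectRef("cas-A", "1")
# Frozen dataclasses are hashable, so references can be used as dict keys.
index = {a: "Joe", b: "someone else"}
```

Here `a` and `b` share an xmi:id but refer to different objects because they live in different CASes, while `same_as_a` compares equal to `a`.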


_____________________________
Adam Lally
Advisory Software Engineer
UIMA Framework Lead Developer
IBM T.J. Watson Research Center
Hawthorne, NY, 10532
Tel: 914-784-7706,  T/L: 863-7706
alally@us.ibm.com


