Subject: Re: [uima] [Type System Base Model subgroup]

From: "KANO, Yoshinobu" <kano@is.s.u-tokyo.ac.jp>
To: Thilo W Goetz <TGOETZ@de.ibm.com>,Karin Verspoor <verspoor@lanl.gov>,"uima@lists.oasis-open.org" <uima@lists.oasis-open.org>
Date: Sat, 10 Mar 2007 02:02:28 +0900

Karin Verspoor wrote:
 > Actually, I should modify my proposal not to explicitly encode "parent"
 > and "child" but rather to simply represent inlinks and outlinks; that
 > way we can leave the semantics of the DAG up to the specific use case.
 >
 > I do see your point about unidirectional representation.  However, this
 > would require traversal of these structures to always proceed bottom-up
 > in the case of trees, which might not be natural.

If an "AnnotationGraphNode" can refer to itself (or to any EObject) as
its "nodeData", then the number of objects will not grow; only the
number of links will affect space efficiency.
Is it possible to modify the "nodeData" link in this way?
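
To make the point about object counts concrete, here is a minimal
sketch in Python. It is not UIMA or ECore code: the classes
"Annotation" and "AnnotationGraphNode" below are hypothetical stand-ins,
and the names node_data and count_objects are assumptions made only for
this illustration.

```python
class Annotation:
    """Hypothetical payload type with a text span."""
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end


class AnnotationGraphNode:
    """Hypothetical stand-in for the proposed type, not a real UIMA class."""
    def __init__(self, node_data=None):
        # With no separate payload, the node refers to itself as its data.
        self.node_data = self if node_data is None else node_data
        self.inlinks = []
        self.outlinks = []


def count_objects(nodes):
    """Count distinct objects: the nodes plus any separate payload objects."""
    distinct = set()
    for node in nodes:
        distinct.add(id(node))
        distinct.add(id(node.node_data))
    return len(distinct)


# Wrapper style: each node points at a separate Annotation object.
wrapped = [AnnotationGraphNode(Annotation(i, i + 1)) for i in range(3)]

# Self-referencing style: each node is its own nodeData.
self_ref = [AnnotationGraphNode() for _ in range(3)]
```

With three nodes, the wrapper style holds six objects and the
self-referencing style holds three, so only the links add to the cost.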

The other issues in my previous mail can be addressed by extending
"AnnotationGraphNode" in user-side types. For example, we can define a
class which extends both "AnnotationGraphNode" and "Annotation", if we
adopt ECore as the type system language, or we can define a subtype of
"AnnotationGraphNode" which restricts the usage of links. I agree that
the "AnnotationGraphNode" approach captures the more general case.
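
The two kinds of user-side extension could look roughly like the
following Python sketch. Python is used only because it allows multiple
inheritance directly; the class names ConstituentNode and TreeNode, the
methods add_outlink and _accept_inlink, and the choice that an outlink
points from parent to child are all assumptions for illustration, not
part of any proposal.

```python
class Annotation:
    """Hypothetical span-bearing base type."""
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end


class AnnotationGraphNode:
    """Hypothetical general graph node with inlinks and outlinks."""
    def __init__(self):
        self.inlinks = []
        self.outlinks = []

    def add_outlink(self, target):
        # Let the target veto the link before either side is modified.
        target._accept_inlink(self)
        self.outlinks.append(target)

    def _accept_inlink(self, source):
        self.inlinks.append(source)


class ConstituentNode(AnnotationGraphNode, Annotation):
    """User-side type that is both a graph node and an annotation."""
    def __init__(self, begin, end, label):
        AnnotationGraphNode.__init__(self)
        Annotation.__init__(self, begin, end)
        self.label = label


class TreeNode(ConstituentNode):
    """User-side subtype that restricts link usage: at most one inlink."""
    def _accept_inlink(self, source):
        if self.inlinks:
            raise ValueError("a tree node may have at most one inlink")
        super()._accept_inlink(source)
```

A second add_outlink call onto the same TreeNode raises ValueError, so
the tree restriction lives in the user type while the base type stays
general.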

Thilo W Goetz wrote:
 > It's these kinds of discussions that make me very suspicious of adding
 > more special-purpose types to the UIMA framework.  In particular, I
 > don't think that the standards group should be designing these top-down.
 >  I think it would be better to come up with a practical proposal for the
 > actual implementation, and see if other people adopt it.  If people seem
 > to converge on a common design for these things, then there's still time
 > to standardize them and put them in the OASIS work.

Currently, there are several options for expressing tree structures:

1. begin/end spans (the OpenNLP Parser in the UIMA SDK examples,
though spans alone cannot express all of the relations correctly),
2. parent references with a special feature name (our group has
temporarily taken this approach in our implementation),
3. parents/children, or inlinks/outlinks (from the previous discussion
on this mailing list),
etc.
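
As a rough illustration of how these options differ, the sketch below
encodes one small tree in each style, using Python dictionaries rather
than UIMA types. The tree, the labels, and the helper names
tightest_enclosing and children_from_parents are hypothetical. The tree
contains a unary chain (NP over N with the same span), which is one
case where spans alone lose information.

```python
# Hypothetical example tree: S -> NP VP, NP -> N.
# NP and N cover exactly the same span (a unary chain).

# Option 1: begin/end spans only.
spans = {"S": (0, 2), "NP": (0, 1), "N": (0, 1), "VP": (1, 2)}

def tightest_enclosing(label):
    """Labels of the smallest spans containing `label`'s span (excluding itself)."""
    begin, end = spans[label]
    containing = {
        other: (b, e)
        for other, (b, e) in spans.items()
        if other != label and b <= begin and end <= e
    }
    if not containing:
        return []
    width = min(e - b for b, e in containing.values())
    return [other for other, (b, e) in containing.items() if e - b == width]

# Option 2: each node holds a reference to its parent.
parent_of = {"S": None, "NP": "S", "N": "NP", "VP": "S"}

# Option 3: each node holds references to its children (outlinks).
children_of = {"S": ["NP", "VP"], "NP": ["N"], "N": [], "VP": []}

def children_from_parents(parents):
    """Convert option 2 into option 3; possible because references are explicit."""
    children = {label: [] for label in parents}
    for label, parent in parents.items():
        if parent is not None:
            children[parent].append(label)
    return children
```

With spans only, tightest_enclosing("N") points to NP and
tightest_enclosing("NP") points to N, so the direction of the unary
chain cannot be recovered. Options 2 and 3 carry the same information
as each other and can be converted mechanically, but a consumer still
has to know which of them a component uses.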

Our group has already encountered interoperability problems here.
Whenever we build an aggregate from a different combination of
components, such as replacing the syntactic parser in an aggregate,
we have to investigate which types can be part of a tree and
how they represent the structure (options 1-3 or something else).

Such information may be described in some document, if at all,
because it cannot be included in any UIMA data.
But it is unclear where to look for it, or whether it exists at all.
So I think it is important to provide graph representation types
in the specification for the sake of interoperability.

On the other hand, a "reference" is the natural and (as far as I know)
only way to represent graphs, DAGs, and trees in the UIMA framework.
"AnnotationGraphNode" would then be the most general definition,
and we can map the definition directly onto an implementation.
If it is better to describe only the most general type in the
specification, the options narrow down to whether or not to define
"AnnotationGraphNode", and I think that can be discussed in this
committee.

best,
Yoshinobu

-- 
Yoshinobu KANO
kano@is.s.u-tokyo.ac.jp
Tsujii Laboratory, the University of Tokyo
