
 



uima message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [uima] [Type System Base Model subgroup]


I agree that if we are going to provide support for such structures, we
should provide support for the most general of them, i.e. graphs.
Since trees are a special case of graphs, this covers trees and other
relational structures.  Constraining this to DAGs probably makes sense.

I do understand your concern about object proliferation in the proposal I
circulated.  I also understand that "sometimes we want to express the region
which a node governs".  That will be very common for parse trees, where
e.g. a single parse tree corresponds to a contiguous span like a sentence.
However, it is not generally true of these sorts of hierarchical Annotation
collections, and by subtyping Annotation to add the graph support we would
be requiring AnnotationGraphs to be tied to specific sofa spans.  I'm
imagining use cases for the AnnotationGraphs to include things like the
collection of co-references corresponding to a particular entity, which
will clearly not correspond to a contiguous span.  Of course, if individual
Annotations can have non-contiguous spans this is less of a problem and
subtyping could work.  Does anyone else have a preference for a particular
design?

Actually, I should modify my proposal not to explicitly encode "parent" and
"child" but rather to simply represent inlinks and outlinks; that way we can
leave the semantics of the DAG up to the specific use case.
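
To make the inlink/outlink idea concrete, here is a rough sketch in Python
rather than in UIMA type-system terms.  The names Annotation, GraphNode and
link are hypothetical stand-ins, not part of any UIMA API, and the optional
annotation field is my assumption about how a node could exist without
being tied to a sofa span.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Annotation:
    """Hypothetical stand-in for a span over the sofa."""
    begin: int
    end: int
    label: str


@dataclass(eq=False)
class GraphNode:
    """Hypothetical node: neutral links, optionally tied to an Annotation."""
    annotation: Optional[Annotation] = None
    inlinks: List["GraphNode"] = field(default_factory=list)
    outlinks: List["GraphNode"] = field(default_factory=list)


def link(source: GraphNode, target: GraphNode) -> None:
    """Add one directed edge, updating both ends together."""
    source.outlinks.append(target)
    target.inlinks.append(source)


# Coreference use case: an entity node with no span of its own,
# linked to two mentions that are not contiguous in the text.
entity = GraphNode()
mention_a = GraphNode(Annotation(0, 5, "Mention"))
mention_b = GraphNode(Annotation(40, 43, "Mention"))
link(entity, mention_a)
link(entity, mention_b)
spans = [(n.annotation.begin, n.annotation.end) for n in entity.outlinks]
```

Whether an outlink means "child", "mention of" or something else is left to
the use case, which is the point of dropping the parent/child names.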

I do see your point about unidirectional representation.  However, this
would require traversal of these structures to always proceed bottom-up in
the case of trees, which might not be natural.

Good point about validation of the structures -- to what extent is the
framework responsible for checking/enforcing constraints on structures in
other areas?

Karin

On 3/4/07 2:23 PM, "KANO, Yoshinobu" <kano@is.s.u-tokyo.ac.jp> wrote:

> Here is a refined version of my proposal about the tree or graph structures,
> "Open issue 9." of the latest mail from Karin.
> This issue was not fully discussed in the previous telecon
> because we didn't have much time.
> 
> 
> * Purpose of this proposal
> 
> The purpose of this proposal is to provide references restricted to
> specific, widely used data structures, trees and/or DAGs,
> at the least cost in efficiency.
> 
> These structures can be used to explicitly express syntactic trees,
> shared-node multiple trees, etc.
> 
> 
> * Problems in the current specification/implementation
> 
> Consider a UIMA component that generates CFG parse trees.
> For example, a CFG tree may contain Annotations of "phrases" and "words",
> and these Annotations have a specific type of syntactic relations
> between them.
> 
> We cannot express unary relations with just begin/end pairs,
> so we must use "references" between annotations to express relations.
> 
> Because the "reference" defined in the current specification is generic,
> we have no means of telling other UIMA components
> which class of structure the references form.
> 
> 
> * Classes of structures
> 
> A basic restriction on the references is that they contain no cycles.
> It is really useful to be able to guarantee that a set of references is acyclic.
> 
> [Tree] Tree structures, acyclic and only a single parent is permitted.
> [DAG] Directed Acyclic Graphs, acyclic and multiple parents are permitted.
> 
> My proposal is mainly to provide tree structures,
> because in most cases in the NLP field, structures are essentially
> skeleton trees with additional links between nodes.
> 
> But I think it is also important to provide DAGs.
> For example, we can naturally express multiple tree structures
> (like many parse candidates) by a single DAG.
> In this case trees can share nodes and we can avoid the combinatorial
> explosion.
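
The two classes above can be told apart mechanically.  Below is a minimal
sketch in Python, assuming the references are available as a plain
child -> parents mapping (a hypothetical representation, not a UIMA
structure).  It uses Kahn's algorithm to detect cycles and then applies the
single-parent rule; requiring exactly one root for a tree is my own
additional assumption.

```python
from collections import deque
from typing import Dict, Hashable, List


def classify(parents: Dict[Hashable, List[Hashable]]) -> str:
    """Return "tree", "dag" or "cyclic" for a child -> parents mapping."""
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)

    # Number of distinct parents still attached to each node.
    remaining = {n: len(set(parents.get(n, []))) for n in nodes}
    children: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for child, ps in parents.items():
        for p in set(ps):
            children[p].append(child)

    # Kahn's algorithm: repeatedly remove nodes that have no parents left.
    queue = deque(n for n, k in remaining.items() if k == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for c in children[node]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)

    if removed < len(nodes):
        return "cyclic"  # some nodes could never be freed, so a cycle exists
    roots = [n for n in nodes if not parents.get(n)]
    single_parent = all(len(set(ps)) <= 1 for ps in parents.values())
    return "tree" if single_parent and len(roots) == 1 else "dag"


tree_result = classify({"NP": ["S"], "VP": ["S"], "S": []})
shared_result = classify({"NP": ["S1", "S2"]})  # NP shared by two parses
cycle_result = classify({"A": ["B"], "B": ["A"]})
```

The second call is the shared-node case: one NP serving two parse
candidates makes the structure a DAG rather than a tree.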
> 
> * Type hierarchy and View-like new type
> 
> The main advantage of the <AnnotationGraphNode> (View-like new type)
> may be that it can separate the structure and leaf Annotations explicitly.
> But separating the structure from the Annotations creates
> more objects, which decreases efficiency.
> On the other hand, sometimes we want to express the region which a node
> governs.
> For these reasons I would prefer to define a class under <Annotation>.
> In this case, I think it is better to optionally provide references to
> the roots, the leaves, or the entire collection of nodes.
> 
> If we provide both [Tree] and [DAG], the type hierarchy is:
>    <Annotation> -subtype-> <Tree Node>,
>    <Annotation> -subtype-> <DAG Node>.
> 
> It seems natural to define them as:
>    <Annotation> -subtype-> <DAG Node> -subtype-> <Tree Node>,
> because trees are also DAGs.
> But <DAG Node> will have more features (references) than <Tree Node>,
> so it is better for <Tree Node> not to inherit from <DAG Node>.
> 
> If we provide [Tree] only:
>    <Annotation> -subtype-> <Tree Node>.
> 
> 
> * Expression of structures
> 
> Candidates for "references" are "parent(s)" and "children".
> If we do not provide "children" references,
> then the order of the children can be recovered from their begin/end positions.
> This is easy because Annotations are currently sorted by begin/end.
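
As a rough illustration of recovering ordered children from parent-only
references, here is a sketch in Python.  TreeNode and children_index are
hypothetical names, and sorting by (begin, end) ascending is an assumption;
the framework's actual index ordering may differ in how it breaks ties.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class TreeNode:
    """Hypothetical tree node that stores only a reference to its parent."""
    begin: int
    end: int
    label: str
    parent: Optional["TreeNode"] = None


def children_index(nodes: List[TreeNode]) -> Dict[TreeNode, List[TreeNode]]:
    """Build parent -> children, with children ordered by (begin, end)."""
    index: Dict[TreeNode, List[TreeNode]] = defaultdict(list)
    for node in nodes:
        if node.parent is not None:
            index[node.parent].append(node)
    for kids in index.values():
        kids.sort(key=lambda n: (n.begin, n.end))
    return index


# "the dog barks": S covers 0-13, NP covers 0-7, VP covers 8-13.
s = TreeNode(0, 13, "S")
vp = TreeNode(8, 13, "VP", parent=s)
np_node = TreeNode(0, 7, "NP", parent=s)
index = children_index([vp, np_node, s])  # input order is deliberately mixed
ordered_labels = [n.label for n in index[s]]
```

The index has to be built once before any top-down traversal, and siblings
with identical spans would need some extra tie-breaking rule.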
> 
> a. unidirectional
> 
> [Tree] It is better to use "parent", not "children" as references
> because we can explicitly limit the number of parents to one.
> 
> [DAG] "parents" or "children".
> 
> b. bidirectional
> 
> I don't support bidirectional references, convenient as they are,
> because parent-child pairs can become mislinked.
> There is also an efficiency problem: the memory requirement
> increases.
> 
> 
> * Validation of structures
> 
> I think it is bad for efficiency if the UIMA system always checks
> whether the structures are trees/DAGs or not.
> But it is also annoying to have to check that on the component side.
> 
> My proposal is to provide a checking mechanism that users can switch on or off
> through the component descriptor or something similar.
> I'm not sure how much of this should be included in the specification.
> 
> In the case of bidirectional references, one may also need to validate
> that the parent and child references agree with each other.
> This is another reason why I don't support bidirectional references.
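
The extra check that bidirectional references would require can be sketched
as follows, in Python.  The two dictionaries are a hypothetical
representation of the "parents" and "children" features, and
mismatched_links is an illustrative name, not an existing API.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child)


def mismatched_links(
    parents: Dict[Hashable, List[Hashable]],
    children: Dict[Hashable, List[Hashable]],
) -> Set[Edge]:
    """Return every (parent, child) edge recorded on one side only."""
    from_parents = {(p, c) for c, ps in parents.items() for p in ps}
    from_children = {(p, c) for p, cs in children.items() for c in cs}
    # Symmetric difference: edges present in exactly one of the two views.
    return from_parents ^ from_children


# VP names S as its parent, but S does not list VP among its children.
parents = {"NP": ["S"], "VP": ["S"]}
children = {"S": ["NP"]}
problems = mismatched_links(parents, children)
```

An empty result means the two directions agree.  With unidirectional
references there is only one view, so this check is not needed at all.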
> 
> 
> 
> * Summary
> 
> I propose to define <TreeNode> and <DAGNode> types under <Annotation>.
> In my opinion, references should be unidirectional:
> "parent" for [Tree], "parents" or "children" for [DAG].
> Structure validation should be provided as an optional facility.
> 
> Open options:
> 1. Whether to provide a type specific to [Tree] structures
> 2. Whether to provide a type specific to [DAG] structures
> 
> 
> best,
> 
> Yoshinobu KANO
> kano@is.s.u-tokyo.ac.jp
> Tsujii Lab., the University of Tokyo

_______________________________________________________________
Karin Verspoor, Computational Linguist
Knowledge and Information Systems Science team
Computer, Computation & Statistics division
http://public.lanl.gov/verspoor
email: verspoor@lanl.gov   Mail: Los Alamos National Laboratory
phone: 505-667-5086              PO Box 1663, MS B256
fax:   505-667-1126              Los Alamos, NM 87545
_______________________________________________________________




