topicmaps-comment message

Subject: Re: [xtm-wg] The Topic Naming Constraint
From: "Steven R. Newcomb" <srn@coolheads.com>
To: xtm-wg@yahoogroups.com
Date: Fri, 23 Feb 2001 18:49:27 -0600

[Geir Ove Gronmo:]

> First of all I'd like to say that I wholeheartedly agree with Lars
> Marius on this matter. The Topic Naming Constraint is [extremely]
> painful and too awkward - in my eyes it is a real show-stopper.

> I'll try to explain what I think is the problem with draconian Topic
> Naming Constraint enforcement, which the standard(s) at this point
> requires from an XTM processor. A conformant XTM processor has to
> enforce these constraint, e.g. there cannot be _any_ exceptions if
> you'd like to claim that your processor is conformant.

> I see the usefulness of being able to do automatic namespace-based
> merges. This is extremely useful, but unfortunately the problem is
> that people (myself included) believe that base names are intended for
> labelling purposes, not for identification.

I'm baffled by your argument, Geir Ove.  You seem to be saying that we
should do away with XTM's ability to support string-match-based topic
merging because you don't agree that that is the purpose of baseName
elements.

Would you be happier if we re-named the <baseNameString> element type
to <string-for-string-based-topic-merging-and-for-no-other-purpose>?
If we had not already decided on the interchange syntax for XTM, I
could live with that.  (You'll have to forgive me if I just so happen
to put the names of my topics in
<string-for-string-based-topic-merging-and-for-no-other-purpose>
elements.)

If you want to do away with the topic naming constraint, please don't
confuse what's left with the topic map paradigm, and don't imagine
that what's left will allow topics to be addressed by their names.

> Applications need to be able to display labels for topics. The
> obvious way to do this is to use names.

Not true.  A name is useful for addressing.  A label is not useful for
addressing; it is merely a display convenience.  Contrary to what you
seem to believe, labels should not always also be names.

> Occurrences are usually out of the
> question, since _generic_ applications wouldn't be able to know when
> to use a basename and when to display occurrences.

Not true.  The processing of XTM <topicMap> elements is exactly what
the XTM Spec says it is, no more and no less.  If the XTM Spec defines
a published subject that is "label", then that's all that's necessary.
If an occurrence type is "label" (or any other XTM-defined occurrence
type), generic applications will be required to know what that means,
and to act accordingly.  A label is a kind of occurrence, no more and
no less.

> I believe that it is a problem [actually a bug] that base names are
> subject for namespace-merges. As I said, I believe that
> namespace-merges are _extremely_ useful. But we should not use base
> names for this.
> 
> Proposal:
> 
>   o We need another kind of name, e.g. identifying-name. (I don't
>   think that it makes sense for an identifying name to have variants
>   like basenames do.)

You're proposing to redefine the meaning of the <baseName> generic
identifier, simply because you think it ought to be called something
else?  I hope and believe it's too late for that; I think we've
already put the syntax of XTM to bed.  You're also proposing to blur
the distinction between names and labels, so that you can have a
<name-that-is-not-really-a-name> and a <name-that-is-really-a-name>.
I think you're swimming upstream, here.  The vocabulary of informatics
is already well-established, especially in the context of the Web,
where the word "name" has semi-religious significance in which a name
is an address, or at least part of an address within some larger
context (namespace).

>   A name targeted towards namespace-based merges complements subject
>   indicators. Instead of just identifying a topic by subject
>   indicators (the _meaning_ of the resource content) you're then also
>   able to identify a topic by the name resource _content_
>   (byte-by-byte).

Right.  But what's wrong with calling the element type that contains
the string to be matched <baseNameString>?  It's consistent with years
of popular usage.  

> Here are some thoughts on why the TNC doesn't work in real-life in its
> current incarnation.
> 
> Problem:
> 
>   Topics are merged even though the author(s) didn't intend them to
>   merge because it is known by the authors that they have different
>   subjects. This is what I believe is the main problem with the TNC in
>   real life.

That can only happen if the authors don't know what <baseName> means.
I guess you're saying that authors don't know what <baseName> means.
Whose fault is that?  It's not the fault of the XTM Spec, nor of ISO
13250, both of which are crystal-clear about this.

> I see at least four ways that a processor can behave when merging two
> or more topic maps:
> 
>   o The merge happens automatically and parts of the topic map no
>     longer makes sense (is inconsistent), because resulting topics
>     now represents more than one subject.
> 
>   o The merge is interactive. Unfortunately this requires the
>     author(s) to be present when the merge happens. This is
>     unacceptable in most cases. You cannot expect the authors to be
>     present when a merge happens. Note that there are many [an
>     unlimited number of] reasons why a merge happens.
> 
>   o The processor marks the topics as to-be-merged. This prevents the
>     merged topic map to be presented to the user[!]. (A non-consistent
>     topic map cannot be presented to the user by a conforming
>     processor.)
> 
>   o The processor doesn't do merging based on the TNC. This is not
>     allowed by the standard.

There is something fundamentally amiss in your underlying assumptions
here.  You seem to be saying that merging happens without somebody
taking responsibility for the merge.  Neither 13250 nor the XTM Spec
says or implies any such thing as automatic unattended merging of
arbitrary sets of topic maps.  Both of the standards are strictly
limited to saying how a *single* topic map document (or, in the case
of XTM, a single <topicMap> element) should be interpreted.

So, if you want to declare, in an XTM-conforming fashion, that two
topic maps must be merged, the *only* way you can do that is to write
a third <topicMap> that contains a <mergeMap> for the one, and another
<mergeMap> for the other.  Please note: when you write this third
topic map, you are, by definition, a topic map author, you are
creating a topic map, and you are responsible for the sensibility of
what it says.  Neither 13250 nor XTM says anything about the
methodologies whereby topic maps should be (or can be) created.  Both
standards only say what a topic map must be interpreted to mean after
it has been created.
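
A minimal sketch of such a "third" topic map, built in Python with the
standard library, may make this concrete.  It only illustrates the idea
of one <topicMap> containing two <mergeMap> elements.  The namespace
URIs, the use of an xlink:href attribute to point at each map, and the
example.org locations are assumptions made for illustration and should
be checked against the XTM Spec.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (XTM 1.0 and XLink); verify against the Spec.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_merging_topic_map(map_uris):
    """Return a <topicMap> element holding one <mergeMap> per URI."""
    ET.register_namespace("", XTM_NS)
    ET.register_namespace("xlink", XLINK_NS)
    topic_map = ET.Element(f"{{{XTM_NS}}}topicMap")
    for uri in map_uris:
        merge_map = ET.SubElement(topic_map, f"{{{XTM_NS}}}mergeMap")
        # Assumption: the map to be merged is referenced via xlink:href.
        merge_map.set(f"{{{XLINK_NS}}}href", uri)
    return topic_map


# Hypothetical locations of the two topic maps to be merged.
third_map = build_merging_topic_map([
    "http://example.org/party-a.xtm",
    "http://example.org/party-b.xtm",
])
xml_text = ET.tostring(third_map, encoding="unicode")
print(xml_text)
```

Whoever runs something like this and publishes the result is the author
of the resulting topic map, and is responsible for the merging it
causes.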

In light of these facts, let's discuss your points one by one:

> I see at least four ways that a processor can behave when merging two
> or more topic maps:

There is only one way an XTM-conforming processor can behave when
merging topic maps.  Otherwise, there is no point in having an
XTM Specification.

>   o The merge happens automatically and parts of the topic map no
>     longer makes sense (is inconsistent), because resulting topics
>     now represents more than one subject.

If this happens, it is the fault of the person who wrote the topic map
that caused the inappropriate merging to occur.  It is always the
responsibility of the topic map author to write a topic map that makes
sense.  (There is an unbounded number of ways in which to create a
nonsensical topic map, and there is no way for the Spec to prevent
that.  Indeed, the power to make sense always includes the power to
make nonsense.)

>   o The merge is interactive. Unfortunately this requires the
>     author(s) to be present when the merge happens. This is
>     unacceptable in most cases. You cannot expect the authors to be
>     present when a merge happens. Note that there are many [an
>     unlimited number of] reasons why a merge happens.

A topic map always has an author.  It is the author's responsibility
that it makes sense.  If the authoring process includes some
interactive procedure, that's just fine, but the Spec does not specify
any such interactive procedure.  Vendors like Ontopia can invent and
implement such interactive procedures, of course, and I hope they
will!  But the person engaging in such an interactive procedure is, by
definition, a topic map author.

Also, contrary to what you say, according to the XTM Spec, there is an
extremely small and finite number of reasons why merging occurs:

(a) If by "merge" you mean "the merging of topic maps", there is only
    one reason: the existence of a <mergeMap> element in the
    <topicMap> element being processed.

(b) If by "merge" you mean "the merging of topics", there are only two
    reasons:

    (1) The topics have the same name in the same topic namespace,
        and/or

    (2) The topics share one or more subject identity points.
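
The two topic-merging rules in (b) can be sketched in a few lines of
Python.  This is an illustration under stated assumptions, not the
processing model of either standard: the Topic class, its field names,
and the representation of a scope as a frozenset of theme ids are all
hypothetical, and a real processor must also merge the topics'
characteristics, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    # Each base name is a (scope, name string) pair; the scope is a
    # frozenset of theme ids, and the empty frozenset means "unscoped".
    base_names: set = field(default_factory=set)
    subject_identities: set = field(default_factory=set)


def merge_topics(topics):
    """Group topics that must merge; returns a list of sets of topic ids."""
    parent = {t.topic_id: t.topic_id for t in topics}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # merge key -> id of the first topic seen with that key
    for t in topics:
        # Rule (1): same name string in the same scope (namespace).
        keys = [("name", scope, name) for scope, name in t.base_names]
        # Rule (2): a shared subject identity point.
        keys += [("subject", uri) for uri in t.subject_identities]
        for key in keys:
            if key in seen:
                union(t.topic_id, seen[key])
            else:
                seen[key] = t.topic_id

    groups = {}
    for t in topics:
        groups.setdefault(find(t.topic_id), set()).add(t.topic_id)
    return list(groups.values())


UNSCOPED = frozenset()
PSI = "http://example.org/psi/paris-france"  # hypothetical identity point
topics = [
    Topic("t1", {(UNSCOPED, "Paris")}, {PSI}),
    Topic("t2", {(UNSCOPED, "Paris")}),
    Topic("t3", {(frozenset({"texas"}), "Paris")}),
    Topic("t4", {(UNSCOPED, "City of Light")}, {PSI}),
]
groups = merge_topics(topics)
```

In this sketch t1 and t2 merge by rule (1), t1 and t4 merge by rule
(2), and t3 stays separate because its author scoped the name
differently.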

>   o The processor marks the topics as to-be-merged. This prevents the
>     merged topic map to be presented to the user[!]. (A non-consistent
>     topic map cannot be presented to the user by a conforming
>     processor.)

Here, you're thinking about the problems of implementing some
software.  The Spec does not forbid you to write any software.  You
are free to write and license any software you like.  If, among the
other features of such software, the software can be used to fully
understand XTM <topicMap> elements in the manner set forth in the XTM
Spec, then that feature is XTM-conforming.

The fundamental purpose of the Spec is to describe a limited,
implementable set of functionalities that all XTM-conforming software
must implement, with respect to XTM-conforming <topicMap> elements.
It is emphatically *not* the purpose of the Spec to limit the set of
functionalities that any software *may* implement.  There is no limit
in that regard at all.  There is certainly no limit with respect to
what software may do with information that does *not* happen to be
XTM-conforming <topicMap> elements.

>   o The processor doesn't do merging based on the TNC. This is not
>     allowed by the standard.

You're oversimplifying the case.  Let's be very clear about this.  The
processor must do everything the Spec requires, including support for
the topic naming constraint, if and only if *all* of the following
things are true:

(1) the processor claims to have the ability to support XTM-conformant
    processing, and

(2) the processor is being used in its XTM-conformant mode by its user
    (because it may, of course, have other modes of operation that are
    not claimed to be XTM-conforming and which are therefore not
    constrained by the Spec), and

(3) the processor is processing a <topicMap> element that claims to be
    XTM-conformant.


> - - -
> 
> It has been pointed out that one of the reason why the base name
> constraint exist is to avoid ambiguities when presented with identical
> names. I agree with the usefulness of being able to avoid ambiguities.
> 
> Something to be aware of is that a name can be disambiguated by a
> processor even without looking at the name scope:
> 
>   o A basename belongs to a topic, which itself represents a
>     subject. The subject and subject descriptors _disambiguates_ the
>     name!

What you're saying here is that if we allow several topics to have the
same name in the same namespace, there's no problem because we can
just go look at all the topics that have the same name in the same
namespace to see which one is the one we're looking for.

Of course, what you're saying is true.  On the other hand, this
methodology offers nothing we don't already have in typical search
engines.  Your proposal is tantamount to proposing that users, rather
than computers, should be required to sort through the infoglut on
their own behalf.

>   o The type-hierarchy and the classes of which the topic is an
>     instance describes what the topic is about and that should to some
>     extent disambiguate the name.

This is the same specious argument you've already made; you're just
pointing out some ways that, after we have established a list of
topics to look at, we users can distinguish between them.  It's still
a "back to infoglut" argument.  I'm not buying it.

>   o Basically all the other characteristics can be used by the XTM
>     processor to further disambiguate a name to the end-user.

No, in this scenario, the XTM processor isn't doing the
disambiguation.  The end-user is doing the disambiguation, based on
what the XTM processor is able to tell the user.  It's infoglut all
over again, only worse, because now the user has to establish the
relative importance of each aspect of everything when deciding what to
look at.

Your proposal would compromise the ability of the topic map paradigm
to enhance the productivity of humanity, by taking away from the topic
map author the ability to precalculate the relevance of materials on
behalf of the end user, in a way that the end-user can simply rely on
and use, without having to understand it.

My own proposal is radically different:

I propose that we use the topic map paradigm as it was designed to be
used, as a solid platform that allows domain experts to objectify
their expertise in ways that can maximally enhance the productivity of
non-domain-experts.

> - - -
> 
> Why is the TNC awkward?
> 
> 1. It is impossible to universally scope basenames at the time of
>    authoring to avoid unintended merges to happen in the future.

Simply not true.  When authoring, you can scope any <baseName> any way
you want.  There is no such thing as an "unintended" merge.  As an
author, you and you alone decide which topics will be merged, and
which will not be merged, and whether the merged topics will be merged
on the basis of common identity points, common names in namespaces, or
both.

>    You cannot know at a given point in time that you'll never have
>    unintended merges caused by the Topic Naming Constraint.

Who is the "you" in the above sentence?  If "you" are authoring a
topic map, "you" are responsible for the merging that occurs, because
"you" specified the topic map in such a way that merging occurs.  This
isn't "awkward"; it's essential that you say what you mean to say, and
that you can rely on the fact that everyone who uses the topic maps
that you've created will understand them to mean exactly what you said
that they meant, when they are interpreted as the Spec requires them
to be interpreted.

> 2. Most merges will be done automatically by a computer (without user
>    intervention).
> 
>    You cannot expect the authors of the two topic maps to be present
>    when the merge happens.

There is confusion here, because "merging" means two different things,
and the distinction between them has been blurred.

(1) "Automatic merging of topics."  This is the merging of topics that
    is required to occur during conforming processing of a *single*
    <topicMap> element, which may or may not contain <mergeMap>s.
    
(2) "The authoring of a <topicMap> that may or may not contain
    <mergeMap>s."  This is topic map authoring.  When the author
    decides to publish his <topicMap> element, he is taking
    responsibility for the fact that, when the <topicMap> element is
    processed by an XTM-conforming processor, exactly and only the
    merging that he intends and believes to be appropriate and correct
    will occur.

> 3. A computer cannot automatically correctly and sensibly scope names
>    to avoid the TNC.

Right.  I agree with you.  In general, computers are still lousy
authors.

> - - -
> 
> Conclusion:
> 
>    o Get rid of the TNC and introduce a separate content-based
>      identifying name.

This is a terrible idea.  I'd put it in the same category as, "We
don't want to bother with topic maps, because Microsoft Help, with its
powerful full-text searching capabilities, already does everything
anybody really needs."  (I mention this particular howler because a
real customer has actually said this to me.)

I've just re-read this note, and I've just realized that you seem to
assume that any topic map should merge automatically with any other
topic map, or perhaps that the paradigm is designed to make this
possible.  Just in case this is what you're thinking, let me assure
you that this is nonsense.  It is also untrue that the topic naming
constraint exists in order to make it possible to merge arbitrary sets
of topic maps automatically.

In my own view, the most persuasive reason for the existence of the
combination of the two merging rules (the Name-based and Subject-based
merging rules) is to make it economically feasible for Party C to
maintain a topic map that merges Party A's and Party B's topic maps,
even though the latter pair of topic maps are evolving separately, in
total ignorance of one another, and even though one or both of them is
not rationally maintaining the syntactic addressability of their XML
element components.  The ability to address topics rigorously, by
means of their names, is critical to the economic feasibility of Party
C's business model.  Yes, Party C has to work hard to keep up with the
changes to Party A's and Party B's topic maps (it can never be a fully
automatic process) but at least Party C doesn't have to start from
scratch every time Party A or Party B releases a new version of their
topic maps.  Party C can address Party A's topics and Party B's topics
by their names, if desired, and/or by their subject identity points.
Using the two kinds of addressing in combination is extremely powerful.
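
A small Python sketch of the name-based half of this: Party C keeps
its references to Party A's topics as (scope, name) pairs rather than
as XML element ids, so the references still resolve after Party A
reissues its map with different ids.  The data layout, the ids, and the
topic names are hypothetical, and the sketch assumes the topic naming
constraint holds, so that each (scope, name) pair identifies at most
one topic.

```python
def index_by_name(topics):
    """Map (scope, name) -> topic id.

    `topics` maps each topic id to its set of (scope, name) pairs.
    Assumes the topic naming constraint holds, so no pair appears
    under two different topics.
    """
    index = {}
    for topic_id, names in topics.items():
        for key in names:
            index[key] = topic_id
    return index


UNSCOPED = frozenset()

# Two hypothetical releases of Party A's topic map; the element ids
# change between releases, but the names do not.
party_a_v1 = {"id-17": {(UNSCOPED, "Hamlet")}}
party_a_v2 = {
    "x9": {(UNSCOPED, "Hamlet")},
    "x10": {(UNSCOPED, "Macbeth")},
}

# Party C's stored reference to Party A's topic.
reference = (UNSCOPED, "Hamlet")

resolved_v1 = index_by_name(party_a_v1)[reference]
resolved_v2 = index_by_name(party_a_v2)[reference]
print(resolved_v1, resolved_v2)
```

The same kind of index could be keyed on subject identity points, and
the two lookups used together.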

The economic feasibility of Party C's business model is critical to
the promulgation of global knowledge interchange.  People like Party C
have to be able to make a profit from integrating the knowledge of
people like Party A and Party B, if the dream of global knowledge
interchange is to be realized without having us all drown in global
knowledge glut.

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA
