topicmaps-comment message

Subject: Re: [xtm-wg] Topic Naming Constraint
From: Peter Newcomb <peter@techno.com>
To: xtm-wg@egroups.com
Date: Mon, 15 Jan 2001 13:28:26 -0600
[Kal Ahmed <kal@ontopia.net> on Mon, 15 Jan 2001 10:20:04 -0000]
> I would like to express my concerns about and objection to the Topic
> Naming Constraint expressed in the XTM 1.0 specification. Having
> worked with both the ISO 13250 specification and XTM 1.0
> specification and implemented programming libraries for both of
> these specifications, I find the topic naming constraint to be an
> unnecessary restriction which makes the creation of consistent,
> mergeable topic maps exceedingly difficult in any but the most
> restricted situations. My objections are four-fold and I will
> attempt to express them here.
>
> 1. In my mind, the most important objection is that NAME and
> IDENTITY are two orthogonal concepts.

First of all, topic names that are subject to the topic naming
constraint (i.e., basenames) are not the only type of name that may be
assigned to topics.  In fact, basenames are a very special type of
name that exists, in combination with scope, solely to provide a
name-based identity mechanism for topics.

So, basenames exist as a _result_ of the recognition of the fact that
while NAME and IDENTITY are technically orthogonal, NAMEs can be, and
commonly are, used to _help_ resolve IDENTITY.

> There is no way in which a name should be construed as asserting
> identity.

Surely, we all use names every day to establish the identity of things
and concepts as we communicate with each other.  This is possible
because:

  a) within a particular conversation, there is generally a shared
     context with respect to which such names can be resolved; and

  b) as humans, we have the remarkable, but not infallible, ability to
     recognize and correct resolution errors when they occur.

> Both ISO 13250 and XTM 1.0 recognise the orthogonality of these two
> concepts by providing separate constructs for each.

In fact, ISO 13250 and XTM 1.0 both provide constructs for:

  1) assigning names to topics that have no bearing on identity;

  2) establishing identity of topics independently of their names; and

  3) assigning names to topics that unambiguously identify them with
     respect to particular contexts.

> Unfortunately the topic naming constraint then smashes the two
> concepts together again making a scoped name into a form of identity
> for a topic.

This would be true only if it were not possible to assign names to
topics that are not subject to the topic naming constraint.  As it is
possible to do this (just create another "naming" association type), I
don't see the validity of this complaint.

> 2. For the creator of a topic map, the topic naming constraint
> requires a priori knowledge of the vocabulary of all topic maps with
> which the topic map being created is used.  In certain situations,
> this is possible - e.g.  the controlled vocabularies used by
> technical documentation departments; the controlled set of medical
> terms defined by WHO. However, in the general 'use-on-the-web'
> scenario, controlled vocabularies are not likely to prove practical
> and the topic naming constraint in effect restricts the author's
> freedom to name topics as he/she sees fit.

This would be true, were it not for scope and the ability to adjust
the scope of "included" topic maps (using addthms or mergeMap).  With
these, the creator of a topic map need only ensure that basenames are
used consistently within that topic map; it is the responsibility of
the creators of topic maps that use that topic map to scope the
included topic map relative to their own topic maps such that
inappropriate mergings do not occur.  Remember, it is always easy to
scope an "included" topic map such that it effectively has its own
namespace, i.e., so that none of the basenames assigned within it will
ever conflict with any assigned outside of it.  The nice thing about
it is that in the cases where relatively controlled vocabularies are
used, the care that went into creating and goes into maintaining them
can be rewarded by the ease of merging and the increased
interoperability of topic maps that use those vocabularies.
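
A minimal Python sketch of that scoping step, under a toy model that
is purely an assumption for illustration: a topic map is a dict from
topic id to a set of (basename, scope) pairs, and a scope is a
frozenset of theme names.  The function names and data are
hypothetical, and this is not the addthms or mergeMap syntax itself.

```python
def add_theme(topic_map, theme):
    """Return a copy of topic_map with `theme` added to every basename scope."""
    return {
        topic_id: {(name, scope | {theme}) for name, scope in basenames}
        for topic_id, basenames in topic_map.items()
    }


def name_keys(topic_map):
    """Collect every (basename, scope) pair used anywhere in the map."""
    return {key for basenames in topic_map.values() for key in basenames}


# Hypothetical data: both maps use the basename "Paris" in the same scope.
mine = {"t1": {("Paris", frozenset())}}      # the city
theirs = {"x9": {("Paris", frozenset())}}    # the mythological figure

# Included as-is, the two maps share a scoped basename and would merge.
clash_before = name_keys(mine) & name_keys(theirs)

# Included with an extra theme, the other map gets its own namespace.
clash_after = name_keys(mine) & name_keys(add_theme(theirs, "their-map"))
```

Here clash_before holds the shared ("Paris", empty scope) pair, while
clash_after is empty, so no name-based merge would be triggered.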

> 3. The topic naming constraint requires that a user has access to
> the content of potential merged topic maps plus specialised topic
> map processing in order to determine if the creation or modification
> of a topic will cause a merge to take place.  With subject-based
> merging, standard string manipulation tools will work if the user
> has access to the potential merged topic maps and using
> authoritative subject identities could even mitigate the need for
> access to the potential merged topic maps.

See my answer to point number (2) with respect to the perceived need
to have access to topic maps with which this map may potentially be
merged.

As for the use of "standard string manipulation tools" versus
"specialized topic map processing", I don't understand your point.  It
seems to me that in order to use "standard string manipulation tools"
to do even simple things with topic maps, you must use them in very
specific and relatively complex ways--specific and complex to the
point of needing to be scripted.  By the time you've done that, you've
created a "specialized topic map processing" tool.

Along those lines, one problem with using only subject-based merging
is that mergings of topics can then be done in only three ways:

  1) one-by-one by human specialists,

  2) via identification with published subjects, or

  3) automatically, or more likely semi-automatically, using AI tools

(1) is basically the state of index merging today, though it will
likely become both easier and more difficult given the potential
richness of the information provided by topic maps in contrast to that
provided by traditional indexes.

(2) is a good mechanism for re-using the results of (1), but is
limited by the need for centralized authority.  In practice, there
will be more than one such authority, and those authorities and
authority-to-authority mapping services will have to use (1) or (3) in
order to provide interoperability.  Also, (2) requires topic subjects
to be registered with one of these authorities and published before it
can be used to merge topics with those subjects.  This potentially
includes the subjects of _all_ topics: relying too heavily on (2) will
send the number of subjects that must be so registered sky-rocketing,
and will exceed the abilities of the authorities to collect and
publish them.

(3) requires even more specialized and complex tools than the
"specialized topic map processing" you envision above.

What basenames (with the topic naming constraint) provide is a fourth
option:

  4) semi-automatically using (standard, simple) set-manipulation
     tools

(4) builds on the strengths of (2), by using published subjects to
provide a foundation that grounds the identities of other topics,
minimizing the need for one-by-one pinpointing of subjects by humans.
(2) thus becomes much more manageable, since authorities need only
collect and publish the foundational subjects.
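
The name-based part of (4) can be sketched with nothing more than
dictionaries and sets.  The Python below is a rough illustration under
an assumed toy model (topics as a dict from id to a set of
(basename, scope) pairs, scopes as frozensets of theme names); the
function name and data are hypothetical, and grounding in published
subjects is left out.

```python
from collections import defaultdict


def merge_by_basename(topics):
    """Group topic ids that share any (basename, scope) pair, transitively."""
    parent = {t: t for t in topics}

    def find(t):
        # Follow parent links to the group representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Index topics by each scoped basename they carry.
    by_key = defaultdict(list)
    for topic_id, basenames in topics.items():
        for key in basenames:
            by_key[key].append(topic_id)

    # Topics under the same key belong to the same group.
    for ids in by_key.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    groups = defaultdict(set)
    for t in topics:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: "a", "b" and "c" are linked through shared scoped names;
# "d" uses the same string as "c" but in a different scope.
topics = {
    "a": {("Italy", frozenset({"en"}))},
    "b": {("Italy", frozenset({"en"})), ("Italia", frozenset({"it"}))},
    "c": {("Italia", frozenset({"it"}))},
    "d": {("Italia", frozenset({"es"}))},
}
merged = merge_by_basename(topics)
```

With this data, merged groups "a", "b" and "c" together and leaves "d"
on its own, because its scope differs.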

> 4. Translation becomes difficult. Not all translations between
> languages are one-to-one, two concepts with distinct names which are
> considered distinct in one language may be translated to a single
> name in another language. So a translation from one language to
> another may potentially cause topic merging not intended by the
> creator of the source topic map.

It is appropriate to use language to scope basenames, thus effectively
giving each language its own namespace.  Again, however, language
scope need not be applied to a topic map document unless it already
contains basenames in more than one language, or until it is
"included" by another topic map document that contains basenames in
more than one language.

> 5. Reification becomes problematic. It would be impossible for two
> reified topic map objects to share the same scoped name. For example
> I may wish to reify an occurrence of a topic which represents 'John
> Smith' and give it a name 'Photograph of John Smith'. This means
> that for any other topic about any other John Smith, I must be sure
> not to use the string 'Photograph of John Smith' to name an
> occurrence or else the a-nodes representing the *occurrences* (not
> the topics!) will be merged.

First, any two topic map objects may share a name in the same scope,
just not a "basename" name.  (See my discussion of point (1) above.)

Second, if you really do want to use "basename" (because you intend to
establish name-based identities for the topics), then it is (or
should be) a reportable error to apply the same name within the same
scope to topics with different addressable subjects, or to two topics
whose subjects are known to be different (e.g., two association topics
with different members, where the members are known to be different
because they have different addressable subjects).  These sorts of
errors can tell topic map creators that either different names should
be chosen, or more likely that the scopes within which the name is
being assigned to the topics should be differentiated.
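
A check of this kind might look like the following Python sketch.  It
assumes a toy model in which each topic is a dict holding one
addressable subject (a URL string, or None) and a set of
(basename, scope) pairs; the function names, the helper, and the
example.org URLs are all hypothetical.

```python
from collections import defaultdict


def naming_conflicts(topics):
    """Map each (basename, scope) pair to the differing subjects that share it."""
    subjects = defaultdict(set)
    for topic in topics:
        if topic["subject"] is None:
            continue  # no addressable subject, so nothing to contradict
        for key in topic["basenames"]:
            subjects[key].add(topic["subject"])
    return {key: found for key, found in subjects.items() if len(found) > 1}


def occurrence(url, name, *themes):
    """Hypothetical helper: a reified occurrence with one scoped basename."""
    return {"subject": url, "basenames": {(name, frozenset(themes))}}


name = "Photograph of John Smith"

# Same name, same (empty) scope, different addressable subjects.
same_scope = [
    occurrence("http://example.org/smith-a.jpg", name),
    occurrence("http://example.org/smith-b.jpg", name),
]

# Same name, but each occurrence is scoped by its own John Smith.
split_scope = [
    occurrence("http://example.org/smith-a.jpg", name, "john-smith-a"),
    occurrence("http://example.org/smith-b.jpg", name, "john-smith-b"),
]

errors = naming_conflicts(same_scope)      # one conflict to report
no_errors = naming_conflicts(split_scope)  # empty: the scopes now differ
```

The first call reports the shared name along with both URLs; once the
scopes are differentiated, nothing is reported.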

> For these reasons I propose the removal of the topic naming
> constraint from the XTM 1.0 processing model and urge the authoring
> group participating members to seriously consider and openly discuss
> this proposal.

I agree that this point needs to be discussed, though (in case you
couldn't tell from my comments) I don't agree with the proposal.
There is far too much confusion on these points, and it is important
that the members (participating and otherwise) of the AG be clear and
of one mind about them.

-peter

--
Peter Newcomb                           Epremis Corporation
peter.newcomb@epremis.com               http://www.epremis.com/
