topicmaps-comment message

Subject: RE: [xtm-wg] Topic Naming Constraint
From: "Kal Ahmed" <kal@ontopia.net>
To: <xtm-wg@egroups.com>
Date: Mon, 15 Jan 2001 22:27:36 -0000
> [Kal Ahmed <kal@ontopia.net> on Mon, 15 Jan 2001 10:20:04 -0000]
> > I would like to express my concerns about and objection to the Topic
> > Naming Constraint expressed in the XTM 1.0 specification. Having
> > worked with both the ISO 13250 specification and XTM 1.0
> > specification and implemented programming libraries for both of
> > these specifications, I find the topic naming constraint to be an
> > unnecessary restriction which makes the creation of consistent,
> > mergeable topic maps exceedingly difficult in any but the most
> > restricted situations. My objections are five-fold and I will
> > attempt to express them here.
> >
> > 1. In my mind, the most important objection is that NAME and
> > IDENTITY are two orthogonal concepts.
>
> First of all, topic names that are subject to the topic naming
> constraint (i.e., basenames) are not the only type of name that may be
> assigned to topics.  In fact, basenames are a very special type of
> name that exists, in combination with scope, solely to provide a
> name-based identity mechanism for topics.
>

But <variant> is in the content model of <baseName>, and its only purpose is
labelling. The very fact that this is the case indicates that there was an
intention that <baseName> not serve purely as an identification mechanism.
Equally, in the ISO spec, a <basename> also acts as a <dispname> and
<sortname> if they are not provided, so it achieves functionality that
strays into the territory of labelling too.

> So, basenames exist as a _result_ of the recognition of the fact that
> while NAME and IDENTITY are technically orthogonal, NAMEs can be, and
> commonly are used to _help_ resolve IDENTITY.
>
> > There is no way in which a name should be construed as asserting
> > identity.
>
> Surely, we all use names every day to establish the identity of things
> and concepts as we communicate with each other.  This is possible
> because:
>
>   a) within a particular conversation, there is generally a shared
>      context with respect to which such names can be resolved; and
>
>   b) as humans, we have the remarkable, but not infallible ability to
>      recognize and correct resolution errors when they occur.
>

Scoped names cover (a), but in certain application areas (I would assert
many, but that's controversial) there will be either not enough time or too
much data for a human to do (b), and machines aren't smarter than humans.
Not yet.

> > Both ISO 13250 and XTM 1.0 recognise the orthogonality of these two
> > concepts by providing separate constructs for each.
>
> In fact, ISO 13250 and XTM 1.0 both provide constructs for:
>
>   1) assigning names to topics that have no bearing on identity;
>

Except for the minor syntactic niggle that I can't create a <dispname> for
an ISO 13250 topic without creating a <basename> (even if it may be an empty
one).

>   2) establishing identity of topics independently of their names; and
>
>   3) assigning names to topics that unambiguously identify them with
>      respect to particular contexts.

But how do I ever finish identifying all the contexts within which two
subjects are unambiguously named?

> > Unfortunately the topic naming constraint then smashes the two
> > concepts together again making a scoped name into a form of identity
> > for a topic.
>
> This would be true only if it were not possible to assign names to
> topics that are not subject to the topic naming constraint.  As it is
> possible to do this (just create another "naming" association type), I
> don't see the validity of this complaint.
>

No, the statement is still true:

XTM 4.7.1
"The <baseName> element specifies a topic name. A topic name is represented
by one string: the content of the <baseNameString> child of <baseName>. The
context within which a topic has a certain name may be specified by a child
<scope> element."

The rest of that section makes *no* reference to identity, except to bring
in the topic naming constraint, which causes <baseName> to behave like
<subjectIdentity>. So we make no reference to identity in that section of
the specification, yet the semantics of equivalence for two <baseName>s are
the same as those for equivalence of two <resourceRef> elements in
<subjectIdentity>.

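To make that equivalence concrete, here is a minimal Python sketch. It is a deliberately simplified model, not code from either specification: URIs and scope themes are plain strings, and identity_keys and must_merge are hypothetical helper names. Each topic is reduced to the set of keys that can trigger a merge, and a (baseNameString, scope) pair turns out to be just another key alongside the subject reference:

```python
def identity_keys(subject_refs, base_names):
    """Return the set of keys that can cause a topic to be merged.

    subject_refs: iterable of URI strings (e.g. <resourceRef> or
                  <subjectIndicatorRef> targets inside <subjectIdentity>).
    base_names:   iterable of (baseNameString, scope) pairs, where scope is
                  an iterable of theme identifiers (empty = unconstrained).
    """
    keys = {("subject", ref) for ref in subject_refs}
    # With the topic naming constraint in force, a name in a scope is simply
    # another merge key, compared by string equality and scope-set equality.
    keys |= {("name", name, frozenset(scope)) for name, scope in base_names}
    return keys


def must_merge(keys_a, keys_b):
    """Two topics merge as soon as they share any key, whatever its kind."""
    return bool(keys_a & keys_b)


by_ref_1 = identity_keys(["http://example.org/places#paris"], [])
by_ref_2 = identity_keys(["http://example.org/places#paris"], [])
by_name_1 = identity_keys([], [("Paris", ["city"])])
by_name_2 = identity_keys([], [("Paris", ["city"])])

print(must_merge(by_ref_1, by_ref_2))    # True: same subject reference
print(must_merge(by_name_1, by_name_2))  # True: same name in the same scope
```

Nothing in must_merge distinguishes the two kinds of key, which is the point being made: with the constraint in force, a scoped name is compared exactly like an identity reference.
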
> > 2. For the creator of a topic map, the topic naming constraint
> > requires a priori knowledge of the vocabulary of all topic maps with
> > which the topic map being created is used.  In certain situations,
> > this is possible - e.g.  the controlled vocabularies used by
> > technical documentation departments; the controlled set of medical
> > terms defined by WHO. However, in the general 'use-on-the-web'
> > scenario, controlled vocabularies are not likely to prove practical
> > and the topic naming constraint in effect restricts the author's
> > freedom to name topics as he/she sees fit.
>
> This would be true, were it not for scope and the ability to adjust
> the scope of "included" topic maps (using addthms or mergeMap).  With
> these, the creator of a topic map need only ensure that basenames are
> used consistently within that topic map; it is the responsibility of
> the creators of topic maps that use that topic map to scope the
> included topic map relative to their own topic maps such that
> inappropriate mergings do not occur.  Remember, it is always easy to
> scope an "included" topic map such that it effectively has its own
> namespace, i.e., so that none of the basenames assigned within it will
> ever conflict with any assigned outside of it.  The nice thing about
> it is that in the cases where relatively controlled vocabularies are
> used, the care that went into creating and goes into maintaining them
> can be rewarded by the ease of merging and the increased
> interoperability of topic maps that use those vocabularies.
>

And the nasty thing is that it doesn't work in an environment where such
care cannot be guaranteed. Even the NewsML folks (and I would expect them
to be pretty picky about accuracy) use a mechanism more robust than scoped
names for identifying subjects.

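The scoping technique Peter describes can be sketched roughly as follows, assuming (as a simplification of what addthms/mergeMap provide) that the including map adds one extra theme to the scope of every base name in the included map. The theme URI, the example names and the helper functions add_theme and shared_name_keys are all hypothetical:

```python
def add_theme(topic_map, theme):
    """Return a copy of topic_map with `theme` added to every base-name scope.

    topic_map: dict of topic id -> set of (baseNameString, frozenset(scope)).
    """
    return {
        topic_id: {(name, scope | {theme}) for name, scope in names}
        for topic_id, names in topic_map.items()
    }


def shared_name_keys(map_a, map_b):
    """(name, scope) keys present in both maps, i.e. those that would merge topics."""
    keys_a = set().union(*map_a.values())
    keys_b = set().union(*map_b.values())
    return keys_a & keys_b


# Hypothetical example: "Mercury" the planet in my map and "Mercury" the
# element in an included map, both named in the unconstrained (empty) scope.
mine = {"planet": {("Mercury", frozenset())}}
included = {"element": {("Mercury", frozenset())}}

print(shared_name_keys(mine, included))  # {('Mercury', frozenset())}
scoped = add_theme(included, "http://example.org/themes/chemistry-map")
print(shared_name_keys(mine, scoped))    # set()
```

The protection depends entirely on the including author supplying the extra theme; the unscoped included map still collides.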

> > 3. The topic naming constraint requires that a user has access to
> > the content of potential merged topic maps plus specialised topic
> > map processing in order to determine if the creation or modification
> > of a topic will cause a merge to take place.  With subject-based
> > merging, standard string manipulation tools will work if the user
> > has access to the potential merged topic maps and using
> > authoritative subject identities could even mitigate the need for
> > access to the potential merged topic maps.
>
> See my answer to point number (2) with respect to the perceived need
> to have access to topic maps with which this map may potentially be
> merged.
>
> As far as the use of "standard string manipulation tools" versus
> "specialized topic map processing", I don't understand your point.  It
> seems to me that in order to use "standard string manipulation tools"
> to do even simple things with topic maps, you must use them in very
> specific and relatively complex ways--specific and complex to the
> point of needing to be scripted.  By the time you've done that, you've
> created a "specialized topic map processing" tool.
>

*I* agree with you on this point. But I am keenly aware that others may
not. There have been decisions which turned on the ability of topic map
creators to use non-specialist tools, and whenever I suggest that "a tool
could help do that" I get withering looks and comments along the lines of
'a vendor would say that...'. However, I stand completely by the point that
such a specialised tool would potentially need to deal with a huge amount
of input to be able to guarantee to the user that they are creating a
topic which will merge only with the other topics in the world that they
want it to merge with.

> Along those lines, one problem with using only subject-based merging
> is that mergings of topics can then be done in only three ways:
>
>   1) one-by-one by human specialists,

Which works in the relatively small, tightly controlled environments which
may also suit name-based merging.

>
>   2) via identification with published subjects, or
>

Which works in a distributed environment using centralised authoritative
subject registries, and which scales to allow anyone to create a registry.
Not only that, but allowing a topic to have multiple subject indicators
enables third parties to create mappings between the subjects published by
those authoritative registries.

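As an illustration of that mapping role, here is a minimal Python sketch. It assumes that two topics merge whenever they share a subject indicator URI and that merging is transitive; the registry URIs, topic ids and the merge_by_indicators helper are invented for the example. A third party only needs to publish a single topic carrying an indicator from each registry for the two registries' topics to end up merged:

```python
def merge_by_indicators(topics):
    """Group topic ids that share subject indicators, transitively.

    topics: dict mapping a topic id to a set of subject indicator URIs.
    Returns a list of sets of topic ids, one set per merged topic.
    """
    parent = {t: t for t in topics}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    first_owner = {}  # indicator URI -> first topic id seen carrying it
    for topic_id, indicators in topics.items():
        for uri in indicators:
            if uri in first_owner:
                # Shared indicator: join this topic's group to the owner's group.
                parent[find(topic_id)] = find(first_owner[uri])
            else:
                first_owner[uri] = topic_id

    groups = {}
    for t in topics:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


topics = {
    "mapA:hamlet": {"http://registry-a.example/plays#hamlet"},
    "mapB:hamlet": {"http://registry-b.example/works/1234"},
    # A third-party mapping topic that cites one indicator from each registry.
    "bridge": {"http://registry-a.example/plays#hamlet",
               "http://registry-b.example/works/1234"},
}
print(merge_by_indicators(topics))
```

With the "bridge" topic present, all three ids end up in a single group; without it, the call returns two separate groups. Union-find is used here only because it handles chains of shared indicators cheaply; it is one choice among several.
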
>   3) automatically, or more likely semi-automatically, using AI tools
>
> (1) is basically the state of index merging today, though it will
> likely become both easier and more difficult given the potential
> richness of the information provided by topic maps in contrast to that
> provided by traditional indexes.
>
Agreed: (1) does not scale, but it is a good way of creating a high-quality
topic map from multiple TM sources.

> (2) is a good mechanism for re-using the results of (1), but is
> limited by the need for centralized authority.  In practice, there
> will be more than one such authority, and those authorities and
> authority-to-authority mapping services will have to use (1) or (3) in
> order to provide interoperability.

That would be their value-add.

> Also, (2) requires topic subjects
> to be registered with one of these authorities and published before it
> can be used to merge topics with those subjects.  This potentially
> includes the subjects of _all_ topics: relying too heavily on (2) will
> send the number of subjects that must be so registered sky-rocketing,
> and will exceed the abilities of the authorities to collect and
> publish them.

How many web pages are indexed by Google today? The physical scale of such
indexes is not a problem for anyone with even a relatively modest amount of
hardware. The Open Directory Project has proven that, with the correct
framework, a distributed human-powered cataloging effort can achieve huge
scale too. Factor in the kinds of standard vocabularies that already exist
and you will be able to achieve VLSTM (very large scale topic maps... :-)

>
> (3) requires even more specialized and complex tools than the
> "specialized topic map processing" you envision above.
>

I hadn't even considered (3); I guess that's why you are the visionary :-)

> What basenames (with the topic naming constraint) provide is a fourth
> option:
>
>   4) semi-automatically using (standard, simple) set-manipulation
>      tools
>
> (4) builds on the strengths of (2), by using published subjects to
> provide a foundation that grounds the identities of other topics,
> minimizing the need for one-by-one pinpointing of subjects by humans.
> (2) thus becomes much more manageable, since authorities need only
> collect and publish the foundational subjects.
>

My argument is precisely that (4) does not do that. At best an author makes
a 'guess' at what establishes a namespace within which his/her topic only
gets merged with other topics on the same subject. If I scope my Hamlet as
(play) and someone else scopes their Hamlet as (Shakespearean Tragedy), we
fail to merge. If we both use the URI
http://www.shakespeare.org/plays.xtm#hamlet we merge.

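The Hamlet case can be sketched in Python under a simplified model: scope themes are plain strings rather than topic references, and name_keys, merges_by_name and merges_by_subject are hypothetical helpers. The same two topics fail the name-based test but pass the subject-based one:

```python
def name_keys(base_names):
    """Keys compared by the topic naming constraint: (string, set of themes)."""
    return {(name, frozenset(scope)) for name, scope in base_names}


def merges_by_name(a, b):
    """True if the two topics share a base name in the same scope."""
    return bool(name_keys(a["names"]) & name_keys(b["names"]))


def merges_by_subject(a, b):
    """True if the two topics share a subject indicator URI."""
    return bool(a["subject_indicators"] & b["subject_indicators"])


HAMLET = "http://www.shakespeare.org/plays.xtm#hamlet"
mine = {"names": [("Hamlet", ["play"])], "subject_indicators": {HAMLET}}
theirs = {"names": [("Hamlet", ["Shakespearean Tragedy"])],
          "subject_indicators": {HAMLET}}

print(merges_by_name(mine, theirs))     # False: the scopes differ
print(merges_by_subject(mine, theirs))  # True: the same URI is used
```
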
> > 4. Translation becomes difficult. Not all translations between
> > languages are one-to-one, two concepts with distinct names which are
> > considered distinct in one language may be translated to a single
> > name in another language. So a translation from one language to
> > another may potentially cause topic merging not intended by the
> > creator of the source topic map.
>
> It is appropriate to use language to scope basenames, thus effectively
> giving each language its own namespace.  Again, however, language
> scope need not be applied to a topic map document unless it already
> contains basenames in more than one language, or until it is
> "included" by another topic map document that contains basenames in
> more than one language.

That's not my point. My point is that certain sets of words in one
language may share a single translation in another language, e.g. the
various Inuit words that all translate as "snow" in English.

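A small Python sketch of the translation problem, under stated assumptions: term_1, term_2 and term_3 are hypothetical placeholders for source-language words (not real vocabulary), each name is scoped only by a language theme, and translate_map and naming_collisions are illustrative helpers. Three distinct topics coexist in the source map, yet after translation two of them share a name in the same language scope and would be merged by the topic naming constraint:

```python
from collections import defaultdict

# Hypothetical translation table: two distinct source terms share one English word.
translation = {"term_1": "snow", "term_2": "snow", "term_3": "ice"}

# Source map: topic id -> (baseNameString, language theme).
source = {"t1": ("term_1", "src"), "t2": ("term_2", "src"), "t3": ("term_3", "src")}


def translate_map(topic_map, table, target_lang):
    """Translate every base name and re-scope it to the target language."""
    return {tid: (table[name], target_lang) for tid, (name, _lang) in topic_map.items()}


def naming_collisions(topic_map):
    """Return (name, language) keys shared by more than one topic."""
    by_key = defaultdict(set)
    for tid, key in topic_map.items():
        by_key[key].add(tid)
    return {key: tids for key, tids in by_key.items() if len(tids) > 1}


print(naming_collisions(source))  # {}
english = translate_map(source, translation, "en")
print(naming_collisions(english))  # one collision: ('snow', 'en') shared by t1 and t2
```

Language scope does not help here, because after translation both names live in the same "en" scope.
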
>
> > 5. Reification becomes problematic. It would be impossible for two
> > reified topic map objects to share the same scoped name. For example
> > I may wish to reify an occurrence of a topic which represents 'John
> > Smith' and give it a name 'Photograph of John Smith'. This means
> > that for any other topic about any other John Smith, I must be sure
> > not to use the string 'Photograph of John Smith' to name an
> > occurrence or else the a-nodes representing the *occurrences* (not
> > the topics!) will be merged.
>
> First, any two topic map objects may share a name in the same scope,
> just not a "basename" name.  (See my discussion of point (1) above.)
>
> Second, if you really do want to use "basename" (because you intend to
> establish name-based identities for the topics), then it is (or
> should be) a reportable error to apply the same name within the same
> scope to topics with different addressable subjects, or to two topics
> whose subjects are known to be different (e.g., two association topics
> with different members, where the members are known to be different
> because they have different addressable subjects).  These sorts of
> errors can tell topic map creators that either different names should
> be chosen, or more likely that the scopes within which the name is
> being assigned to the topics should be differentiated.
>

So the position is to be promiscuous with merging and try to figure things
out when they go wrong. In a controlled situation, that's fine. In a
distributed application where I want to gather subsets of very large topic
maps onto a small workstation and use them for the task of navigating a
semantic web, promiscuous merging would be a pain to disambiguate manually
and too dangerous to allow automatically.

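The check Peter describes (the same base name in the same scope applied to topics with different addressable subjects) could be sketched as follows. This is a minimal Python sketch under assumed conventions: each topic has at most one subject address, the occurrence addresses are made up, and naming_conflicts is a hypothetical helper. Applied to the 'Photograph of John Smith' example, it flags the two occurrence-reifying topics:

```python
from collections import defaultdict


def naming_conflicts(topics):
    """Find (name, scope) keys shared by topics with different subject addresses.

    topics: dict of topic id -> (subject address or None,
                                 set of (baseNameString, frozenset(scope))).
    Topics without a subject address cannot be checked and are skipped.
    """
    owners_by_key = defaultdict(dict)  # key -> {topic id: subject address}
    for topic_id, (address, names) in topics.items():
        if address is None:
            continue
        for key in names:
            owners_by_key[key][topic_id] = address
    return {
        key: owners
        for key, owners in owners_by_key.items()
        if len(set(owners.values())) > 1
    }


# Two topics reifying two different occurrences (hypothetical addresses).
topics = {
    "photo-1": ("http://example.org/smith.xtm#occ-1",
                {("Photograph of John Smith", frozenset())}),
    "photo-2": ("http://example.org/smith.xtm#occ-2",
                {("Photograph of John Smith", frozenset())}),
}

for key, owners in naming_conflicts(topics).items():
    print("conflict:", key[0], sorted(owners))
# conflict: Photograph of John Smith ['photo-1', 'photo-2']
```

Differentiating the scopes (for example, adding a theme for each John Smith) removes the conflict, which is the remedy Peter suggests.
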
> > For these reasons I propose the removal of the topic naming
> > constraint from the XTM 1.0 processing model and urge the authoring
> > group participating members to seriously consider and openly discuss
> > this proposal.
>
> I agree that this point needs to be discussed, though (in case you
> couldn't tell from my comments) I don't agree with the proposal.
> There is far too much confusion on these points, and it is important
> that the members (participating and otherwise) of the AG be clear and
> of one mind about them.
>

Absolutely, and thanks for taking the time to respond.

Cheers,

Kal


References:
- Re: [xtm-wg] Topic Naming Constraint
  - From: Peter Newcomb <peter@techno.com>