topicmaps-comment message

Subject: RE: [xtm-wg] Topic Naming Constraint
From: "Kal Ahmed" <kal@ontopia.net>
To: <xtm-wg@egroups.com>
Date: Mon, 15 Jan 2001 22:27:29 -0000

Steve Newcomb wrote:
> [Kal:]
> > I would like to express my concerns about and
> > objection to the Topic Naming Constraint expressed in
> > the XTM 1.0 specification. Having worked with both
> > the ISO 13250 specification and XTM 1.0 specification
> > and implemented programming libraries for both of
> > these specifications, I find the topic naming
> > constraint to be an unnecessary restriction which
> > makes the creation of consistent, mergeable topic
> > maps exceedingly difficult in any but the most
> > restricted situations. My objections are four-fold
> > and I will attempt to express them here.
>
> > 1. In my mind, the most important objection is that NAME
> > and IDENTITY are two orthogonal concepts. There is no
> > way in which a name should be construed as asserting
> > identity.
>
> I disagree.  Either names are meaningful, or they
> aren't.  If names have nothing to do with identity, why
> am I never called "Artur Rubinstein" or, for that
> matter, "Species 8472"?  My point is that names have a
> very great deal to do with identity, and identity has a
> very great deal to do with topic merging.  In
> informatics, the normal and primary purpose of naming a
> thing is to allow it to be identified -- i.e.,
> addressed -- unambiguously and reliably.
>

But the names that people assign to subjects are neither unambiguous nor
reliable. There is a method for the unambiguous and reliable identification
of subjects: the subject identity mechanism.

> The primary design goal of topic maps was always to
> make it possible to merge finding aids for corpora of
> information.  You seem to be saying that names should
> be excluded from being used in service of that goal,
> because the discipline that the topic maps paradigm
> imposes on the use of names (the topic naming
> constraint) is not worth the trouble it causes.  I
> disagree with that assessment.

No, I am not disputing that names are important to humans. I am asserting
that names are unreliable and confusing to computers, which limits the
ability of name-based merging to be of assistance as a finding aid.

>  The value of being able
> to merge finding aids can scarcely be overstated; if we
> lay aside all the hype, that value is the real,
> revolutionary importance of topic maps.  If we weaken
> the mergeability of topic maps, we beg the question,
> "What can topic maps do for me that I can't already do
> with plain-vanilla hyperlinks?"  If we take the
> position that the specification of scopes is
> unnecessary, or that it should be avoided, we have
> nothing but a plain-vanilla hyperlink architecture.
> With scope, the topic maps paradigm is *much* more
> powerful than any hyperlink architecture.
>

I am not arguing against scoped topic characteristic assignment either, nor
am I downplaying the importance of mergeability. But the merging of two
wildly different topics cannot be of service to a finding aid.

> I also think you're overstating the difficulties that
> the topic naming constraint imposes on authors of topic
> maps.  Yes, it creates work.  But it's useful and
> important work that goes to the heart of the reasons
> why people need the topic map paradigm.
>

It creates work for them because they have to determine a scoped basename
that will not cause merging with things they do not want to get merged. In
fact it's a little like assigning a subject identity, but with the
additional problem of multiple meanings for words.

> > Both ISO 13250 and XTM 1.0 recognise the
> > orthogonality of these two concepts by providing
> > separate constructs for each.
>
> This surprising interpretation does not jibe with the
> history of the paradigm, or with the text of either
> standard.
>

What else is the purpose of the distinction between -identity- and
<topname>/<basename> et al. in ISO 13250, and between <subjectIdentity> and
<baseName> in XTM? The very names are something of a give-away, aren't they?
If I have been fooled by the names, please explain to me why a <baseName>
does not establish the name of a subject and why <subjectIdentity> does not
establish the identity of a subject.

> > Unfortunately the topic
> > naming constraint then smashes the two concepts
> > together again making a scoped name into a form of
> > identity for a topic.
>
> You are exactly right when you say that a scoped name
> is a form of identity for a topic.  I would delete
> "Unfortunately", and I'd change "smashes" to "amounts
> to a careful and powerful articulation of".
>
> The ONLY reason to merge topics is that they have the
> same subject.  But if the names of subjects are
> meaningful (i.e., if names are useful for identifying
> things), then it is reasonable and appropriate to take
> advantage of that fact.
>

If we had a human language in which the unambiguous naming of the set of
all distinct topics in experience were possible, I would agree.

> First of all, what is the topic naming constraint?  It
> is that no two topics can have the same name (basename)
> in the same scope (topic namespace).  Why is that
> important?  Because, otherwise, the names of topics
> cannot be used to look them up directly; names do not
> identify topics.  The usefulness of topic names -- and
> topic maps -- would be severely compromised if topics
> could not be unambiguously addressed by their names.
> The importance of being able to address topics by their
> names -- which hasn't been done very much as yet -- can
> scarcely be overemphasized.  Kal, if the designs of
> your applications are not yet taking advantage of the
> topic naming constraint, I would urge you to think
> about the bigger application problems that would be
> insoluble without it.

I have created applications that do take advantage of the topic naming
constraint, but it is nothing that I can't do more robustly and manageably
with the proper application of identity. I can completely namespace my
phrases in a far more robust manner by creating a URI/URN scheme which
incorporates the name and the concept of namespacing than I can by
attempting to second-guess the other names and scopes that my topic map may
encounter.
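
As a rough illustration of such a scheme, here is a minimal Python sketch.
The base URL, the namespace labels and the function name subject_uri are all
hypothetical; neither ISO 13250 nor XTM 1.0 prescribes any particular URI
layout, so this is only one way it might be done.

```python
from urllib.parse import quote


def subject_uri(base: str, namespace: str, name: str) -> str:
    """Build a subject identifier URI from a namespace and a name.

    The layout base/namespace/name is an assumption made for this sketch.
    """
    # Percent-encode each part so spaces and slashes stay inside their segment.
    return f"{base.rstrip('/')}/{quote(namespace, safe='')}/{quote(name, safe='')}"


# The same name string in two namespaces yields two distinct identities.
BASE = "http://example.org/subjects"  # hypothetical base URL
bridge = subject_uri(BASE, "bridges", "London Bridge")
song = subject_uri(BASE, "nursery-rhymes", "London Bridge")

print(bridge)
print(song)
```

The name string stays free for display purposes, while the decision about
whether two topics share a subject rests on comparing the URIs.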

>
> > 2. For the creator of a topic map, the topic naming
> > constraint requires apriori knowledge of the
> > vocabulary of all topic maps with which the topic map
> > being created is used. In certain situations, this
> > is possible - e.g.  the controlled vocabularies used
> > by technical documentation departments; the
> > controlled set of medical terms defined by
> > WHO. However, in the general 'use-on-the-web'
> > scenario, controlled vocabularies are not likely to
> > prove practical and the topic naming constraint in
> > effect restricts the author's freedom to name topics
> > as he/she sees fit.
>
> Not true.  Anybody can use any name for any topic, as
> long as it's done *consistently* within the same topic
> map.  (And people who can't face the discipline of
> doing internally-consistent work are constitutionally
> incapable of making useful topic maps in any case.)
> There is a requirement that, when two different topics
> (subjects) must have the same name, the scopes within
> which they have those names must be distinct.  This
> requirement is basic; it supports the process of
> determining which topic the end user wants:
>
>   Directory assistance: "Which Mr. Smith do you want?
>                         Do you want the one on High
>                         Street, or the one who is not
>                         on High Street?"
>
> The purpose of scope is to support these distinctions.
> "Living on High Street" is a topic that either is, or
> is not, in the scope within which all Mr. Smiths have
> their names.  The idea of changing the name of each
> Mr. Smith is inconsistent with the requirements of the
> real world.  I would be very surprised to find a
> database, for example, that listed my name as "The
> Steve Newcomb who lives on Flagler Court."  The idea of
> regarding "The Steve Newcomb who lives on Flagler
> Court" as a controlled-vocabulary term is absurd, and
> it's the wrong way to think about topic names.  You
> seem to be determined to avoid the use of scope for
> making distinctions between names, but the fact is that
> if you don't use scope for your names, you are bound to
> have trouble.  Scope is fundamental to topic maps.
>

And you can repeatedly apply the topic naming constraint and watch
two High Streets collapse in on themselves, taking their respective
Mr. Smiths with them. This is just not as robust as compiling the
person's address into a URI. If I am creating a directory of people in a
company and I have two people with the same name in the same department,
what do I do? What do I do if a third person with the same name joins the
company? Now what do I do if I want to share my directory with the company
that just bought my company? The use of names and topic-based namespaces is
simply insufficient. With an identity-based scheme, I can encode the
employee number within a namespace defined for my corporate HR system and
I'm done.
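
The cascade described above can be sketched in a few lines of Python. This
is a deliberately simplified model, not the XTM 1.0 processing model: a
topic is just an id with a list of (base name, scope) pairs, a scope is a
set of theme topic ids, and the function name merge_by_name and all topic
ids are hypothetical.

```python
def merge_by_name(topics):
    """Apply name-based merging until nothing changes.

    topics maps topic id -> list of (base_name, set of theme topic ids).
    Returns a dict mapping each topic id to the id of its merged group.
    """
    parent = {t: t for t in topics}

    def find(t):
        # Follow parent links to the representative of t's merged group.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    changed = True
    while changed:
        changed = False
        seen = {}  # (name, scope after theme merging) -> a topic holding it
        for topic, names in topics.items():
            for name, scope in names:
                key = (name, frozenset(find(theme) for theme in scope))
                owner = find(seen.setdefault(key, topic))
                root = find(topic)
                if owner != root:
                    # Same name in the same scope: the two topics merge.
                    parent[root] = owner
                    changed = True
    return {t: find(t) for t in topics}


topics = {
    "street-1": [("High Street", set())],
    "street-2": [("High Street", set())],
    "smith-1": [("Mr. Smith", {"street-1"})],
    "smith-2": [("Mr. Smith", {"street-2"})],
}
merged = merge_by_name(topics)
print(merged)
```

Once the two streets merge, the two scopes become identical, so the two
Mr. Smiths merge as well, even though their names were scoped by different
topics to begin with.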

> Your statement that "the topic naming constraint
> requires apriori knowledge of the vocabulary of all
> topic maps with which the topic map being created is
> used" is simply not true.  When merging multiple topic
> maps, it's trivially easy to distinguish the names
> applied by various topic map documents (more precisely,
> <topicMap> elements) from one another, so as to avoid
> name clashes across topic map documents.  Avoiding such
> clashes is the primary purpose of the scope-diddling
> feature of <mergeMap>.
>

And you diddle the scope one way to make a pair of topics merge, then
another way to prevent some other topics from merging? That's not even
possible in XTM, even if it were a desirable way to work. I don't want to
have to filter through a topic map fragment I pull off the web to indicate
what I want merged and what I don't want merged - I want the authors to
have complete control over their specification of the topic's subject, and
I want them to be free to use name and scope for providing meaningful name
strings in contexts that are useful to applications.

> Finally, if you don't like the facilities that the
> topic maps paradigm provides for names, you certainly
> don't have to use them.  Just don't give your topics
> any names, and, instead, you can define an occurrence
> type (or an association type) for your "name-like
> things" that has whatever application-defined semantics
> you want it to have.  Such occurrences will not be
> treated as names, and they will therefore not be
> subject to the topic naming constraint.
>

I am writing this as an implementor of topic map software and a designer of
future systems based on topic map technology. Sure, I can construct some
applications that don't use <baseName>, but I'm not worried about me, I'm
worried about users trying to get to grips with what a name really does.
Overloading it like this does not help.

> > 3. The topic naming constraint requires that a user has
> > access to the content of potential merged topic maps
> > plus specialised topic map processing in order to
> > determine if the creation or modification of a topic
> > will cause a merge to take place.
>
> How so?  I don't know of any basis in fact for this
> statement.
>

If I want to represent a subject such as a bridge called "London Bridge",
I have to be sure that there aren't any other topics that my topic map
gets merged with which use the name-string "London Bridge" to refer to the
old London Bridge, or to refer to the song "London Bridge (Is Falling
Down)" etc. I have a namespace to create, and I do it using themes - which
are also subject to merging and so are as unpredictable as the topic itself.
Not only that, but if I *want* a merge to occur and the subject I want to
merge with has no identity (because I can use the topic naming constraint)
then I'm in the position of relying on a potentially changing scope
containing a potentially changing name for defining the subject identity
point I want to merge at. I would definitely prefer a stable URI.

> > With subject-based
> > merging, standard string manipulation tools will work
> > if the user has access to the potential merged topic
> > maps and using authoritative subject identities could
> > even mitigate the need for access to the potential
> > merged topic maps.
>
> I don't know enough about your application context
> to understand what you're saying here, so I can't
> comment.
>

Consider a distributed system which enables college students to share their
music collections. By using recording catalog numbers as identity, the
merging of thousands of "Greatest Hits" albums does not become a problem, and
even the Japanese import of "The Clash", by "The Clash", doesn't merge with
the original UK release of the same album (important if the import has extra
tracks).
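
A small Python sketch of the difference, using made-up data: the catalog
numbers, artist names and student names below are placeholders rather than
real catalog entries, and group_by is a hypothetical helper.

```python
from collections import defaultdict

# Each record is one album in one student's collection (all values invented).
records = [
    {"title": "The Clash", "artist": "The Clash", "catalog": "EX-UK-001", "owner": "alice"},
    {"title": "The Clash", "artist": "The Clash", "catalog": "EX-JP-001", "owner": "bob"},
    {"title": "The Clash", "artist": "The Clash", "catalog": "EX-UK-001", "owner": "carol"},
    {"title": "Greatest Hits", "artist": "Artist A", "catalog": "EX-A-001", "owner": "alice"},
    {"title": "Greatest Hits", "artist": "Artist B", "catalog": "EX-B-001", "owner": "bob"},
]


def group_by(records, key):
    """Group the owners of records whose key values are equal."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["owner"])
    return dict(groups)


# Merging on the name string lumps unrelated albums together.
by_title = group_by(records, lambda r: r["title"])
# Merging on the catalog number keeps the import and the compilations apart.
by_catalog = group_by(records, lambda r: r["catalog"])

print(len(by_title), "groups by title")
print(len(by_catalog), "groups by catalog number")
```

Only the two copies that share a catalog number end up in the same group;
the title remains available for display and searching.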

> > 4. Translation becomes difficult. Not all translations
> > between languages are one-to-one, two concepts with
> > distinct names which are considered distinct in one
> > language may be translated to a single name in
> > another language. So a translation from one language
> > to another may potentially cause topic merging not
> > intended by the creator of the source topic map.
>
> Yes, translation is always difficult, but the
> difficulty of translation is not caused by topic maps.
> Topic maps merely demand precise translation, in order
> to work properly.

A precise translation may still lead to two different subjects having the
same name.

> This is a good thing, not a bad
> thing.  (Unless, of course, you think it's a good thing
> for the users of translations to be misled by the
> translator's sloppiness or lack of awareness of nuances
> in the target tongue.  Personally, I don't think that
> it's a good thing when users of topic maps are misled
> by them.  I think it's just great if topic map authors
> are encouraged to be precise about what they say.  The
> more precise, the better.)

Yes, and identity is the way to do it. Not all situations are tightly
controlled. A lot of the preceding arguments you make would absolutely hold
water in a controlled situation such as a single TM author, or even a group
using a controlled vocabulary (and with tame translators - a rarity in my
experience :-). But I really can't see it working in a highly distributed,
massively interconnected environment such as the Web.

>
> > 5. Reification becomes problematic. It would be
> > impossible for two reified topic map objects to share
> > the same scoped name. For example I may wish to reify
> > an occurrence of a topic which represents 'John
> > Smith' and give it a name 'Photograph of John
> > Smith'. This means that for any other topic about any
> > other John Smith, I must be sure not to use the
> > string 'Photograph of John Smith' to name an
> > occurrence or else the a-nodes representing the
> > *occurrences* (not the topics!) will be merged.
>
> Not true.  It's very easy to avoid this name clash, in
> the usual way, using the normal topic map scoping
> facility: simply include the topic whose subject is the
> appropriate John Smith in the scope of the name of the
> reified occurrence.  Then, that occurrence cannot be
> confused with a photograph of any other John Smith,
> even if two photographic occurrences of two different
> John Smiths have the name 'Photograph of John Smith'.
>

And if I merge with another topic regarding the same John Smith (by
whatever means) and it also has a photo, also called "Photograph of John
Smith"?

> > For these reasons I propose the removal of the topic
> > naming constraint from the XTM 1.0 processing model
> > and urge the authoring group participating members to
> > seriously consider and openly discuss this proposal.
>
> We shall certainly discuss it, but I'm taking this
> opportunity to share my opinion that this is a very bad
> idea.  If adopted, this proposal will seriously weaken
> the topic maps paradigm, and it will do so for no good
> reason.  I say again: if you don't like the topic
> naming constraint, which is the foundation of the
> paradigm's topic naming facilities, then don't use the
> topic naming facilities.  In other words, if you don't
> want your names to be regarded as being useful for
> identifying the topics of which they, uniquely, are the
> names, then use the occurrence facilities or the
> association facilities to express your topic names.
> That's a perfectly valid thing to do, and it will leave
> everyone who, unlike you, wants to take advantage of
> the enormous identifying and merging power provided by
> the topic naming constraint, in a position to continue
> to do so.

The exact reflection of that would be to let those who want to use
scoped names to define identity do so with a URI scheme which
encodes the scoped name, and let me call my topics what I like without
having to worry about unwanted merging.


Cheers,

Kal


References:
- Re: [xtm-wg] Topic Naming Constraint
  - From: "Steven R. Newcomb" <srn@coolheads.com>