topicmaps-comment message

Subject: Re: [xtm-wg] The case against the TNC
From: "Steven R. Newcomb" <srn@coolheads.com>
To: xtm-wg@yahoogroups.com
Date: Sun, 25 Feb 2001 02:58:44 -0600
[Lars Marius Garshol:]
> Now that we suddenly have a debate on the TNC I think we should try to
> use that debate productively, and to that end I would like to start
> afresh in a more systematic way.
> 
> First of all, I believe I have a pretty good view of the rationale for
> including the topic naming constraint. I fully understand and accept
> that the purpose of the TNC is to ensure that base names (together
> with their scopes) uniquely identify their topics. I also accept that
> this is intended to support addressing, subject identification,
> searching and merging based on topic names.
> 
> My problem with this is, to put it very briefly, that I don't believe
> that it really achieves anything much.

I'm flabbergasted by this assertion.  You don't think it's useful for
topics to be addressable by their names?  What would a feature have to
do in order to "achieve anything much" in your eyes?

> On the other hand it is clear
> that the TNC represents a significant inconvenience to topic map
> authors and especially to developers.

I think you're onto something, here.  If we want to get serious and
systematic about this issue, we have to weigh the potential cost of
the topic naming constraint to developers and authors, versus the cost
of *not* having the topic naming constraint to the global community of
providers, federators, and users of knowledge.

The cost of making all topics forever unaddressable by their name
characteristics is astronomically high.  It is so high that I find it
difficult to imagine how any inconvenience to authors or developers
(see below) is even remotely comparable.  Without the topic naming
constraint, the topic map paradigm would have no inherent
topic-addressing capability.  You would not be able to address topics
in purely topic-map terms.  Without the topic naming constraint, topic
map information assets would be locked forever into whatever system
environments are required to support the addresses of the subject
identity points of their topics.  Some may say, "Well, that's fine
with me.  The Web, and its URIs, will last forever."  But that's
nonsense.  URIs are notoriously unstable.  If we do away with the
topic naming constraint, we must accept the idea that references to
topics that are made via their identity points will, sooner or later,
lose their referential integrity.  In other words, individual topic
maps will lose their value.  Let's also face the fact that the Web
itself is going to be replaced completely, sooner or later.  Shall we
abandon all our existing finding information when that day comes?
What will it cost us to save it?  Let's be honest, here, about
comparing costs.

Now let's consider the "inconvenience to developers" issue.  The
entire SGML family of standards (SGML, DSSSL, HyTime, and Topic Maps)
is designed to serve the interests of information owners and users,
and all of the business models of information owners and users.  Not
one of these standards is designed to serve the business models,
needs, or convenience of software developers.  On the contrary, they
all break down the ghetto walls that software vendors have erected,
and continue to erect, around the information that belongs to their
customers.  Those ghetto walls are injurious to the interests of the
information owners whose information is trapped within them.

The topic naming constraint is a ghetto-wall breaker that is vital to
the interests of topic map owners.  If you say it represents an
inconvenience for software developers and software vendors, that's not
surprising.  Actually, it's comforting.  The topic naming constraint
is a ghetto-wall breaker because it's a feature that can be used by
those who invest in the creation and upkeep of topic map information
to protect their investments from losses due to changes in the host
technologies.  (One such "host technology" -- and only one -- is the
World Wide Web.  Let's not forget that the Web is emphatically *not*
designed to manage or preserve information.  It is designed to
*distribute* information.  As an information management paradigm, the
Web is profoundly deficient.  Maybe the Semantic Web initiative will
ultimately make the Web a useful platform for information management,
but we shouldn't wager all of the investments that will be made in the
creation of topic maps on the possibility of that outcome, no matter
how devoutly we may desire it.  At best, it's something that may
happen at some future date, maybe.  (To that end, we're all hoping
that the "addressable subject", "subject indicator", and "subject
identity point" ideas of the topic maps paradigm will help focus
attention on some of the relevant requirements, for example.  But
that's another story.)  Right now, we need topic maps to work in the
real world, as it is today, and on the Web, as it is today.  "Working
in the real world" means making investments in topic maps both safe
and rewarding, regardless of the Web's deficiencies.)

As for the "inconvenience to authors" that you report, it has not been
demonstrated.  The complaints we've seen here indicate that these
presumably inconvenienced authors are actually inconvenienced only
because their requirements are consistent with the use of labels, but
they somehow have erroneously got the idea that they want to be using
names.  What these authors need is education.  They won't be helped by
crippling the topic maps paradigm.

Lars Marius, if what you're really proposing is that developers should
not be embarrassed to emphasize the use of topic labels over topic
names, in order to avoid inconveniencing the users of their software
products, then I agree with you.  But I hope that authors will also be
told 

* that when a topic doesn't have a name, it can't be addressed by its
  name, and

* that when a topic can't be addressed by its name, there are certain
  consequences of which the topic map owner should be aware.

> Furthermore, the TNC seems to me
> to force authors into making statements that make no sense from an
> information modeling point of view.

Demonstrations, please.

> In short, the TNC brings a lot of
> pain and little gain that I can see.

Demonstrations of pain for authors and users, please.  

Also please demonstrate the "littleness" of the gain for users and
federators.  It seems to me that the gain is nothing less than:

* protection from total loss of the value of the topic map due to
  technology changes beyond the owner's control,

* the ability to exploit the value of the topic map in "foreign"
  system environments and in unforeseen ways, and

* the ability for the paradigm to support the business models of
  knowledge federators.  Today, there are very few operators in that
  business (Lexis-Nexis comes to mind as an example), but the
  potential size of that business space is mind-bogglingly larger and
  far more diverse than it is today.

> What I have a problem with is not the TNC as heuristic or guiding
> principle, but the TNC as an absolute rule to be applied at all times.

The topic naming constraint is not intended to be a heuristic for
assisting in the process of merging topic maps.  (I keep reading this
idea that it's really a heuristic, and people keep saying so, but it's
simply not true!  It's a conclusion that intelligent people seem to
intuitively leap to.  And I can see why they do it.  But it's not a
*valid* conclusion; it's merely an *attractive* one.  And it leads to
horrendous further mis-conclusions, like the idea that the topic
naming constraint is a bad idea that we should do away with.)  The
topic naming constraint is a rigorous naming discipline whose purpose
is to make topics unambiguously addressable by name.  The discipline
imposed by the topic naming constraint must be observed by all topic
map authors, no matter what.  If the topic naming constraint is not
observed by a topic map author, the topic map he creates cannot be
processed accurately according to the ISO or XTM standards;
information interchange based on those standards will fail, and it
will be easy to determine whom to blame for the failure.
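
For illustration only, here is a minimal sketch of the interpretation rule described above: topics that carry the same base name in the same scope are treated as one topic. The in-memory layout (a dict from topic id to a set of (base name, scope) pairs, with each scope a frozenset of scoping-topic ids) and the function name merge_by_name are assumptions made for this sketch. Neither ISO 13250 nor XTM defines them.

```python
from collections import defaultdict


def merge_by_name(topics):
    """Group topic ids that share a (base name, scope) pair.

    `topics` maps a topic id to a set of (base_name, scope) pairs, where
    scope is a frozenset of scoping-topic ids (hypothetical layout).
    Returns a sorted list of sorted groups of topic ids.
    """
    parent = {tid: tid for tid in topics}

    def find(tid):
        # Follow parent links to the representative, compressing the path.
        while parent[tid] != tid:
            parent[tid] = parent[parent[tid]]
            tid = parent[tid]
        return tid

    first_seen = {}  # (base_name, scope) -> first topic id carrying it
    for tid, names in topics.items():
        for key in names:
            if key in first_seen:
                # Same name in the same scope: the two topics are merged.
                parent[find(tid)] = find(first_seen[key])
            else:
                first_seen[key] = tid

    groups = defaultdict(set)
    for tid in topics:
        groups[find(tid)].add(tid)
    return sorted(sorted(group) for group in groups.values())


topics = {
    "t1": {("Paris", frozenset({"France"}))},
    "t2": {("Paris", frozenset({"Texas"}))},
    "t3": {("Paris", frozenset({"France"})), ("City of Light", frozenset())},
}
print(merge_by_name(topics))  # [['t1', 't3'], ['t2']]
```

In this sketch t1 and t3 merge because both carry "Paris" in the scope France, while t2 stays separate because its "Paris" is scoped by Texas.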

> Let's start with why I don't think the TNC really achieves anything
> much, or indeed even works at all. Addressing topics by name seems
> pointless to me when we have URIs that can refer to them. Ditto for
> subject identification, which we can do via subject indicators.

Rightly or wrongly, I infer from this statement that you believe:

(1) that the value of a topic map is entirely in its XML
    representation.

(2) that references to topics by means of URIs is perfectly adequate
    for the support of whatever information management techniques need
    to be applied by the owners of topic maps, now and forever.

(3) that the Web is a perfectly adequate platform for the development
    and conservation of expensive, high-maintenance knowledge assets,
    such as topic maps.

(4) that subject indicators, if they exist on the Web, are always
    going to stay where they are, at least as long as people will
    attempt to use the topic maps that point to them.

I don't agree with any of the above statements.

> As for searching I really don't see how the TNC gives us any help.
> Let's take the old Paris example. If I want to find Paris, Texas there
> is no need for me to search for "the topic with the base name Paris in
> the scope Texas" when I can search for "the topic with the base name
> Paris that has a contained-in association with Texas".

Let me show you how the topic naming constraint can be invaluable in
helping people use topic maps, using your own example.

Computer: Where do you want to go today?

    User: I want to go to Paris.

Computer: I know about two Parises.  Do you mean Paris in the scope of
          France, or Paris in the scope of Texas?

The point is that the computer knows how to solicit exactly the right
information to get to the right Paris with the minimum effort on the
part of the user.  How did it know that?  Because it knew that there
are two namespaces within which "Paris" appears, so the only question
that needed to be answered was, "Which namespace has the Paris that is
desired?"  In a well-made topic map, the scoping-topic differences
between the namespaces are exactly the questions whose answers will
disambiguate the situation fastest and most accurately.
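
The dialogue above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any existing topic map engine: the function names build_name_index and prompt_for, the topic ids, and the representation of a scope as a frozenset of scoping-topic labels are all hypothetical. The sketch relies on the topic naming constraint in one place: within one base name, each scope maps to exactly one topic.

```python
from collections import defaultdict


def build_name_index(names):
    """Index topics by base name, then by scope.

    `names` is an iterable of (base_name, scope, topic_id) triples;
    scope is a frozenset of scoping-topic labels (hypothetical layout).
    """
    index = defaultdict(dict)
    for base_name, scope, topic_id in names:
        # Under the topic naming constraint, (base_name, scope)
        # identifies a single topic, so a plain dict entry suffices.
        index[base_name][scope] = topic_id
    return index


def prompt_for(index, base_name):
    """Return the question to ask the user, or None if none is needed."""
    candidates = index.get(base_name, {})
    if not candidates:
        return f"I don't know about any {base_name}."
    if len(candidates) == 1:
        return None  # the name is already unambiguous
    # The scopes themselves supply the wording of the question.
    options = sorted(", ".join(sorted(scope)) for scope in candidates)
    choices = ", or ".join(f"{base_name} in the scope of {o}" for o in options)
    return (f"I know about {len(candidates)} topics named {base_name}. "
            f"Do you mean {choices}?")


index = build_name_index([
    ("Paris", frozenset({"France"}), "paris-fr"),
    ("Paris", frozenset({"Texas"}), "paris-tx"),
])
print(prompt_for(index, "Paris"))
# The user's answer selects the scope, and the scope selects the topic.
print(index["Paris"][frozenset({"Texas"})])  # paris-tx
```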

There is a basic idea here that I feel it necessary to emphasize: the
purpose of a topic map is TO ENHANCE THE PRODUCTIVITY OF ITS *USERS*.

Let's contrast this idea with some other ideas that we've been
seeing in these notes that disparage the topic naming constraint:

* Is it the purpose of a topic map to enhance the productivity of the
  developer of topic map software?  NO.

* Is it the purpose of a topic map to enhance the productivity of the
  author of that topic map?  NO, at least not directly.  YES, if we
  consider the topic map author's productivity in terms of the
  aggregate productivity enhancement that the topic map will bring to
  all of its users.  Therefore, if writing a topic map is a pain in
  the ass for its author, but it saves 1,000 times as much pain for
  its users, does the topic map enhance the productivity of the
  author?  YES!  But is it going to be *convenient* for the author to
  write the topic map, or is it going to be a pain in the ass?  IT'S
  GOING TO BE A PAIN IN THE ASS, AND THAT'S THE VERY REASON WHY THE
  TOPIC MAP HAS VALUE.  The users will pay for the privilege of
  avoiding that very pain in the ass.

> A further problem is that to use scope as a basis for anything I need
> to guess what the author used as scope, which is by no means easy.

No, you don't.

> Paris can quite legitimately be scoped by Texas, the US and North
> America. If we try to find Paris the hero things get even worse, since
> the scope may well be Greece and legend, Greece and history, Greece,
> legend, history, Greece and mythology, Troja and so on and so forth.
> Scope is, in short, too vague.

You have it backwards.  Users don't have to guess.  They can be
prompted, and the prompts can and should depend on the scopes of
the names.  That is the *purpose* of those scopes!

Computer: Ready.

    User: glug... uff... Paris... mmmph (*burp*)

Computer: Did you say, "Paris"?

    User: Yeah.  (*sounds of coughing and spitting*)

Computer: Are you interested in a city or a legendary, possibly
          historical hero of ancient Greece?

    User: City.

Computer: I know about two such Parises.  Do you mean Paris in the
          scope of France, or Paris in the scope of Texas?


> In order to support merging it may well be that scoping of names has
> some value, but it is still subject to the problem outlined in the
> previous paragraph. If it is only intended to support merging then the
> TNC is better used as a heuristic for merging tools than as a
> draconian law that all topic maps at all times must follow.

I think I've already dealt with these points.

> It should also be obvious that the topic naming constraint represents
> a significant inconvenience to topic map authors. My example of the
> XML query language should serve to illustrate this. When creating the
> topic representing the second of these the author will presumably be
> confronted with a message saying that her new topic seems to share its
> subject with another topic.

I think maybe you're thinking that the topic naming constraint is
somehow required to be dealt with in a particular way during the
creation of a topic map.  NO TOPIC MAP STANDARD SAYS ANYTHING ABOUT
HOW TOPIC MAPS SHOULD BE CREATED.  ISO 13250 and XTM *only* strive to
specify what a topic map document, *after* it has been created, will
be understood to mean by a conforming processor.  There is no
"presumption" whatsoever about confronting authors with messages.

I'll let the nonsensicality of the message you propose, "your new
topic seems to share its subject with another topic", go by without
comment.

> The author will then be forced to scope
> the name in order to avoid this unwanted merge, regardless of whether
> the author wants to or not.

That's the whole point of the topic naming constraint.  Yes, if the
author wants the topic to be findable by means of its name, the author
must say how a user can most easily and productivity-enhancingly
distinguish between the two uses of the name.  Is it costly for the
author?  You bet it is.  That's the nature of the investment in making
a topic map.  Does it make the two different topics that happen to
have the same name reliably and quickly findable by a user of the
topic map?  Well, if the author takes the trouble to think about the
question of how to differently scope the two names in terms of how
best to enhance the productivity of users of the topic map, YES,
EMPHATICALLY YES.  

If the authoring software uses some brain-dead trick to give the names
different scopes no matter what, such as including the named topic in the
scope of the name of that same topic, the productivity of users is not
enhanced at all.  The topic map authors who use such software have
been cheated out of their opportunity to enhance the productivity of
their customers.

> There are many situations where the author will not get such a
> warning, such as when editing a topic map as an XML document

Huh?  Nobody is going to directly edit a topic map as an XML document,
just as nobody is going to attempt to use a <topicMap> element by
reading it with his eyes.

> or when
> merging in another topic map.

"Merging another topic map" is the act of creating a topic map that
contains one or more <mergeMap> elements.  NO TOPIC MAP STANDARD SAYS
ANYTHING ABOUT HOW TOPIC MAPS SHOULD BE CREATED.  When you create a
topic map, YOU are responsible for any merging that it declares.  YOU
are responsible for determining that any merging that occurs, occurs
correctly.  When you create a topic map, YOU have all the tools you
need to make sure that the correct merges will occur, and no incorrect
merges will occur, when your topic map is processed.

> In these cases the TNC is in my view
> likely to prove a significant problem by causing unwanted merges. 

THE TOPIC NAMING CONSTRAINT DOES NOT SAY ANYTHING ABOUT AUTHORING.  IT
ONLY SAYS HOW AN ALREADY-AUTHORED TOPIC MAP WILL ALWAYS BE
INTERPRETED.

> It is of course a further problem that the best choice of scope that
> anyone has been able to think of so far is to use the topic itself as
> the scoping topic, because this completely subverts all the intended
> uses of the basename-plus-scope-as-unique-identifier. To find this
> topic by its basename-plus-scope you need to have found the topic
> already before you start looking, which is absurd.

Right.  It's a dumb idea.  

> A part of the problem here is that scope-as-subject-disambiguation is
> tricky because you never have any guarantee that you have
> disambiguated sufficiently to avoid unwanted merges.

That's right.  The topic map author is always responsible for creating
a sensible topic map.  There are no guarantees that the process of
authoring a topic map will be easy or convenient.

> When the two
> mr. John Smiths live in different cities scoping by city is enough,
> but when John Smith I moves to the same city as John Smith II it no
> longer suffices. Should John Smith II then move into the same street
> as John Smith I the scope needs to be adjusted again, and there is no
> guarantee that this process will ever end, unless you scope both by
> themselves.

In a changing world, topic maps have to change if they are going to
stay useful.  There are no guarantees that the process of maintaining
a topic map will be easy or convenient.

> When it comes to implementation of topic maps it is clear that the
> TNC presents a significant obstacle to the developer in situations
> where the topic map is dynamic, for example because it is really a
> view of some other data source. If that other data source is huge,
> constantly traversing all of it in order to find all the base names of
> all topics in the mapped view so that the TNC can be applied is most
> likely not going to be feasible at all.

The problem you seem to be referring to is the general problem of
hyperlink management in a dynamic corpus.  The relationship of what
you are saying to the topic naming constraint is not clear.  Are you
envisioning a scenario in which a topic map is being authored in such
a way that the basenames of the topics are emanating from some dynamic
corpus?  If that is the case, then, yes, there is a maintenance
challenge, here.  Value may need to be added by a human being -- a
real author -- whenever a name clash occurs in a new scan of the
database.  But that seems obvious and necessary if the extracted names
are to have any value to the users of the resulting topic map, no?

> Implementing the TNC in more static situations (such as when reading
> an XTM document or permanently merging two topic maps) is also
> difficult, but manageable. 

Yes.  It takes work to make a good topic map.

> However, as I believe Geir Ove and Kal have pointed out several times
> already, the presence of the TNC requires total knowledge of the topic
> map in order to avoid the possibility of unwanted merges.   This is
> generally problematic in all kinds of situations where software is
> creating topic maps automatically. I expect applications that use
> NewsML feeds to automatically build topic maps to be bitten by this
> quite often, for example.

Yes.  It takes work to make a good topic map.  Computers can't do
all that work.  Topic maps provide a way for humans who are users
of topic maps to leverage the work already done by authors of topic
maps.  Topic maps do not provide a way for computers to do that work
in the first place.  Nothing can do that.  There is no magic.

> The major difference between merging by subject indicator and merging
> by basename-plus-scope is that only the latter can cause merges that
> are not wanted. If we remove the TNC we also remove this entire
> problem at a single stroke.

If you don't give your topics any names at all, you can remove this
entire problem at a single stroke.  Use occurrences (of type "label")
for your names.  You'll be very pleased with the results: not one
unintended merge will occur on account of the topic naming constraint.
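
A small sketch of why that is so, assuming a hypothetical dict-based topic model in which base names and occurrences are kept in separate lists. The function name name_clashes, the topic ids, and the occurrence type "label" as written here are illustrative assumptions. Only base names are ever compared, so two topics that share a label never collide.

```python
from collections import defaultdict


def name_clashes(topics):
    """Return {(base_name, scope): sorted topic ids} for every base name
    in a given scope that is carried by more than one topic."""
    seen = defaultdict(set)
    for topic_id, topic in topics.items():
        for base_name, scope in topic.get("base_names", []):
            seen[(base_name, frozenset(scope))].add(topic_id)
        # Occurrences, including those of the assumed type "label",
        # are deliberately not examined: they are not names.
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


# Two distinct topics given the same unscoped base name: a clash.
named = {
    "topic-a": {"base_names": [("XML Query Language", [])]},
    "topic-b": {"base_names": [("XML Query Language", [])]},
}

# The same two topics carrying the string as a label occurrence: no clash.
labelled = {
    "topic-a": {"occurrences": [("label", "XML Query Language")]},
    "topic-b": {"occurrences": [("label", "XML Query Language")]},
}

print(name_clashes(named))
print(name_clashes(labelled))  # {}
```

The price, as noted above, is that the labelled topics cannot be addressed by name at all.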

If you remove the topic naming constraint from the topic map paradigm,
you can remove nearly all the usefulness of the topic map paradigm, at
a single stroke.

> I am hung over today and not able to express myself as clearly as I
> would wish, but this is my best effort at explaining why it seems to
> me that the TNC is awkward, unnatural and achieves very little of
> value. I am open to the possibility that the last two of these three
> statements are untrue, but for arguments to be effective they must
> show convincingly how the practical benefits of the TNC can be
> realised and how natural scopes for base names can be found. So far
> the proponents of the TNC have not provided this.

I hope I've succeeded in providing what you're looking for with the
above rants, or at least in helping you see another perspective.

> Furthermore, if I am proven wrong it is clear that topic map practice
> needs to be shaped by what the TNC is meant to achieve, and that most
> people find it difficult to practice this. In other words, if we are
> to accept the TNC we must give scope an entirely new prominence in
> topic map schemas and applications and start using it much more
> actively.

I think what you're saying is that you and your colleagues will have
to think in a new way about these things.  I hope you will find
what I've been trying to communicate to be liberating and helpful.
I'm *sure* it's also very challenging for you; you guys are pioneering
here and I am among your team's admirers.

> It also seems clear that many people are unhappy with the TNC and the
> strict regime it forces on them, and so from a 'marketing' point of
> view it seems that there is definite value in setting down the
> arguments for it in a clear way to minimize this problem, and also to
> ensure that people actually use scope in the intended way.

With regard to your situation:

* Topic maps can be *hard* to author, but can be *wonderfully
  productivity-enhancing* to use.

* As a vendor of topic map authoring tools, you need to convince your
  customers that

    - topic maps can be great investments that pay big dividends, and

    - your tools maximize the value of those investments, and minimize
      their cost.

* Marketing is a funny problem.  Often, the best marketing strategy is
  the least exciting technical strategy.  Consider the idea of
  implementing only selected parts of the topic map paradigm in your
  authoring tools.  Your delivery tools *must* implement the paradigm
  fully, in order to claim conformance, but your *authoring* tools
  don't have to do that.  Indeed, educating your customers to the full
  potential of the paradigm is impossible anyway.  Just take one step
  at a time, and try to make each step profitable.  You seem to feel
  strongly that topic names are too hard for authors to support.  Use
  that insight, if you believe in it.  You may be right!  So: consider
  not supporting names in your authoring tool!  Let your customers
  build topic maps that don't have any topic names, but instead use
  labels that appear as occurrences in the <topicMap> elements.  Your
  tool will still be producing conforming topic maps, and your insight
  may turn out to have been the key to opening the marketplace.  Think
  outside the box!

* Good luck.  We're all hoping for success, and the success of any of
  us presages the success of all of us.  In other words, we're all in
  this together, like it or not.  If one of us gets the marketing
  thing right, the others are sure to follow.  Experiment!  The
  earliest bird may seize the biggest worm.

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA

Follow-Ups:
- Re: [xtm-wg] The case against the TNC
  - From: Geir Ove Grønmo <grove@ontopia.net>
References:
- [xtm-wg] The case against the TNC
  - From: Lars Marius Garshol <larsga@garshol.priv.no>