topicmaps-comment message

Subject: Re: [topicmaps-comment] Can subjectIdentity elements guarantee topicidentity?
From: Kal Ahmed <kal@techquila.com>
To: topicmaps-comment@lists.oasis-open.org
Date: Wed, 17 Oct 2001 14:47:16 +0100

At 18:47 16/10/2001 +0000, Ivan Uemlianin <ivan@jurakm.com> wrote:
>Can subjectIdentity elements guarantee topic identity?
>
>I'd like to explore this issue, so I've framed the argument below assertively.
>
>The fact that two topics have the same subjectIdentity elements does not 
>independently guarantee that the two topics are about (denote/refer to) 
>the same thing (idea/notion/subject/whatever).  However, the topic map 
>specs (i.e. iso 13250 and xtm 1.0) both stipulate that topics with 
>identical subjectIdentity attributes/elements should be merged 
>automatically (iso 13250 5.2.1; xtm 1.0 2.2.1.3 & 2.2.1.4).  This 
>stipulation is not motivated by the syntax, by likely use or by broader 
>theoretic consideration.  The stipulation is therefore an arbitrary 
>barrier to topic map development, and it should be relaxed or it will be 
>ignored.

IMHO, the stipulation is motivated precisely by the intended use of 
subjectIdentity: to define a subject unambiguously.

>Here is a plausible example of use to demonstrate this: two topics with 
>identical subjectIdentity/resourceRef elements.
>
>Here are excerpts from two XTMs developed independently at separate 
>locations (the XTMs are fictional but the jpgs are real).
>
><topicMap id="topicMap1">
><topic id="abc123">
><subjectIdentity>
><resourceRef xlink:href="http://bla.jpg"/>
></subjectIdentity>
><!-- names and occurrences here -->
></topic>
><!-- more topics and associations -->
></topicMap>
>
>
><topicMap id="topicMap2">
><topic id="xyz789">
><subjectIdentity>
><resourceRef xlink:href="http://bla.jpg"/>
></subjectIdentity>
><!-- names and occurrences here -->
></topic>
><!-- more topics and associations -->
></topicMap>
>
>The .jpg file that both topics reify is an image of the Earth with some 
>clocks floating about in space.  TopicMap1 is all about the artist who 
>created this image, and the topic is associated with the artist, their 
>other works and so on.  The developer of topicMap1 is unaware that bla.jpg 
>contains an encrypted image of the B52 graveyard in Arizona. TopicMap2 is 
>all about encryption and steganography and the topic is associated with 
>the encrypted image, the news story around it (href?) and so on, with no 
>mention of the artist and their other works.

I think you may have misunderstood what subjectIdentity is intended to 
represent. From XTM 1.0 (section 2.2.1.3):

Subject identity can be established in one of two ways:
   1. By addressing the subject directly. This is only possible when the
       subject is an addressable information resource.
   2. By indicating the subject via a subject indicator (see below).

Within <subjectIdentity>, <resourceRef> does the former and 
<subjectIndicatorRef> does the latter. In your example, 
<subjectIndicatorRef> should be used, as neither of the topics regards 
http://bla.jpg as its subject (i.e. neither of the topics is about 
*that file*).
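
To make the distinction concrete, here is a small Python sketch that reads 
an XTM fragment and reports which kind of reference each topic uses inside 
<subjectIdentity>. The fragment is your topicMap1 rewritten with 
<subjectIndicatorRef>; the function name identity_refs is made up for 
illustration and is not part of any topic map toolkit.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# topicMap1 from the example, with the indicator form of the reference.
DOC = f"""
<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}" id="topicMap1">
  <topic id="abc123">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://bla.jpg"/>
    </subjectIdentity>
  </topic>
</topicMap>
"""


def identity_refs(xml_text):
    """Return (topic id, reference element name, href) for each topic.

    Hypothetical helper: 'resourceRef' means the resource IS the subject,
    'subjectIndicatorRef' means the resource merely indicates it.
    """
    root = ET.fromstring(xml_text)
    found = []
    for topic in root.iter(f"{{{XTM}}}topic"):
        for ref in topic.findall(f"{{{XTM}}}subjectIdentity/*"):
            kind = ref.tag.split("}")[1]  # strip the namespace part
            found.append((topic.get("id"), kind, ref.get(f"{{{XLINK}}}href")))
    return found


for topic_id, kind, href in identity_refs(DOC):
    print(topic_id, kind, href)
```

A processor that sees resourceRef here would treat the JPEG file itself as 
the subject; with subjectIndicatorRef it treats the file only as a pointer 
to whatever the author had in mind.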

Having said that, changing the syntax does not substantially alter the 
validity of the argument that follows. So I am responding on the 
assumption that topicMap 1 and 2 contain <subjectIndicatorRef> elements 
which point to the same resource as an indicator of their topics. Now, 
note that XTM 1.0 section 2.2.1.4 says:

A subject indicator is a resource that is
intended by the topic map author to provide a positive, unambiguous
indication of the identity of a subject. When
two topics use the same resource to indicate their subject, they are by
definition "about" the same thing, and must therefore be
merged during processing.

So when you say:

>These topics *could* be merged into a more 'complete' reification of the 
>subject, but the developers of topicMaps 1 & 2 could plausibly argue that 
>the merged topic is a third topic, different from the other two, and does 
>not replace them.  There would now be three reifications of the same 
>subject which related in a particular way.  In other words merging is a 
>kind of association rather than a copy-and-delete.

You are sort of right ;-)

From the point of view of the system which processed topicMap 1 and 2, 
there is now only one reification of a subject which is indicated by 
http://bla.jpg
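
As a rough Python sketch of what the processor does here (the Topic class 
and merge_by_indicator are invented for this message, not taken from XTM or 
from any toolkit, and the other merging rules in the spec are ignored): 
topics that share at least one subject indicator URI collapse into one.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory topic, not an XTM or toolkit type."""
    topic_id: str
    indicators: set = field(default_factory=set)  # subject indicator URIs
    names: set = field(default_factory=set)


def merge_by_indicator(topics):
    """Merge topics that share at least one subject indicator URI."""
    parent = list(range(len(topics)))  # union-find over topic positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # indicator URI -> position of first topic using it
    for i, topic in enumerate(topics):
        for uri in topic.indicators:
            if uri in first_seen:
                parent[find(i)] = find(first_seen[uri])
            else:
                first_seen[uri] = i

    groups = defaultdict(list)
    for i, topic in enumerate(topics):
        groups[find(i)].append(topic)

    return [
        Topic(
            topic_id="+".join(m.topic_id for m in members),
            indicators=set().union(*(m.indicators for m in members)),
            names=set().union(*(m.names for m in members)),
        )
        for members in groups.values()
    ]
```

Fed the two example topics (with made-up names, since the originals only 
say "names and occurrences here"), this returns a single topic carrying 
both sets of names. The processor has no way to tell that the two authors 
meant different things.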

Now, note that the authors of topicMaps 1 and 2 have not followed the 
advice of XTM 1.0 2.2.1.4 para 2 which says:

Since subject identity forms the basis for merging topic maps and
interchanging semantics, authors are encouraged to always indicate the
subject identity of their topics in the most robust manner possible, in
particular through the use of standardized ontologies expressed as
published subject indicators.

So the fact that they used an ad-hoc resource as a subject indicator is 
the root of the problem. As a piece of processing software receiving 
topicMaps 1 and 2, how do I know whether or not the subjectIdentity 
resources in the topic maps I receive are robust? The answer is that I 
cannot - I can only present the merged information to the user, or 
otherwise attempt to make use of it as best I can.


>The point about reification is that a topic will 'selectively reify' or 
>'reify and interpret', in other words a topic map developer will not 
>necessarily reify a subject in its objective entirety (this may be 
>impossible in principle), but only the aspects of the subject which 
>they're interested in or aware of.

Yes, which is why applying scope to merged topic maps is a good idea. That 
way I can always go back and say "Microsoft says XYZ about the subject 
indicated by http://www.microsoft.com/xp and the Microsoft Defamation 
League says ABC about the subject indicated by http://www.microsoft.com/xp" 
- on closer examination, a human being might also be able to say "...but 
they appear to be talking about the same thing". Without scoping - or some 
other application-specific trick to keep the source information separate - 
it might appear that XP is both a Good Thing and a Bad Thing.
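
A minimal Python sketch of the idea, assuming each statement is reduced to 
a (source, indicator, claim) triple and the source label stands in for a 
proper scope (in a real topic map a scope is a set of topics; merge_scoped 
is a name I have made up):

```python
from collections import defaultdict


def merge_scoped(statements):
    """Group claims by subject indicator, keeping the source as the scope."""
    merged = defaultdict(list)
    for source, indicator, claim in statements:
        merged[indicator].append((source, claim))
    return dict(merged)


statements = [
    ("Microsoft", "http://www.microsoft.com/xp", "XYZ"),
    ("Microsoft Defamation League", "http://www.microsoft.com/xp", "ABC"),
]

by_subject = merge_scoped(statements)
for source, claim in by_subject["http://www.microsoft.com/xp"]:
    print(f"{source} says {claim}")
```

The two claims end up under one subject, but each still carries the label 
of whoever made it.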

Of course, in your example, these are not different opinions about the same 
subject, but different subjects being indicated by the same resource. 
Handling this is more tricky. Let's assume we can't change XTM 1.0, however 
much we would like to ;-). Our options are:

1) Remain compliant and accept the problem
2) Allow pre-processing prior to compliant merging or use non-compliant merging
3) Do not irreversibly merge the topic maps

(2) might be best applied in an automated system. For example, the system 
could simply drop all <subjectIndicatorRef> elements whose references are 
not from one of its accepted ontologies - or refuse to acknowledge them 
during a merge process.
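
One way such a pre-processing step could look in Python, assuming that 
"from an accepted ontology" can be approximated by a URI prefix check (the 
prefix list and the function name trusted_indicators are hypothetical):

```python
# Hypothetical list of ontologies this system is prepared to trust.
ACCEPTED_PREFIXES = ("http://www.musicbrainz.org/",)


def trusted_indicators(indicators, accepted=ACCEPTED_PREFIXES):
    """Keep only the subject indicator URIs under an accepted prefix."""
    return {uri for uri in indicators if uri.startswith(accepted)}


incoming = {
    "http://bla.jpg",
    "http://www.musicbrainz.org/showartist.html?artistid=2333",
}
print(trusted_indicators(incoming))
```

A topic left with no trusted indicators would then simply never be merged 
on identity grounds, so the two bla.jpg topics would stay apart.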

(3) might be better employed in a system where a human acts as final 
arbiter (e.g. a topic map browsing application of some kind). If each 
"merged" topic could be expanded to show what the source topics were, a 
human being could then take a final decision about whether or not the 
topics really are about the same thing.
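
A sketch of how such an expandable merge might be held in memory (Python; 
SourceTopic, MergedView and build_views are names invented for this 
message, not a description of any existing browser or toolkit):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceTopic:
    """A topic as it arrived, tagged with the map it came from."""
    topic_map: str
    topic_id: str
    indicator: str


@dataclass
class MergedView:
    """A merged topic that remembers its sources."""
    indicator: str
    sources: list = field(default_factory=list)
    confirmed: bool = False  # to be set by the human arbiter

    def split(self):
        """Undo the merge by handing back the untouched source topics."""
        return list(self.sources)


def build_views(topics):
    """Group source topics by indicator without discarding any of them."""
    views = {}
    for topic in topics:
        view = views.setdefault(topic.indicator, MergedView(topic.indicator))
        view.sources.append(topic)
    return views


views = build_views([
    SourceTopic("topicMap1", "abc123", "http://bla.jpg"),
    SourceTopic("topicMap2", "xyz789", "http://bla.jpg"),
])
for source in views["http://bla.jpg"].split():
    print(source.topic_map, source.topic_id)
```

Nothing is deleted, so a reviewer who decides the two topics are about 
different things can split them again.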

>This obtains no matter how detailed or complete is the resource being 
>reified: different developers can always reify differently.  In fact, the 
>richer the resource, the more the reifications can diverge - even to the 
>extent of being mutually exclusive.  I am not being a postmodernist: 
>imagine reifications of the subject 'Osama bin Laden', one at the 
>Pentagon, the other at www.alQaeda.net.  The heterogeneity of knowledge is 
>not a technical problem; knowledge is 'human, all too human'.

This is a non-example. Let's assume both Microsoft and the MSADL use 
Microsoft's ticker symbol (suitably URI'd) as a subject indicator (not as 
the subject, as they are both talking about MS, not about its ticker 
symbol). They are both talking about the same subject, just saying 
different things. Your money-grabbing corporate producer of shoddy software 
is my provider of robust back-office support products, but the subject is 
the same.

>As long as people are free to develop their own topic maps the 
>subjectIdentity rules will be unenforceable.  In practice, subjectIdentity 
>elements will become mnemonics to guide developers in merging and 
>expanding their topic maps.  A more unpleasant possibility is that large 
>corporations will develop their own knowledge banks of copyrighted 
>'published subject indicators', a fee being payable when any map references 
>their topics (this won't stop the topic maps breaking of course, but some 
>people do pay money for badly designed software that doesn't work properly).

I think it unlikely that subject indicators will be worth copyrighting. The 
topic maps that use the subject indicators might be worth copyrighting, but 
that doesn't stop me from using the same URIs that Oxford University Press 
use in their encyclopedia in order to provide my own information about the 
same subject.

If people *want to* talk unambiguously about the same thing, then common 
subject indicator URIs will emerge. Look at musicbrainz.org. If I want to 
talk about The Clash, don't you think that 
http://www.musicbrainz.org/showartist.html?artistid=2333 would be a 
suitable subject indicator? (Assuming that this URL is stable.)

>The spirit behind the subjectIdentity rules - of specifying a topic's 
>denotation - seems similar to work on constraints on topic map elements, 
>for example in tmcl and tmpm4.  Like those constraints, 'identity' is not 
>an accidental property of the topic: subjectIdentity and constraints both 
>address intensional (i.e. necessary, defining) aspects, while names and 
>occurrences address extensional (i.e. contingent) aspects.
>
>Some final suggestions:
>- mergeMap creates associations between elements and does not delete topics.

Implement it that way by all means! TM4J does...plug, plug ;-)

>- as it stands, the subjectIdentity element is actually a privileged kind 
>of occurrence.  Perhaps it should be represented as such.
>- work on subject identity - how a topic relates to the thing it reifies - 
>is rightly placed within work on topic constraints.

I think you have made some good points here, Ivan. I hope that my ramblings 
have gone some way to outlining my approach to all this. However, I must 
say that there is a voice at the back of my head agreeing with you on 
refining the use of subject indicators (not subjectIdentity). How could an 
author indicate **in what way** a given URI is an indicator of a subject? 
Unfortunately I think that this ends up being a bootstrap problem - I 
would need subjects which express the ways in which subjects can be 
indicated - and how would I represent those?

In practice, the use of community and private ontologies and 
categorisation schemes (such as those developed by musicbrainz.org) will 
be the way to go.

My 2p-worth done

Cheers,

Kal

-----------------------------------------------------------
Kal Ahmed
Information Management Consultant
www.techquila.com
kal@techquila.com
+44 7968 529531
------------------------------------------------------------
References:
- [topicmaps-comment] Can subjectIdentity elements guarantee topicidentity?
  - From: Ivan Uemlianin <ivan@jurakm.com>