XML Topic Maps

Introduction

World, Words and Web

The world contains many things - human beings, animals, plants, inanimate objects, thoughts, words, relationships, concepts, computer systems, information, individuals, groups, messages, actions, emotions, successful and unsuccessful attempts at communication, to name but a few. Any of these things, from the most specific to the most general, from the most concrete to the most abstract, can be talked about or written about. Some, but not all, can be photographed or drawn. Some, but not all, exist within computer systems or can be addressed by those systems.

Words are the primary (but not the only) means by which we human beings communicate with one another about the things in the world. We also use gestures, pictures, sounds, works of art and more. But words have a peculiar power because, within a restricted community that shares a common language, they give us the best hope we have (though certainly not the guarantee) of communicating unambiguously.

Computer systems are among the tools we use to communicate about things in the world, to aid us in reasoning about the world, or to act directly on the world to bring about change. The Web is itself a computer system, consisting of many subsystems, operated and used by many human beings. We cannot use the Web to send an emotion, a thought, an action, a mountain, a town, a person, or a concept directly to another system or human being. But we can and do use it to send words, pictures, sounds, documents, database tables and other richly structured information objects that describe these things, and may even, in the case of emotions or thoughts, convey them, or in the case of actions, cause them to occur.

Topic Maps and the 'Semantic Web'

Sending words, pictures and structured information objects is very useful and powerful, and being able to do it on a worldwide scale even more so. But there is a sting in the tail. To communicate with one another, we need to convey not just words, but the meanings of those words; not just computer files, but the intent with which and for which those computer files were created. And the wider the community of people, or the number and variety of computer systems, with which we are communicating, the harder it is to ensure that the person or system receiving the words, pictures or other information, will correctly glean from them the meaning intended by the sender.

The 'Semantic Web' is a Web in which meanings can be conveyed and not just words, pictures or data. It does not exist today. Topic Maps can be used as a means towards building a semantic Web, not because they remove the gap between words or information objects and their meanings, but because they recognise that gap for what it is, and work within its limitations. A mountaineer who climbs a peak without crampons may get there by luck or supreme skill, but to enable the world at large to scale the mountain, it is better to recognise its challenges and provide simple tools to address them, rather than ask people to climb based only on their supreme desire to reach the top!

Peaks and Chasms

To continue the mountaineering image, we can say that the mountain we are attempting to climb has three peaks, separated by deep chasms. We want to use information objects in a computer system to communicate something we have in mind to some other person. There are three mountain peaks here, looking at each other across apparently uncrossable chasms: our mind, the real world things we are thinking of, and the information objects we use as a means of communication. These three domains are different in nature yet deeply connected. Our mind has a thought that reflects some aspect of the real world, and the information object describes those aspects of the real world that correspond to our thought. When a person receiving our communication interprets the information object, we hope it will evoke in their mind thoughts about the real world that somehow match our own. If it does, then the communication has been successful. Yet we can send neither the thought itself, nor the real world objects about which we are thinking. The thought, the information object used to express it, and the real-world things that both of them represent, are utterly distinct. They inhabit different domains and the gap between them seems unbridgeable.

Seen from another point of view, however, there is only one mountain peak, and no chasms at all. The computer system and the information objects it sends and receives, our minds and all their thoughts, are themselves part of the real world. They too can be thought of, described in information objects and communicated about. The whole system is deeply interconnected; it can point at itself, describe itself and communicate about itself. Yet, in any particular act of communication, a clear distinction must exist between the information object and the information it carries. Otherwise, we fall into the chasms of infinite regress, circular definition and, in the computer system, unending processing loops that hang the computer and leave the user staring at a frozen screen or a big red error message.

The XTM Conceptual Model

With the thoughts and images evoked by the Introduction in our minds, we can now present the XML Topic Maps conceptual model. We do this through a series of information objects (words and diagrams), which aim to be clear and simple enough to convey the XTM conceptual model to a human reader, yet precise and formal enough to form the basis for XTM implementations that will be able to run in real-world computer systems and use the XTM interchange syntax defined in the XTM DTD. To aid understanding, some of the diagrams and descriptive text are accompanied by extracts from the XTM DTD and/or by fragments of a sample XTM document instance.

The diagrams used in this section are 'class diagrams', using the conventions of the Unified Modelling Language (UML). Each plain rectangle represents a class of objects (a kind of thing that can exist), and the lines and arrows between them represent relationships that exist or can exist between instances of those classes (individual things of those kinds).

A Resource is an addressable object

The first diagram is extremely simple. It shows a single rectangle labeled 'Resource', and states that the defining characteristic of a Resource is that it is addressable. This means that it is possible for a computer system to determine whether the things referred to by two Resource references are or are not the same. Examples of Resources are records in database files, electronic documents, images and sounds, strings of characters, and XML elements and attributes. These are all things that can exist within a computer system. They are 'addressable' in the sense that the system can retrieve them and make deterministic comparisons between them to establish their identity or difference.

Subjects of discourse may or may not be addressable

Most of the things of interest in the real world are not addressable in the sense described above. To determine whether two references to things such as human feelings, physical objects, mountains, people or countries refer to the same thing requires understanding of the real world as it exists outside the confines of the computer system. Any or all of these things may be a Subject of discourse, but they are not addressable and so do not count as Resources. On the other hand, we sometimes want to talk about a particular electronic document, character string or database record. These things are indeed addressable Resources and may also become a Subject of discourse. This diagram shows that the Subject class has two subclasses, Addressable Subject and Non-addressable Subject, and that Addressable Subject is also a subclass of Resource.
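The subclass relationships just described can be sketched as a small Python class hierarchy. This is an illustrative model only: the class names follow the text, while the `address` and `description` fields and the example values are assumptions introduced for the sketch.

```python
class Resource:
    """Anything addressable within the computer system."""

    def __init__(self, address: str):
        # `address` is a hypothetical stand-in for a URI, record key, etc.
        self.address = address


class Subject:
    """Anything that can be a subject of discourse."""


class NonAddressableSubject(Subject):
    """A real-world thing (person, mountain, feeling) outside the system."""

    def __init__(self, description: str):
        self.description = description


class AddressableSubject(Subject, Resource):
    """A Subject that is also a Resource, such as a particular document."""

    def __init__(self, address: str):
        Resource.__init__(self, address)


# Illustrative instances (hypothetical values).
hamlet_the_character = NonAddressableSubject("Hamlet, Prince of Denmark")
hamlet_the_file = AddressableSubject("http://example.org/hamlet.xml")
```

Multiple inheritance is used here only because it mirrors the statement that Addressable Subject is a subclass of both Subject and Resource; an implementation could equally model this with a flag or an interface.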

Topics reify Subjects within a computer system

A Topic is a Resource that is used to stand in for a Subject within the computer system. A Topic can be manipulated and reasoned about, and can have statements made about it. It acts as a proxy within the system for the physical or abstract, addressable or non-addressable real-world thing that is its Subject. In this sense, it is said to 'reify' the Subject, meaning that it makes the Subject 'real' from the point of view of the system. This diagram shows that a Topic is a Resource that reifies exactly one Subject. The direction of the horizontal arrow indicates that the Topic provides an indication of what its Subject is, but that the converse is not the case. The '0..*' label indicates that any Subject may be reified by zero or more Topics. The comment explains that the ideal case is for each Subject to be reified by no more than one Topic.

Resources can describe Subjects

Though the Subject of a Topic may not be addressable within the system, it is always possible to provide a human-interpretable description of it. The term 'Subject Descriptor' is used to denote a Resource whose human-interpretable content is capable of conveying a clear and unambiguous indication of which particular physical or abstract, addressable or non-addressable real-world thing is the Subject of the Topic. This diagram shows that a Subject Descriptor is a Resource that provides a definitive description of a Subject. The '0..*' label indicates that any Subject may have zero or more Subject Descriptors.

Only Addressable Subjects can be referenced directly

In most cases, the Subject of a Topic can only be referenced through the use of human-interpretable Subject Descriptors. However, in the special case where the Subject of the Topic is a Resource, it can be referenced directly. This diagram brings together and amplifies the main points of the previous two diagrams. It shows, as we saw before, that a Topic reifies a Subject and that a Subject Descriptor is a Resource that indicates what the Subject is. It also shows that a Topic may reference any number of Subject Descriptors, and that if the Subject is a Resource (an Addressable Subject), the Topic can reference it directly.
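As a rough illustration of the two ways a Topic can point at its Subject, the following Python sketch gives a Topic a set of Subject Descriptors and an optional direct reference. The field names, the `has_addressable_subject` helper and the example URLs are all assumptions made for this sketch, not part of the model itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Resource:
    address: str  # hypothetical stand-in for a URI or similar locator


@dataclass
class Topic:
    # Resources whose human-readable content indicates the Subject.
    subject_descriptors: set = field(default_factory=set)
    # Set only when the Subject is itself a Resource (an Addressable Subject).
    subject_resource: Optional[Resource] = None

    def has_addressable_subject(self) -> bool:
        return self.subject_resource is not None


# A Topic about the play as a work: not addressable, so described only.
play = Topic(
    subject_descriptors={Resource("http://example.org/psi/hamlet-the-play")}
)

# A Topic about one particular XML file: addressable, so referenced directly.
file_topic = Topic(
    subject_resource=Resource("http://example.org/texts/hamlet.xml")
)
```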

A Topic Map consists of one or more sets of Topics

A Topic Map consists of any number of Topic Sets, each of which is defined by a Resource. A Topic Set may comprise any number of Topics. The Topic Map is deemed to contain all the Topics that constitute its component Topic Sets. When Topic Sets are brought together in a single Topic Map, Topics that have the same Subject should be merged. In the case of Topics with Addressable Subjects, this merging is a deterministic process. In the case of Topics with Non-addressable Subjects, merging may occur either because two Topics have at least one Subject Descriptor in common, or because they share a Base Name within a common Scope. The notions of Scope and Base Name are described below.
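The merging rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive algorithm: Topics are reduced to three hypothetical fields, 'a common Scope' is read as 'an identical set of scoping Topics', and when two merged Topics carry different direct references the sketch simply keeps the first one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    subject_address: Optional[str] = None          # set for Addressable Subjects
    descriptors: set = field(default_factory=set)  # Subject Descriptor addresses
    base_names: set = field(default_factory=set)   # (name string, scope) pairs


def should_merge(a: Topic, b: Topic) -> bool:
    # Deterministic case: both Topics point directly at the same Resource.
    if a.subject_address is not None and a.subject_address == b.subject_address:
        return True
    # At least one Subject Descriptor in common.
    if a.descriptors & b.descriptors:
        return True
    # Same Base Name string within the same Scope.
    return bool(a.base_names & b.base_names)


def merge(a: Topic, b: Topic) -> Topic:
    # The merged Topic carries the union of both Topics' characteristics.
    # Assumption: if the direct references differ, the first one is kept.
    return Topic(
        subject_address=a.subject_address or b.subject_address,
        descriptors=a.descriptors | b.descriptors,
        base_names=a.base_names | b.base_names,
    )


def merge_topic_sets(*topic_sets: list) -> list:
    merged: list = []
    for topic in (t for ts in topic_sets for t in ts):
        # Fold in every already-merged Topic that matches, since the
        # newcomer may bridge Topics that were previously distinct.
        matches = [m for m in merged if should_merge(m, topic)]
        merged = [m for m in merged if not should_merge(m, topic)]
        for m in matches:
            topic = merge(m, topic)
        merged.append(topic)
    return merged


# Two hypothetical Topic Sets that share one Subject Descriptor.
english = frozenset({"english"})
set_one = [Topic(descriptors={"http://example.org/psi/hamlet"})]
set_two = [
    Topic(descriptors={"http://example.org/psi/hamlet"},
          base_names={("Hamlet", english)}),
    Topic(base_names={("Ophelia", english)}),
]
result = merge_topic_sets(set_one, set_two)
```

Here the two Topics that share a Subject Descriptor collapse into one, which then carries the Base Name "Hamlet"; the "Ophelia" Topic remains separate.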

Relationships are applicable within defined Scopes

Relationships between things rarely apply in all circumstances or for all time. Here we introduce the notion of Scope, which is best described as the context within which a particular relationship pertains. A Scope comprises a set of Topics which place limits on the validity of the relationship. For example, the relationship of "ally" between two countries may be limited to a particular time period or a particular conflict, or both. XTM allows a relationship to be asserted, but constrained by being associated with a Scope consisting of one or more Topics. The meaning of this is that the relationship only applies within the context of all the Topics belonging to the Scope. This diagram shows that a Scope consists of zero or more Topics, and that Topics may be added to the Scope to limit the context that the Scope defines.
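One possible reading of this rule is a subset test: a scoped relationship applies in a given context only if every Topic in its Scope is part of that context. The Python sketch below illustrates that reading; the `applies_in` function and the Topic identifiers are hypothetical, and Topics are represented by plain strings for brevity.

```python
def applies_in(scope: frozenset, context: frozenset) -> bool:
    """Return True if a relationship with this Scope holds in the context.

    Every Topic in the Scope must be present in the context. An empty
    Scope places no limits, so the relationship always applies.
    """
    return scope <= context


# The "ally" relationship limited to both a period and a conflict
# (hypothetical Topic identifiers).
ally_scope = frozenset({"period-1939-1945", "conflict-ww2"})

in_full_context = applies_in(
    ally_scope, frozenset({"period-1939-1945", "conflict-ww2", "europe"})
)
in_partial_context = applies_in(ally_scope, frozenset({"conflict-ww2"}))
```

Adding a Topic to the Scope can only narrow the set of contexts in which the test succeeds, which matches the statement that Topics are added to a Scope to limit the context it defines.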

A Topic may have only one Base Name within a given Scope

One important relationship is that between a Topic and its name or names. A Topic may have many names, applicable in different contexts. The notion of Base Name is of a relationship between a String, known as the Base Name String, a Topic, and a Scope that defines the context within which that String is considered to be a name for that Topic. There is a constraining rule that only one Topic may have a given Base Name within a given Scope. This means that the combination of Base Name and Scope can be used to identify a Topic uniquely.
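Because the combination of Base Name String and Scope identifies at most one Topic, it can serve as a lookup key. The following Python sketch shows one way to enforce and use that constraint. The `BaseNameIndex` class and the Topic identifiers are hypothetical; the sketch reports a clash as an error, although a processor could instead treat it as a signal to merge the two Topics.

```python
class BaseNameIndex:
    """Maps (Base Name String, Scope) to the single Topic that owns it."""

    def __init__(self):
        self._by_name = {}

    def assign(self, name: str, scope, topic_id: str) -> None:
        key = (name, frozenset(scope))
        # setdefault keeps the existing owner if the key is already taken.
        owner = self._by_name.setdefault(key, topic_id)
        if owner != topic_id:
            raise ValueError(
                f"{name!r} in scope {sorted(scope)} already names {owner!r}"
            )

    def lookup(self, name: str, scope):
        # Returns None when no Topic has this name in this Scope.
        return self._by_name.get((name, frozenset(scope)))


index = BaseNameIndex()
# The same string names different Topics in different Scopes.
index.assign("Hamlet", {"play"}, "t-hamlet-play")
index.assign("Hamlet", {"character"}, "t-hamlet-prince")
```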

Associations relate Topics together within a Scope

Topics may be related to one another through Associations. An Association is said to have Members, each of which may comprise any number of Topics whose Subjects are involved in the Association in the same way - that is, they play the same *role* in the Association. The Association is itself a Topic whose Subject is the relationship between the Subjects of the Topics that make up its Members. Members too are Topics in their own right. The Scope, if present, serves to limit the context within which the Association is valid.

Association Templates define classes of Association

An Association may be derived from another Association, which acts as its template. There is a class-instance relationship between the template Association and the derived Association. The template Association specifies constraints on all derived Associations that are instances of it. The Subjects of the Members of the template Association are the roles played by the Subjects of the Members of the derived Association. A role is a 'class of involvement'. Its instances are the specific involvements of the players in the derived Association. Finally, the Subjects of the Topics that make up the Members of the template Association are classes to which the Subjects of the Topics that make up the corresponding Members of the derived Association may belong. The corresponding Topic in the template Association thus reifies a 'class of player of role'. It is a requirement that each Topic that participates in a Member of a derived Association must reify an instance of a class that is reified by a Topic that participates in the corresponding Member of the template Association.

An example will help to make this clearer. In this example, we consider John and Mary, who are married to one another. The relationship between John and Mary is thus an instance of the class 'marriage'. In this relationship, John plays the role of husband, and Mary plays the role of wife. We can create an Association between the Topic whose Subject is John and the Topic whose Subject is Mary. This Association is derived from the marriage Association template, which has a 'husband' Member comprising the Topic 'man', and a 'wife' Member comprising the Topic 'woman'. This template states that in every Association derived from it, the role of husband must be played by a man, and the role of wife must be played by a woman. The Scope of the Association template might be the Topic whose Subject is the particular legal system under which the marriage is constituted.

In some other legal system, it might be the case that people are considered to be men or women from the age of 18, but a marriage may be entered into from the age of 16. We could create another marriage Association template whose Scope is a Topic that reifies this other legal system. In this template, the wife Member would contain two Topics - 'woman' and 'girl' - and the husband Member would also contain two Topics - 'man' and 'boy'. In Associations derived from this template, the constraint is that the role of husband must be played either by a man or by a boy, and the role of wife must be played either by a woman or by a girl.
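The constraint that a template places on its derived Associations can be sketched as a simple check in Python. This is an illustration only: templates and Associations are reduced to dictionaries keyed by role, and the `CLASSES_OF` table, the `conforms` function and the individual 'tom' are hypothetical additions to the example above.

```python
# Classes of which each player Topic's Subject is an instance
# (hypothetical data; 'tom' is added to exercise the second template).
CLASSES_OF = {"john": {"man"}, "mary": {"woman"}, "tom": {"boy"}}

# Each template maps a role to the classes whose instances may play it.
MARRIAGE = {"husband": {"man"}, "wife": {"woman"}}
MARRIAGE_FROM_16 = {"husband": {"man", "boy"}, "wife": {"woman", "girl"}}


def conforms(association: dict, template: dict) -> bool:
    """Check a derived Association (role -> players) against a template."""
    # Every Member must correspond to a Member of the template.
    if not set(association) <= set(template):
        return False
    # Every player must be an instance of a class allowed for its role.
    return all(
        CLASSES_OF.get(player, set()) & template[role]
        for role, players in association.items()
        for player in players
    )


john_and_mary = {"husband": {"john"}, "wife": {"mary"}}
tom_and_mary = {"husband": {"tom"}, "wife": {"mary"}}
```

Under these assumptions, `john_and_mary` conforms to both templates, while `tom_and_mary` conforms only to `MARRIAGE_FROM_16`, because 'boy' is not among the classes allowed for the husband role in `MARRIAGE`. Scope is left out of the sketch; in the text it is the Scope of each template that ties it to a particular legal system.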

Topic Occurrences are Associations between Topics and Resources

Association templates can be used in very powerful ways, to build structures of meaningful relationships between Topics and Resources. Several important Association templates will be defined in the remainder of this specification. However, we shall begin with one that is fundamental to the notion of Topic Maps - so fundamental, indeed, that it is expressed through special syntax within the XTM DTD. This is the TopicOccurrence template. It is structured as follows: The TopicOccurrence Association has two Members, a TopicMember and an OccurrenceMember. The TopicMember may comprise any Topic at all, but the OccurrenceMember must have a Topic whose Subject is a Resource. The meaning of this is that a TopicOccurrence associates a Topic with a Resource. The Resource is one that is relevant to the Topic in some way, and is known as an 'occurrence' of the Topic. An Association that uses the TopicOccurrence template as its template may itself have a class-instance association with another Topic whose Subject is an 'occurrence type'. Examples of occurrence type might be definition, mention, or description, meaning that the Resource in question defines, mentions or describes the Topic of which it is an occurrence.
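A TopicOccurrence can be pictured as a record linking a Topic to a Resource, optionally classified by an occurrence type. The Python sketch below is a simplified illustration, not the XTM syntax: the `Occurrence` class, the `occurrences_of` helper, the Topic identifier and the URLs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    topic_id: str          # the TopicMember: any Topic at all
    resource_address: str  # the OccurrenceMember's Subject, a Resource
    occurrence_type: Optional[str] = None  # e.g. "definition", "mention"


def occurrences_of(topic_id, occurrences, occurrence_type=None):
    """List the Resource addresses that are occurrences of a Topic,
    optionally restricted to a single occurrence type."""
    return [
        o.resource_address
        for o in occurrences
        if o.topic_id == topic_id
        and (occurrence_type is None or o.occurrence_type == occurrence_type)
    ]


# Hypothetical occurrences of a Topic about the play.
occurrences = [
    Occurrence("t-hamlet-play", "http://example.org/texts/hamlet.xml",
               "description"),
    Occurrence("t-hamlet-play", "http://example.org/reviews/1.html",
               "mention"),
]
mentions = occurrences_of("t-hamlet-play", occurrences, "mention")
```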

Class-Instance Associations between Topics may be asserted

....

Class-Subclass Associations between Topics may be asserted

....

XTM Interchange Syntax

...

The XTM DTD

...

Sample XTM Instance

Hamlet, Prince of Denmark

...