The world contains many things: human beings, animals, plants, inanimate objects, thoughts, words, relationships, concepts, computer systems, information, individuals, groups, messages, actions, emotions, successful and unsuccessful attempts at communication, to name but a few. Any of these things, from the most specific to the most general, from the most concrete to the most abstract, can be talked about or written about. Some, but not all, can be photographed or drawn. Some, but not all, exist within computer systems or can be addressed by those systems.
Words are the primary (but not the only) means by which we human beings communicate with one another about the things in the world. We also use gestures, pictures, sounds, works of art and more. But words have a peculiar power because, within a restricted community that shares a common language, they give us the best hope we have (though certainly not the guarantee) of communicating unambiguously.
Computer systems are among the tools we use to communicate about things in the world, to aid us in reasoning about the world, or to act directly on the world to bring about change. The Web is itself a computer system, consisting of many subsystems, operated and used by many human beings. We cannot use the Web to send an emotion, a thought, an action, a mountain, a town, a person, or a concept directly to another system or human being. But we can and do use it to send words, pictures, sounds, documents, database tables and other richly structured information objects that describe these things, and may even, in the case of emotions or thoughts, convey them, or in the case of actions, cause them to occur.
Sending words, pictures and structured information objects is very useful and powerful, and being able to do it on a worldwide scale even more so. But there is a sting in the tail. To communicate with one another, we need to convey not just words, but the meanings of those words; not just computer files, but the intent with which and for which those computer files were created. And the wider the community of people, or the greater the number and variety of computer systems, with which we are communicating, the harder it is to ensure that the person or system receiving the words, pictures or other information will correctly glean from them the meaning intended by the sender.
The 'Semantic Web' is a Web in which meanings can be conveyed and not just words, pictures or data. It does not exist today. Topic Maps can be used as a means towards building a Semantic Web, not because they remove the gap between words or information objects and their meanings, but because they recognise that gap for what it is, and work within its limitations. A mountaineer who climbs a peak without crampons may get there by luck or supreme skill, but to enable the world at large to scale the mountain, it is better to recognise its challenges and provide simple tools to address them, rather than ask people to climb based only on their supreme desire to reach the top!
To continue the mountaineering image, we can say that the mountain we are attempting to climb has three peaks, separated by deep chasms. We want to use information objects in a computer system to communicate something we have in mind to some other person. There are three mountain peaks here, looking at each other across apparently uncrossable chasms: our mind, the real-world things we are thinking of, and the information objects we use as a means of communication. These three domains are different in nature yet deeply connected. Our mind has a thought that reflects some aspect of the real world, and the information object describes those aspects of the real world that correspond to our thought. When a person receiving our communication interprets the information object, we hope it will evoke in their mind thoughts about the real world that somehow match our own. If it does, then the communication has been successful. Yet we can send neither the thought itself, nor the real-world objects about which we are thinking. The thought, the information object used to express it, and the real-world things that both of them represent, are utterly distinct. They inhabit different domains and the gap between them seems unbridgeable.
Seen from another point of view, however, there is only one mountain peak, and no chasms at all. The computer system and the information objects it sends and receives, our minds and all their thoughts, are themselves part of the real world. They too can be thought of, described in information objects and communicated about. The whole system is deeply interconnected; it can point at itself, describe itself and communicate about itself. Yet, in any particular act of communication, a clear distinction must exist between the information object and the information it carries. Otherwise, we fall into the chasms of infinite regress, circular definition and, in the computer system, unending processing loops that hang the computer and leave the user staring at a frozen screen or a big red error message.
With the thoughts and images evoked by the Introduction in our minds, we can now present the XML Topic Maps conceptual model. We do this through a series of information objects (words and diagrams), which aim to be clear and simple enough to convey the XTM conceptual model to a human reader, and to be precise and formal enough to form the basis for XTM implementations that will be able to run in real-world computer systems and use the XTM interchange syntax defined in the XTM DTD. To aid understanding, some of the diagrams and descriptive text are accompanied by extracts from the XTM DTD and/or by fragments of a sample XTM document instance.
The diagrams used in this section are 'class diagrams', using the conventions of the Unified Modeling Language (UML). This means that each rectangle represents a class of objects (a kind of thing that can exist), and the lines between them represent relationships that exist or can exist between instances of those classes (individual things of those kinds).
The first diagram is extremely simple. It shows a single rectangle labeled 'Resource', and states that the defining characteristic of a Resource is that it is addressable. This means that it is possible for a computer system to determine whether the things referred to by two Resource references are or are not the same. Examples of Resources are records in database files, electronic documents, images or sounds, strings of characters, XML elements and attributes. These are all things that can exist within a computer system. They are 'addressable' in the sense that the system can retrieve them and make deterministic comparisons between them to establish their identity or difference.
Most of the things of interest in the real world are not addressable in the sense described above. To determine whether two references to things such as human feelings, physical objects, mountains, people or countries refer to the same thing requires understanding of the real world as it exists outside the confines of the computer system. Any or all of these things may be a Subject of discourse, but they are not addressable and so do not count as Resources. On the other hand, we sometimes want to talk about a particular electronic document, character string or database record. These things are indeed addressable Resources and may also become a Subject of discourse. This diagram shows that the Subject class has two subclasses, Addressable Subject and Non-addressable Subject, and that Addressable Subject is also a subclass of Resource.
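The class relationships just described can be sketched in code. The following is only an illustration, not part of the XTM model itself: the class names mirror the diagram, while the `address` attribute (a URI-like string), the `description` attribute and the `same_as` method are assumptions introduced here to make 'addressable' concrete.

```python
class Resource:
    """Something the system can retrieve and compare deterministically."""

    def __init__(self, address: str) -> None:
        # Assumption: a URI-like string stands in for "an address".
        self.address = address

    def same_as(self, other: "Resource") -> bool:
        # Deterministic identity test, possible only because both sides
        # are addressable. Comparing address strings is a simplification.
        return self.address == other.address


class Subject:
    """Anything at all that can be a subject of discourse."""


class AddressableSubject(Subject, Resource):
    """A Resource (document, string, record) that is itself talked about."""


class NonAddressableSubject(Subject):
    """A real-world thing; the system alone cannot decide its identity."""

    def __init__(self, description: str) -> None:
        # A hint for human readers, not an identity criterion.
        self.description = description


doc_a = AddressableSubject("http://example.org/report.xml")
doc_b = AddressableSubject("http://example.org/report.xml")
mountain = NonAddressableSubject("Mont Blanc")

print(doc_a.same_as(doc_b))            # the system can decide this itself
print(isinstance(mountain, Resource))  # a mountain is not a Resource
```

Note that `NonAddressableSubject` deliberately offers no `same_as` method: deciding whether two such references denote the same mountain requires knowledge from outside the system.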
A Topic is a Resource that is used to stand in for a Subject within the computer system. A Topic can be manipulated and reasoned about, and can have statements made about it. It acts as a proxy within the system for the physical or abstract, addressable or non-addressable real-world thing that is its Subject. In this sense, it is said to 'reify' the Subject, meaning that it makes the Subject 'real' from the point of view of the system. This diagram shows that a Topic is a Resource that reifies exactly one Subject. The direction of the horizontal arrow indicates that the Topic provides an indication of what its Subject is, but that the converse is not the case. The '0..*' label indicates that any Subject may be reified by zero or more Topics, though the comment explains that the ideal case is for each Subject to be reified by no more than one Topic.
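The one-way link from Topic to Subject, and the '0..*' multiplicity in the other direction, can be illustrated with a minimal sketch. The attribute names (`address`, `label`, `subject`) and the helper `subjects_with_several_topics` are hypothetical and exist only for this example; the `Subject` objects here are stand-ins, since a real non-addressable Subject cannot live inside the system at all.

```python
from collections import defaultdict
from typing import Dict, List


class Resource:
    """Addressable thing; `address` is a hypothetical URI-like string."""

    def __init__(self, address: str) -> None:
        self.address = address


class Subject:
    """Stand-in for a real-world thing. It holds no reference to any Topic."""

    def __init__(self, label: str) -> None:
        self.label = label


class Topic(Resource):
    """A Resource that reifies exactly one Subject."""

    def __init__(self, address: str, subject: Subject) -> None:
        super().__init__(address)
        self.subject = subject  # one-way link: Topic -> Subject


def subjects_with_several_topics(topics: List[Topic]) -> List[Subject]:
    """Return Subjects reified by more than one Topic (the non-ideal case)."""
    by_subject: Dict[Subject, List[Topic]] = defaultdict(list)
    for topic in topics:
        by_subject[topic.subject].append(topic)
    return [s for s, ts in by_subject.items() if len(ts) > 1]


paris = Subject("Paris")
love = Subject("love")
topics = [
    Topic("#paris", paris),
    Topic("#paris-city", paris),  # a second Topic for the same Subject
    Topic("#love", love),
]

for subject in subjects_with_several_topics(topics):
    print(subject.label)  # prints "Paris"
```

Because the Subject does not know which Topics reify it, finding the non-ideal case means scanning the Topics themselves, which is what the helper does.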
Though the Subject of a Topic may not be addressable within the system, it is always possible to provide a human-interpretable description of it. Such descriptions may themselves be embodied in information objects that are addressable Resources. The term 'Subject Descriptor' is used to describe a Resource whose human-interpretable content conveys a clear and unambiguous (definitive) description of a particular physical or abstract, addressable or non-addressable real-world thing that is the Subject reified by a Topic. This diagram shows that a Subject Descriptor is a Resource that provides a definitive description of a Subject. The '0..*' label indicates that any Subject may have zero or more Subject Descriptors.
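A Subject Descriptor can be sketched in the same style. This is an illustration under stated assumptions, not a definitive implementation: the `address`, `content` and `describes` attributes, the example URLs and the `descriptors_of` helper are all invented for this example.

```python
from typing import List


class Resource:
    """Addressable thing; `address` is a hypothetical URI-like string."""

    def __init__(self, address: str) -> None:
        self.address = address


class Subject:
    """Stand-in for a real-world thing, addressable or not."""

    def __init__(self, label: str) -> None:
        self.label = label


class SubjectDescriptor(Resource):
    """A Resource whose human-readable content definitively describes a Subject."""

    def __init__(self, address: str, content: str, describes: Subject) -> None:
        super().__init__(address)
        self.content = content      # meant for human interpretation
        self.describes = describes  # the single Subject being described


def descriptors_of(subject: Subject,
                   descriptors: List[SubjectDescriptor]) -> List[SubjectDescriptor]:
    """Collect every descriptor of a Subject; the result may be empty (0..*)."""
    return [d for d in descriptors if d.describes is subject]


mont_blanc = Subject("Mont Blanc")
nostalgia = Subject("nostalgia")
catalogue = [
    SubjectDescriptor("http://example.org/peaks#mont-blanc",
                      "The highest mountain in the Alps.", mont_blanc),
    SubjectDescriptor("http://example.org/gazetteer/mont-blanc",
                      "Alpine peak on the French-Italian border.", mont_blanc),
]

print(len(descriptors_of(mont_blanc, catalogue)))  # two descriptors
print(len(descriptors_of(nostalgia, catalogue)))   # none yet
```

The mountain itself never enters the system; only its descriptors do, and being Resources they can be retrieved and compared like any other addressable thing.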