Subject: [xtm-wg] parallel development of syntax and concept models
One of the people who attended the AG meeting in Montreal wrote to me as follows:

> Hi Steve,
>
> I disagree with your views of syntax. I don't understand
> how the conceptual model can be dependent on the
> interchange syntax. To me that means that there are some
> syntactic constructs that can not be modeled in the
> conceptual model. Or, that we can not describe what we
> mean.

I was greatly moved by this obviously sincere and very well-intentioned letter, and so I'm copying my response to the group as a whole.

--------

I think your sensibilities and instincts are exactly right, but perhaps you misunderstood what happened at the meeting. Nobody said that the conceptual model was going to be dependent on the interchange syntax. We simply agreed, mainly for practical reasons, that the work on these two aspects of the topic maps information representation/interchange could proceed in parallel.

But I have a lot to say about the issue that you raise, because I think we have made a *radically correct* decision here, far more radically correct than the mere brevity of our window of opportunity could possibly explain.

First of all, let's remember that, as a matter of historical fact, what we are calling "the conceptual model" of Topic Maps came chronologically *after* the syntax, not prior to it. The syntax was developed mainly by Michel Biezunski during his lonely and heroic years of presenting the idea of topic maps to potential users and listening carefully to their feedback -- feedback that was often extremely difficult to decipher, and that required him to develop a deep understanding of the mindsets and world views of the potential users of topic maps. As a result, the interchange syntax of topic maps is attractive to an extraordinarily wide variety of users and potential users.
That attractiveness is really what has made the topic maps paradigm such an economically interesting phenomenon, and it's the real reason for the existence of the XTM Specification Authoring Group (AG). The original conceptual model of topic maps, which is very similar to the one that our modeling group (through Eliot Kimber) described at the Montreal meeting of the AG, came only in the latter years of the development of the ISO topic maps standard. During that latter period, with a working conceptual model in place, the interchange syntax was adjusted somewhat in order to make topic maps deterministically translatable into objects that would conform to the conceptual model (i.e., topic map semantic groves and/or things that resemble topic map semantic groves). The result was ISO 13250, a standard that is a winning combination of a workable (albeit implicit, in the text of the actual 13250 standard) underlying conceptual model, and a syntax that ordinary people can instantly relate to, and that they can learn in stages. The public's rapidly increasing appetite for topic maps technology is an indication that, when 13250 was built, something was done right. It is entirely justifiable, given the history of topic maps, to regard their development as a classic case of "initially top-down design". I think it's fair to summarize the consensus on parallelism of the Montreal meeting as follows: "The interchange syntax and the conceptual model of the XTM Specification need not correspond to one another structurally, but they must correspond semantically. This means that resources expressed using the interchange syntax must be unambiguously and deterministically translatable into application-internal implementations of the conceptual model. 
The process whereby such a translation is made should be sufficiently well defined by the XTM Specification that the same essential meaning of any resource that is expressed in conformance with the interchange syntax will be available to all conforming applications." The above statement is pretty strong, and I think it is correct and appropriate. However, it does not necessarily follow from the above statement that the conceptual model must be completely finished before work on the interchange syntax can begin. It is only necessary that both models be adjusted to each other and to all user requirements before they are both published together as the XTM Specification. Try not to think of the development of the interchange syntax as a comparatively trivial task of translating the conceptual model into some syntax-schema language (such as a DTD). Instead, think of the development of the syntax as being very like the development of a user interface, as if users were actually going to type in their topic map documents by hand (which they're not going to do, I know, but many creators of topic maps will have a need to understand and be intimate with the nuts and bolts of the topic map resources that they're creating). A user interface must be intuitive -- it must teach users about itself and about the functionality to which it provides access. During the development of a user interface, it is almost inevitable that many user requirements will be newly discovered. Should we regard these requirements as *a priori* less important than the already-known requirements that will drive the process of developing a conceptual model? I think not! The reason why we create message types is that we have certain kinds of information that we want to use certain message types to be able to convey. That is, each message type is designed to convey a certain "information set". 
For example, a purchase order is a type of business message that conveys a buyer's intent, willingness, and commitment to buy certain things, on certain terms, from a certain seller. Specifically:

    The "information set" of a message type for purchase orders
    includes information about the buyer, the seller, the goods
    and/or services to be purchased, and the terms on which the
    purchase will presumably be made.

The preceding sentence constitutes an extremely abstract expression of the nature of the information set of purchase orders, but this expression places few if any constraints on the implementation of an API for purchase orders, or on the syntax of interchangeable purchase orders. An unbounded number of APIs to such an information set, and an unbounded number of purchase order message syntaxes could all reasonably be regarded as conforming to this abstract expression.

This kind of abstract expression of an information set is so *extremely* abstract that it offers practically nothing of value to those who need standard APIs to purchase orders in order to build and maintain their transaction processing systems, or to those who need to create, send, maintain and receive purchase orders in a vendor-neutral and system-neutral fashion. Nevertheless, it is truly an "abstract model".

Since we *are*, after all, attempting to promote information interchange, we must be *less* abstract than the sentence I've shown above as indented text. We must express our abstractions in ways that *will* constrain interchange syntax, such as with DTDs and schemas, and that *will* constrain the design of APIs to information sets, such as UML models, "property sets" for ISO 10744-conforming groves, and RDF schemas.
For the sake of simplifying this discussion, and because they are currently being used for the XML Topic Maps Specification work, let's consider only two of these constraint modeling formalisms (I selected these because we in the XTM AG are already familiar with them): (1) UML models for the abstract expression of constraints on APIs to an information set (such as the information set of topic maps), and (2) DTDs for the abstract expression of constraints on the interchange syntax that should be used for interchanging (serializing, transmitting, and receiving) topic map information. I think the reflexes and instincts of many people in our industry -- not just you -- make them think that *any* UML model of an information set is, by definition, *necessarily* "more abstract" than a DTD designed to support the interchange of that same information set. (I've even heard it said that UML models are quite obviously more abstract than DTDs simply because they are "graphic". I'm still puzzling over that statement; I can't figure out how or why the notation used to express a model will always determine the abstractness of the model thus expressed.) In fact, however, the information set itself, when expressed in a truly abstract fashion that places no constraints on interchange and no constraints on implementation, is more abstract than both of them. Both DTDs and UML models of topic maps are specializations of a set of far more abstract notions. These more abstract notions collectively constitute the information set for which it is our task to provide both a conventional API and a conventional interchange syntax. Some readers may object, at this point, to the notion that a UML model constrains APIs when it is implemented. To me, it is obvious that UML models have the effect of constraining APIs to information sets, even though UML models ordinarily leave many decisions in the hands of implementers. 
At the very least, UML models identify object types and relationships between object types. It would be strange if implementations of UML models did not preserve, at least to some significant extent, these object types and their interrelationships. If not, why have a UML model at all? What purpose would it serve? Once we recognize that (a) an object-oriented API constraint model and (b) a syntax constraint model each describe an interface (if we define the word "interface" broadly enough) to some information set, it becomes possible, at last, to see that DTDs, for example, are not necessarily more or less abstract than, for example, UML models. Instead, we see that they are both dependent on a higher set of abstractions, and that the two formalisms must be used in ways that respond to utterly different requirements. The engineering and usability requirements of interchange syntaxes include non-redundancy, parsimony of data, maintainability, and implementation independence. The engineering and usability requirements of APIs include convenience of (random) access, and the explicit exposure of properties that are needed by all applications of a given information set. (Some of these properties, such as the total cost of all items being ordered in a purchase order, may not be explicit in the interchange syntax, because such information is expected to be derived by the receiving system in any case, and because its very redundancy would offer an opportunity for intra-message inconsistency to occur. As for topic maps, there are several vital properties, such as the namespaces within which topics have their names, that are not explicit in topic map documents. These properties should be exposed by topic map processing software that implements the corresponding semantic algorithms set forth in natural language in the text of the XTM Specification.) Here is an example of a normal structural difference between a syntax model and an API model for a particular information set. 
In a purchase order message, there may be a lengthy list of goods to be purchased. Some of the items may be accessories for some of the others, such as a printer and extra ink cartridges for that printer. In a purchase order message, the listings of the accessory items that are being ordered would normally refer to the items for which they are accessories. Such referencing capability is built into the syntax of XML, and for good reason: it makes it possible for the accessorized item (the printer, in this case) to be described only once, and to have that description be re-used, in effect, elsewhere in the message, by reference, rather than by explicit repetition. The policy of re-using the printer description, rather than repeating it wherever it would otherwise be referenced, makes it possible for anyone, at any point in the supply chain, to correct or update the description of the printer once, in the only part of the message where that description appears, without creating any inconsistencies in the message, and without having to worry about updating all the places in the message where the same information may or may not appear. This "maintainability of the message" is ordinarily a critical real-world requirement for interchange syntaxes. (There are also other critical requirements that pertain only to interchange syntaxes, but this referencing policy requirement is sufficient for making the point that interchangeable information has its own design requirements.) When the purchase order message is received and understood by the receiving system, however, it may become input to many processes and many different pieces of software. In these circumstances, it is extremely desirable to make the printer description directly available, as part of the API to the information received in the purchase order message, in the context of whatever made reference to it in the interchange message. 
It is not desirable for multiple pieces of software to have to know the significance of the syntax of an "accessory-for" reference in an interchangeable message, or how to dereference it. How, why, and when to resolve an "accessory-for" reference that appears in a purchase order message should be programmed and maintained in a single logic module whose sole purpose is to apply the "semantic algorithms" needed to expose the properties of purchase order messages. The object-oriented model of the information set of purchase orders should constrain this logic module to expose a single consistent API to the properties of all purchase orders in general. From the perspective of this API to all purchase orders, the accessorized item can be a property of all accessory items, and the list of to-be-purchased accessories may also be a property of the accessorized item.

Structurally speaking, the structure of this convenient API does not closely resemble the structure of the best, most maintainable syntax for purchase orders, and there is no good reason why it should. In fact, as we have now shown, there are very good reasons why it shouldn't.

This is why I say we are "radically correct" to develop the conceptual model and the interchange syntax of topic maps in parallel. We need the syntax-heads to consider all the requirements in their own terms. We need the object-oriented-model heads to consider all the requirements in *their* terms, too. In practice, an understanding of the actual information set (which is by definition prior to any possible expression of itself) that the two kinds of models are attempting to capture will emanate from dialogue between the two groups.
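
The purchase order example above can be made concrete with a small sketch in Python. Everything specific in it is an assumption made for illustration only: the element and attribute names (purchase-order, item, accessory-for, unit-price), the class names, and the function load_purchase_order are hypothetical and come from no real purchase order standard. The message describes the printer once and points to it by reference; a single logic module resolves that reference and exposes both directions of the relationship, plus a derived total that is deliberately absent from the message.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional

# Hypothetical interchange form: the printer is described exactly once,
# and the accessory refers to it by id instead of repeating the description.
PO_XML = """
<purchase-order>
  <item id="printer-1" description="Laser printer"
        unit-price="300.00" quantity="1"/>
  <item id="ink-1" description="Ink cartridge"
        unit-price="25.00" quantity="4" accessory-for="printer-1"/>
</purchase-order>
"""


@dataclass(eq=False)
class Item:
    item_id: str
    description: str
    unit_price: Decimal
    quantity: int
    # Properties exposed by the API but not stored this way in the syntax:
    accessory_for: Optional["Item"] = None
    accessories: List["Item"] = field(default_factory=list)


@dataclass(eq=False)
class PurchaseOrder:
    items: List[Item]

    @property
    def total_cost(self) -> Decimal:
        # Derived on receipt; keeping it out of the message avoids
        # any chance of intra-message inconsistency.
        return sum((i.unit_price * i.quantity for i in self.items),
                   Decimal("0"))


def load_purchase_order(xml_text: str) -> PurchaseOrder:
    """The single module that knows how to dereference 'accessory-for'."""
    root = ET.fromstring(xml_text)
    by_id = {}
    pending = []  # (item, id of the item it is an accessory for, or None)
    for el in root.findall("item"):
        item = Item(el.get("id"), el.get("description"),
                    Decimal(el.get("unit-price")), int(el.get("quantity")))
        by_id[item.item_id] = item
        pending.append((item, el.get("accessory-for")))
    # Second pass: resolve references once all items are known.
    for item, target_id in pending:
        if target_id is not None:
            target = by_id[target_id]
            item.accessory_for = target
            target.accessories.append(item)
    return PurchaseOrder(list(by_id.values()))


po = load_purchase_order(PO_XML)
printer, ink = po.items
print(ink.accessory_for.description)
print([a.description for a in printer.accessories])
print(po.total_cost)
```

Software that consumes the PurchaseOrder object never sees the reference syntax; it reads ink.accessory_for or printer.accessories directly. The shape of the objects differs from the shape of the message, yet the translation between them is deterministic, which is the kind of semantic (not structural) correspondence argued for above.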
The process of simultaneously developing a set of syntax constraints (such as an XML DTD) and an object-oriented model (such as an ISO 10744 property set, a certain class of UML model, or an RDF Schema) is like shining two lamps, from two perspectives, on the elusive information set that is the one true source of the ideas that will govern both models. When light comes from two directions, the illuminated object (in our case, the pure Platonic forms that constitute the topic maps paradigm and that inform the design of both kinds of models) is far less likely to hide parts of itself in shadow. Neither group's work should be considered paramount over the other's; neither group should be in charge of the other's design decisions; neither model should be considered "more abstract" and therefore prior in terms of chronology or authoritativeness. Each group must adjust its modeling work to the facts and requirements revealed by the other group's modeling work. When the work is complete, the two models, taken together, will be far more revealing of the information set than either of them could possibly be by itself; again, "two lamps are better than one". A work product consisting of the two kinds of models, with both kinds of models having been developed in parallel, will meet more needs and be generally superior to any product that would have resulted from putting the object-oriented heads in charge of the syntax heads (or vice versa). An information resource that is expressed in XML is utterly useless for anything but interchange; when the same information resource is expressed as objects in memory, it is useful for every purpose *except* interchange. Computer scientists are very fond of algorithms and methodologies for the automatic transformation of information resources from one representation to another. 
The fact that such tricks are available does not mean that it is always productive to use them, or that real-world systems, such as the World Wide Web, should be constrained in such a way as to always depend upon them. The fact that it is possible to generate object-oriented models from syntax models, and vice versa, does not mean that doing so is always a good idea. True, it may make a standard quicker and easier to develop and publish, but the cost in down-the-road inconvenience of the API, and/or the cost of the lack of maintainability of the syntax-conforming information resources, and/or the cost of the lack of predictability of the behavior of "conforming" applications, will probably be astronomically higher than any cost savings that can be realized by using such shortcuts. There is, as yet, no substitute for hard thinking about how best to express meaning interchangeably, on the one hand, and how to make those meanings useful, on the other. If you were not such a fine and focused engineer, and instead you were a philosopher with interests like those of [fellow XTM Founder] Andrius Kulikauskas, you might take a different (and, to my way of thinking, even more philosophically defensible) position: "No design work of any kind -- neither the designing of the conceptual model nor the designing of the interchange syntax -- can begin until all of the user requirements have been discovered and documented." Again, you'd be completely right, and, again, if this rule were enforced, we'd probably explode the group and we'd never have a standard. Even if it didn't explode the group, such a rule would be counterproductive. Experience tells me that we will never understand all the user requirements, and, more to the point, that we can never know whether we have understood all the user requirements. Many important user requirements will emerge from iterations of the processes of designing an interchange syntax *and* designing a conceptual model. 
(Copyright (c) 2000 Steven R. Newcomb. Permission to use this material without fee and with or without attribution to the author is granted to all for all purposes consistent with the development and promulgation of the topic maps paradigm in general, and the development and promulgation of the XTM Specification in particular. If this material is quoted or otherwise used with attribution to the author, such statements must fairly represent the author's views in a manner consistent with a reasonable understanding of the intent and meaning of this entire letter.)

-Steve

--
Steven R. Newcomb, Consultant
srn@techno.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA