topicmaps-comment message

Subject: [xtm-wg] parallel development of syntax and concept models
From: "Steven R. Newcomb" <srn@techno.com>
To: xtm-wg@egroups.com
Date: Sun, 27 Aug 2000 11:46:30 -0500

One of the people who attended the AG meeting in Montreal
wrote to me as follows:

> Hi Steve,
> 
> I disagree with your views of syntax. I don't understand
> how the conceptual model can be dependent on the
> interchange syntax. To me that means that there are some
> syntactic constructs that can not be modeled in the
> conceptual model.  Or, that we can not describe what we
> mean.

I was greatly moved by this obviously sincere and very
well-intentioned letter, and so I'm copying my response to
the group as a whole.

--------

I think your sensibilities and instincts are exactly right,
but perhaps you misunderstood what happened at the meeting.
Nobody said that the conceptual model was going to be
dependent on the interchange syntax.  We simply agreed,
mainly for practical reasons, that the work on these two
aspects of the topic maps information
representation/interchange could proceed in parallel.  But I
have a lot to say about the issue that you raise, because I
think we have made a *radically correct* decision here, far
more radically correct than the mere brevity of our window
of opportunity could possibly explain.

First of all, let's remember that, as a matter of historical
fact, what we are calling "the conceptual model" of Topic
Maps came chronologically *after* the syntax, not prior to
it.  The syntax was developed mainly by Michel Biezunski
during his lonely and heroic years of presenting the idea of
topic maps to potential users and listening carefully to
their feedback -- feedback that was often extremely
difficult to decipher, and that required him to develop a
deep understanding of the mindsets and world views of the
potential users of topic maps.  As a result, the interchange
syntax of topic maps is attractive to an extraordinarily
wide variety of users and potential users.  That
attractiveness is really what has made the topic maps
paradigm such an economically interesting phenomenon, and
it's the real reason for the existence of the XTM
Specification Authoring Group (AG).  

The original conceptual model of topic maps, which is very
similar to the one that our modeling group (through Eliot
Kimber) described at the Montreal meeting of the AG, came
only in the latter years of the development of the ISO topic
maps standard.  During that latter period, with a working
conceptual model in place, the interchange syntax was
adjusted somewhat in order to make topic maps
deterministically translatable into objects that would
conform to the conceptual model (i.e., topic map semantic
groves and/or things that resemble topic map semantic
groves).  The result was ISO 13250, a standard that is a
winning combination of a workable (albeit implicit, in the
text of the actual 13250 standard) underlying conceptual
model, and a syntax that ordinary people can instantly
relate to, and that they can learn in stages.  The public's
rapidly increasing appetite for topic maps technology is an
indication that, when 13250 was built, something was done
right.  It is entirely justifiable, given the history of
topic maps, to regard their development as a classic case of
"initially top-down design".

I think it's fair to summarize the consensus on parallelism
of the Montreal meeting as follows:

  "The interchange syntax and the conceptual model of the
  XTM Specification need not correspond to one another
  structurally, but they must correspond semantically.  This
  means that resources expressed using the interchange
  syntax must be unambiguously and deterministically
  translatable into application-internal implementations of
  the conceptual model.  The process whereby such a
  translation is made should be sufficiently well defined by
  the XTM Specification that the same essential meaning of
  any resource that is expressed in conformance with the
  interchange syntax will be available to all conforming
  applications."

The above statement is pretty strong, and I think it is
correct and appropriate.  However, it does not necessarily
follow from the above statement that the conceptual model
must be completely finished before work on the interchange
syntax can begin.  It is only necessary that both models be
adjusted to each other and to all user requirements before
they are both published together as the XTM Specification.

Try not to think of the development of the interchange
syntax as a comparatively trivial task of translating the
conceptual model into some syntax-schema language (such as a
DTD).  Instead, think of the development of the syntax as
being very like the development of a user interface, as if
users were actually going to type in their topic map
documents by hand (which they're not going to do, I know,
but many creators of topic maps will have a need to
understand and be intimate with the nuts and bolts of the
topic map resources that they're creating).  A user
interface must be intuitive -- it must teach users about
itself and about the functionality to which it provides
access.  During the development of a user interface, it is
almost inevitable that many user requirements will be newly
discovered.  Should we regard these requirements as *a
priori* less important than the already-known requirements
that will drive the process of developing a conceptual
model?  I think not!

We create message types because there are certain kinds of
information that we want those message types to convey.
That is, each message type is designed to convey a certain
"information set".  For
example, a purchase order is a type of business message that
conveys a buyer's intent, willingness, and commitment to buy
certain things, on certain terms, from a certain seller.
Specifically:

  The "information set" of a message type for purchase
  orders includes information about the buyer, the seller,
  the goods and/or services to be purchased, and the terms
  on which the purchase will presumably be made.

The preceding sentence constitutes an extremely abstract
expression of the nature of the information set of purchase
orders, but this expression places few if any constraints on
the implementation of an API for purchase orders, or on the
syntax of interchangeable purchase orders.  An unbounded
number of APIs to such an information set, and an unbounded
number of purchase order message syntaxes could all
reasonably be regarded as conforming to this abstract
expression.  This kind of abstract expression of an
information set is so *extremely* abstract that it offers
practically nothing of value to those who need standard APIs
to purchase orders in order to build and maintain their
transaction processing systems, or to those who need to
create, send, maintain and receive purchase orders in a
vendor-neutral and system-neutral fashion.  Nevertheless, it
is truly an "abstract model".  Since we *are*, after all,
attempting to promote information interchange, we must be
*less* abstract than the sentence I've shown above as
indented text.  We must express our abstractions in ways
that *will* constrain interchange syntax, such as with DTDs
and schemas, and that *will* constrain the design of APIs to
information sets, such as UML models, "property sets" for
ISO 10744-conforming groves, and RDF schemas.

For the sake of simplifying this discussion, let's consider
only two of these constraint modeling formalisms.  I
selected them because they are currently being used for the
XML Topic Maps Specification work, so we in the XTM AG are
already familiar with them:

(1) UML models for the abstract expression of constraints on
    APIs to an information set (such as the information set
    of topic maps), and

(2) DTDs for the abstract expression of constraints on the
    interchange syntax that should be used for interchanging
    (serializing, transmitting, and receiving) topic map
    information.

I think the reflexes and instincts of many people in our
industry -- not just you -- make them think that *any* UML
model of an information set is, by definition, *necessarily*
"more abstract" than a DTD designed to support the
interchange of that same information set.  (I've even heard
it said that UML models are quite obviously more abstract
than DTDs simply because they are "graphic".  I'm still
puzzling over that statement; I can't figure out how or why
the notation used to express a model will always determine
the abstractness of the model thus expressed.)  In fact,
however, the information set itself, when expressed in a
truly abstract fashion that places no constraints on
interchange and no constraints on implementation, is more
abstract than both of them.  Both DTDs and UML models of
topic maps are specializations of a set of far more abstract
notions.  These more abstract notions collectively
constitute the information set for which it is our task to
provide both a conventional API and a conventional
interchange syntax.

Some readers may object, at this point, to the notion that a
UML model constrains APIs when it is implemented.  To me, it
is obvious that UML models have the effect of constraining
APIs to information sets, even though UML models ordinarily
leave many decisions in the hands of implementers.  At the
very least, UML models identify object types and
relationships between object types.  It would be strange if
implementations of UML models did not preserve, at least to
some significant extent, these object types and their
interrelationships.  If not, why have a UML model at all?
What purpose would it serve?

Once we recognize that 

(a) an object-oriented API constraint model and

(b) a syntax constraint model 

each describe an interface (if we define the word
"interface" broadly enough) to some information set, it
becomes possible, at last, to see that DTDs, for example,
are not necessarily more or less abstract than, for example,
UML models.  Instead, we see that they are both dependent on
a higher set of abstractions, and that the two formalisms
must be used in ways that respond to utterly different
requirements.

The engineering and usability requirements of interchange
syntaxes include non-redundancy, parsimony of data,
maintainability, and implementation independence.

The engineering and usability requirements of APIs include
convenience of (random) access, and the explicit exposure of
properties that are needed by all applications of a given
information set.

(Some of these properties, such as the total cost of all
items being ordered in a purchase order, may not be explicit
in the interchange syntax, because such information is
expected to be derived by the receiving system in any case,
and because its very redundancy would offer an opportunity
for intra-message inconsistency to occur.  As for topic
maps, there are several vital properties, such as the
namespaces within which topics have their names, that are
not explicit in topic map documents.  These properties
should be exposed by topic map processing software that
implements the corresponding semantic algorithms set forth
in natural language in the text of the XTM Specification.)
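
The point about derived properties can be illustrated with a
minimal Python sketch.  Everything named here is an
assumption made for the illustration: the element and
attribute names (purchase-order, item, quantity, unit-price)
and the function total_cost are hypothetical and come from
no purchase order standard and from no part of the XTM
Specification.  The message carries no total; the receiving
side derives it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical interchange form: the message deliberately carries no total,
# so there is nothing that could disagree with the line items.
MESSAGE = """
<purchase-order>
  <item quantity="1" unit-price="199.00">printer</item>
  <item quantity="3" unit-price="24.50">ink cartridge</item>
</purchase-order>
"""


def total_cost(order):
    """Derive the order total from the line items on the receiving side."""
    return sum(
        (Decimal(item.get("unit-price")) * int(item.get("quantity"))
         for item in order.iter("item")),
        Decimal("0"),
    )


order = ET.fromstring(MESSAGE)
print(total_cost(order))
```

Because the total exists only as a computed property of the
API, a correction to any quantity or price cannot leave a
stale total behind in the message.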

Here is an example of a normal structural difference between
a syntax model and an API model for a particular information
set.  In a purchase order message, there may be a lengthy
list of goods to be purchased.  Some of the items may be
accessories for some of the others, such as a printer and
extra ink cartridges for that printer.  In a purchase order
message, the listings of the accessory items that are being
ordered would normally refer to the items for which they are
accessories.  Such referencing capability is built into the
syntax of XML, and for good reason: it makes it possible for
the accessorized item (the printer, in this case) to be
described only once, and to have that description be
re-used, in effect, elsewhere in the message, by reference,
rather than by explicit repetition.  The policy of re-using
the printer description, rather than repeating it wherever
it would otherwise be referenced, makes it possible for
anyone, at any point in the supply chain, to correct or
update the description of the printer once, in the only part
of the message where that description appears, without
creating any inconsistencies in the message, and without
having to worry about updating all the places in the message
where the same information may or may not appear.  This
"maintainability of the message" is ordinarily a critical
real-world requirement for interchange syntaxes.  (There are
also other critical requirements that pertain only to
interchange syntaxes, but this referencing policy
requirement is sufficient for making the point that
interchangeable information has its own design
requirements.)
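
The referencing policy described above can be sketched in a
few lines of Python.  The message layout, the "id" and
"accessory-for" attribute names, and the item descriptions
are all hypothetical.  No DTD is declared, so the references
are matched by the sketch itself rather than validated as
ID/IDREF attributes by a parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the printer is described once, and each cartridge
# points at it by identifier instead of repeating its description.
MESSAGE = """
<purchase-order>
  <item id="p1">
    <description>Inkjet printer, model X</description>
  </item>
  <item id="c1" accessory-for="p1">
    <description>Black ink cartridge</description>
  </item>
  <item id="c2" accessory-for="p1">
    <description>Color ink cartridge</description>
  </item>
</purchase-order>
"""

order = ET.fromstring(MESSAGE)

# Correcting the printer description touches exactly one element ...
printer = order.find("item[@id='p1']")
printer.find("description").text = "Inkjet printer, model X (rev. B)"

# ... and every reference still leads to the corrected text.
by_id = {item.get("id"): item for item in order.iter("item")}
for item in order.iter("item"):
    target = item.get("accessory-for")
    if target is not None:
        print(item.get("id"), "->", by_id[target].findtext("description"))
```

Had the printer description been repeated inside each
cartridge entry, the same correction would have required
finding and changing every copy.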

When the purchase order message is received and understood
by the receiving system, however, it may become input to
many processes and many different pieces of software.  In
these circumstances, it is extremely desirable to make the
printer description directly available, as part of the API
to the information received in the purchase order message,
in the context of whatever made reference to it in the
interchange message.  It is not desirable for multiple
pieces of software to have to know the significance of the
syntax of an "accessory-for" reference in an interchangeable
message, or how to dereference it.  How, why, and when to
resolve an "accessory-for" reference that appears in a
purchase order message should be programmed and maintained
in a single logic module whose sole purpose is to apply the
"semantic algorithms" needed to expose the properties of
purchase order messages.  The object-oriented model of the
information set of purchase orders should constrain this
logic module to expose a single consistent API to the
properties of all purchase orders in general.  From the
perspective of this API to all purchase orders, the
accessorized item can be a property of all accessory items,
and the list of to-be-purchased accessories may also be a
property of the accessorized item.  Structurally speaking,
this convenient API does not closely resemble the best,
most maintainable syntax for purchase orders, and there is
no good reason why it
should.  In fact, as we have now shown, there are very good
reasons why it shouldn't.  This is why I say we are
"radically correct" to develop the conceptual model and the
interchange syntax of topic maps in parallel.
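
A single logic module of the kind described above might look
like the following Python sketch.  It is an illustration
under stated assumptions, not a proposed design: the message
layout, the Item class, its property names, and the function
load_purchase_order are all hypothetical.  The module
resolves each "accessory-for" reference once, so that client
code sees objects and never the reference syntax.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical interchange form: accessories refer to the printer by id.
MESSAGE = """
<purchase-order>
  <item id="p1"><description>Inkjet printer</description></item>
  <item id="c1" accessory-for="p1"><description>Black ink</description></item>
  <item id="c2" accessory-for="p1"><description>Color ink</description></item>
</purchase-order>
"""


@dataclass(eq=False)  # identity comparison; the object graph is cyclic
class Item:
    item_id: str
    description: str
    accessorized_item: Optional["Item"] = None            # set on accessories
    accessories: List["Item"] = field(default_factory=list)  # set on owners


def load_purchase_order(xml_text: str) -> List[Item]:
    """The one place that knows how to dereference 'accessory-for'."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter("item"))
    items = {
        e.get("id"): Item(e.get("id"), e.findtext("description", ""))
        for e in elements
    }
    for e in elements:
        target = e.get("accessory-for")
        if target is not None:
            accessory, owner = items[e.get("id")], items[target]
            accessory.accessorized_item = owner   # accessory -> printer
            owner.accessories.append(accessory)   # printer -> accessories
    return list(items.values())


items = load_purchase_order(MESSAGE)
printer, cartridge = items[0], items[1]
print(cartridge.accessorized_item.description)
print([a.item_id for a in printer.accessories])
```

Note that the object model exposes the relationship in both
directions, while the message states it only once, on the
accessory.  The two structures differ, yet the translation
from one to the other is deterministic.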

We need the syntax-heads to consider all the requirements in
their own terms.  We need the object-oriented-model heads to
consider all the requirements in *their* terms, too.  In
practice, an understanding of the actual information set
(which is by definition prior to any possible expression of
itself) that the two kinds of models are attempting to
capture will emanate from dialogue between the two groups.
The process of simultaneously developing a set of syntax
constraints (such as an XML DTD) and an object-oriented
model (such as an ISO 10744 property set, a certain class of
UML model, or an RDF Schema) is like shining two lamps, from
two perspectives, on the elusive information set that is the
one true source of the ideas that will govern both models.
When light comes from two directions, the illuminated object
(in our case, the pure Platonic forms that constitute the
topic maps paradigm and that inform the design of both kinds
of models) is far less likely to hide parts of itself in
shadow.  Neither group's work should be considered paramount
over the other's; neither group should be in charge of the
other's design decisions; neither model should be considered
"more abstract" and therefore prior in terms of chronology
or authoritativeness.  Each group must adjust its modeling
work to the facts and requirements revealed by the other
group's modeling work.  When the work is complete, the two
models, taken together, will be far more revealing of the
information set than either of them could possibly be by
itself; again, "two lamps are better than one".  A work
product consisting of the two kinds of models, with both
kinds of models having been developed in parallel, will meet
more needs and be generally superior to any product that
would have resulted from putting the object-oriented-model
heads in charge of the syntax-heads (or vice versa).

An information resource that is expressed in XML is utterly
useless for anything but interchange; when the same
information resource is expressed as objects in memory, it
is useful for every purpose *except* interchange.  Computer
scientists are very fond of algorithms and methodologies for
the automatic transformation of information resources from
one representation to another.  The fact that such tricks
are available does not mean that it is always productive to
use them, or that real-world systems, such as the World Wide
Web, should be constrained in such a way as to always depend
upon them.  The fact that it is possible to generate
object-oriented models from syntax models, and vice versa,
does not mean that doing so is always a good idea.  True, it
may make a standard quicker and easier to develop and
publish, but the cost in down-the-road inconvenience of the
API, and/or the cost of the lack of maintainability of the
syntax-conforming information resources, and/or the cost of
the lack of predictability of the behavior of "conforming"
applications, will probably be astronomically higher than
any cost savings that can be realized by using such
shortcuts.  There is, as yet, no substitute for hard
thinking about how best to express meaning interchangeably,
on the one hand, and how to make those meanings useful, on
the other.

If you were not such a fine and focused engineer, and
instead you were a philosopher with interests like those of
[fellow XTM Founder] Andrius Kulikauskas, you might take a
different (and, to my way of thinking, even more
philosophically defensible) position:

  "No design work of any kind -- neither the designing of
  the conceptual model nor the designing of the interchange
  syntax -- can begin until all of the user requirements
  have been discovered and documented."

Again, you'd be completely right, and, again, if this rule
were enforced, we'd probably explode the group and we'd
never have a standard.  Even if it didn't explode the group,
such a rule would be counterproductive.  Experience tells me
that we will never understand all the user requirements,
and, more to the point, that we can never know whether we
have understood all the user requirements.  Many important
user requirements will emerge from iterations of the
processes of designing an interchange syntax *and* designing
a conceptual model.

  (Copyright (c) 2000 Steven R. Newcomb.  Permission to use
   this material without fee and with or without attribution
   to the author is granted to all for all purposes
   consistent with the development and promulgation of the
   topic maps paradigm in general, and the development and
   promulgation of the XTM Specification in particular.  If
   this material is quoted or otherwise used with
   attribution to the author, such statements must fairly
   represent the author's views in a manner consistent with
   a reasonable understanding of the intent and meaning of
   this entire letter.)

-Steve

--
Steven R. Newcomb, Consultant
srn@techno.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA

-------------------------- eGroups Sponsor -------------------------~-~>
Need EDA tools on a short term or peak load basis?
Take a free 7 day trial!
http://click.egroups.com/1/8464/4/_/337252/_/967394672/
---------------------------------------------------------------------_->

To Post a message, send it to:   xtm-wg@eGroups.com

To Unsubscribe, send a blank message to: xtm-wg-unsubscribe@eGroups.com
Follow-Ups:
- RE: [xtm-wg] parallel development of syntax and concept models
  - From: "Michel Biezunski" <mb@infoloom.com>
- [xtm-wg] Re: parallel development of syntax and concept models
  - From: "Luis J. Martinez" <luisjm@luisjm.com>