topicmaps-comment message

Subject: Re: [xtm-wg] An XTM test suite
From: "Steven R. Newcomb" <srn@coolheads.com>
To: xtm-wg@yahoogroups.com
Date: Sat, 17 Feb 2001 22:12:25 -0600
[Lars Marius Garshol:]

> The idea is _not_ a simplified XTM syntax designed to
> be simpler to parse and implement. (It will probably
> be simpler than XTM, but only because it must.)

I don't understand how it can be simpler than the
existing XTM syntax.  It looks to me as though it must
have more element types (such as ones that make topic
namespaces redundantly explicit), and that the element
types that do correspond (in some sense) to XTM element
types will necessarily have different semantics, as
well.

For example, the Conceptual Model clearly establishes
that, under the covers, an occurrence is really a
topic-occurrence association.  What does this mean for
the "canonical output" form?  I believe that we must
output a topic-occurrence association (note that I did
*not* say <association>, I said "association").  There
are several such important distinctions.
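
To make the distinction concrete, here is a minimal sketch of an occurrence re-expressed as a topic-occurrence association in an application-internal form. The role labels, the `Association` class, and the function name are assumptions made for illustration only; the Conceptual Model may name and structure these things differently.

```python
from dataclasses import dataclass

# Hypothetical role labels; not taken from any XTM document.
ROLE_TOPIC = "topic"
ROLE_OCCURRENCE = "occurrence"


@dataclass(frozen=True)
class Association:
    assoc_type: str
    members: tuple  # (role, player) pairs, kept sorted so equal content compares equal


def occurrence_as_association(topic_id, resource_uri):
    """Express "topic X has an occurrence at resource Y" as an association."""
    members = tuple(sorted([
        (ROLE_TOPIC, topic_id),
        (ROLE_OCCURRENCE, resource_uri),
    ]))
    return Association(assoc_type="topic-occurrence", members=members)


assoc = occurrence_as_association("t1", "http://example.org/doc")
```

The point of the sketch is only that the output-side construct is an association in the abstract sense, whatever element type eventually carries it.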

> Certainly, it is _not_ intended as a competitor to
> the XTM syntax, only as a tool for making
> applications that support XTM 1.0 more reliable.

Understood.

> This is close to it, yes.  The idea is that in this
> syntax, any two topic maps that are logically
> equivalent will have the exact same serialized
> representation.

It's a good idea, if we can make it work.
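
The property being asked for can be pictured with a toy sketch. It assumes, purely for illustration, that a processed topic map is reduced to a collection of string tuples; `canonical_dump` is a hypothetical helper, not part of any XTM tool.

```python
import json


def canonical_dump(assertions):
    """Serialize so the output depends only on content, not on insertion order."""
    # Drop duplicates, then sort the tuples lexically.
    unique = sorted(set(assertions))
    return json.dumps(unique, ensure_ascii=False, separators=(",", ":"))


map_a = [("topic", "t1", "Puccini"), ("topic", "t2", "Tosca")]
map_b = [("topic", "t2", "Tosca"), ("topic", "t1", "Puccini"),
         ("topic", "t2", "Tosca")]  # same content, different order, one duplicate

same = canonical_dump(map_a) == canonical_dump(map_b)
```

Whether real topic map information can be reduced to something this simple is, of course, exactly the open question.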

> A canonical XTM document must
> 
>  - be UTF-8-encoded

Why this particular encoding?  What does character
encoding have to do with it, as long as the mappings
between character encodings are unambiguous and
explicit?

>  - have all elements (topic, association, baseName,
>    topicRef etc) in a specific order, probably based on
>    the lexical order of IDs and names

I don't see how this can work, unless we want to
straitjacket the order in which <topicMap> elements and
their contents are scanned and processed, and force all
applications to keep a record of that order, even
though that order has no significance.  This is a very
unappealing prospect: to require applications to keep
track of nonsignificant information, incurring
significant overhead just so their conformance to the
Spec can be verified.  If I were a developer, I'd
simply ignore a standard that required me to write
software that does things that force my customers to
spend money in ways that don't benefit them.  There
*must* be a better answer than this.

The unique identifiers (IDs) of elements found in the
content of <topicMap> elements cannot serve as the
basis for imposing a canonical order, either.  

* First of all, the id attributes of many (perhaps
  most?) of the elements that demand the existence of
  topics in the application-internal representation are
  #IMPLIED, so we won't have IDs for all of them.  What
  do we do with the ones that don't have IDs?

* Secondly, when we're merging multiple XTM documents,
  the IDs of the elements aren't necessarily unique.
  What do we do when two topics have the same ID?

>  - have all attributes in a specific order (and
>    possibly conform to the canonical XML specification)

OK.  (Why only "possibly"?  Making everything totally
deterministic is the whole point of this exercise.)
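
As an illustration of what deterministic attribute order buys, here is a small sketch using the C14N 2.0 support in Python's standard library (`xml.etree.ElementTree.canonicalize`, available from Python 3.8). The two sample fragments are invented for the example, and nothing here implies that C14N is the form the group should adopt.

```python
from xml.etree.ElementTree import canonicalize

XLINK = "http://www.w3.org/1999/xlink"

# Same element, but attribute order, quoting and spacing inside the tag differ.
doc_a = f'<topicRef xmlns:xlink="{XLINK}" xlink:href="#t1" id="r1"/>'
doc_b = f"<topicRef  id='r1'   xmlns:xlink='{XLINK}' xlink:href='#t1' />"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Both inputs collapse to one byte-for-byte identical string.
identical = canon_a == canon_b
```

Canonical XML settles attribute order and quoting, but it says nothing about the order of elements, which is the harder problem discussed above.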

>  - use insignificant whitespace in a pre-determined way

OK.

>  - be consistent (as per annex F)

OK, as far as Annex F (I think misleadingly) goes.

>  - have all externally referenced topic map documents
>    merged in

Right.

>  - have only normalized URIs

What constitutes "normalization" of URIs?  In the topic
map paradigm, it's vitally important that two URIs that
point to the same resource be recognizable as
equivalent.  However, some applications will have more
intelligence about this than others; some will detect
sameness that others will miss, because, for example,
some will understand some kinds of fragment identifiers
better than others will.  We must not create a
conformance requirement that prevents application
builders from competing on the basis of the amount of
intelligence that is brought to bear on the question of
whether two URIs actually refer, ultimately, to one and
the same resource.  We want them to compete on this;
the ideal case, in which all URIs that ultimately refer
to one and the same resource are known to be doing so,
is probably never going to be fully achieved.  One way
to handle this is to support a user's ability to "dumb
down" the URI-comparison processing to some specified
level, just for purposes of outputting a canonical form
simply for establishing conformance to the Spec in all
other Spec-required respects.
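
The "dumb down" idea could look roughly like the following sketch. The level numbers and what each level does are assumptions chosen for illustration; they are not defined by XTM or by any URI specification, and the sketch ignores userinfo and IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_uri(uri, level):
    """Normalize a URI with a selectable (hypothetical) amount of intelligence.

    level 0: no normalization, plain string comparison
    level 1: lowercase scheme and host, drop the scheme's default port
    level 2: level 1, plus treat an empty path as "/"
    """
    if level <= 0:
        return uri
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"
    path = parts.path
    if level >= 2 and netloc and path == "":
        path = "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_resource(a, b, level):
    """True if the two URIs are equal after normalization at the given level."""
    return normalize_uri(a, level) == normalize_uri(b, level)
```

A conformance test would pin the level to some agreed value, while an application remains free to run at any higher level in normal use.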

>  - represent all topic map constructs in a single way
>    (so, for example, <instanceOf> and <scope> will only
>    ever contain <topicRef>, since <subjectIndicatorRef>
>    and <resourceRef> are implicit <topicRef>s)

This remark leads me to believe that you are thinking
in terms of using some version of the XTM
syntax as the canonical output syntax, as if XTM syntax
were somehow the same thing as this canonical output
idea.  This is a bad idea, for a variety of reasons,
and especially the reasons I've already mentioned in
previous notes.  Let me add more reasons: 

* It would be very bad if there were any confusion
  whatsoever about whether a particular XML element or
  document is expressed in XTM syntax or in our
  canonical output syntax.  The best way to avoid such
  confusion is to avoid having element type names in
  common between the two syntaxes.

* Having element type names in common will greatly
  diminish our (the XTM Authoring Group's) ability to
  communicate clearly and unambiguously among
  ourselves.  When we say "<topic>", we really must be
  disciplined in meaning only what that string
  (<topic>) means at input time, because the
  corresponding construct that appears in canonical
  output is not exactly the same kind of thing (for one
  example of why this is true, see the discussion of
  topic-occurrence associations, above). If we don't
  establish these distinctions in our discussions, we
  will misunderstand each other, and our productivity
  as a group will be diminished.

* Having element type names in common will muddle our
  thinking as individuals.  We must not allow ourselves
  to make unconscious assumptions about the nature of
  processed topic map information.  The structure of
  the canonical output must reflect precisely the
  abstract structure of the application-internal form
  of topic map information, as it will be defined by
  the Authoring Group.  The syntactic structure of the
  input documents is irrelevant, and pretending that it
  is somehow relevant will only blind and confuse us.

> | Syntactic equivalences between XTM <topicMap>
> | elements, as these are discussed in XTM today, are
> | insufficient to define what topic map information
> | actually is.

> This I don't follow. You seem to imply here that
> something more than what I propose above is
> needed. My problem is that I have a release schedule
> to meet and must act very quickly indeed. So if
> something radically more complex is needed I would
> prefer to do this first, and then that as a second
> stage.

OK.  In order to walk in a particular direction, we
must move by steps.  I would only ask that each of us
tries to be objective about technical decisions.  That
means trying not to make technical decisions on the
basis of our own individual business objectives, but
rather on the basis of how best to develop the industry
as a whole.  The only thing that competitors can be
expected to agree about is how to make the industry
grow (and even that much is a minor miracle).  I hope
there won't be too many conflicts among us, and that
the resolution of the conflicts can be navigated in a
way that doesn't bruise anyone economically.  Taking
well-considered steps *together* is a good way to do
that.

BTW, I'm voting "Yes" on XTM 1.0, although I have grave
misgivings about Annex F, which I find misleading --
not so much by what it says, but by what it doesn't
say.

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA
