topicmaps-comment message

Subject: [xtm-wg] Comments on the syntax proposal

From: Lars Marius Garshol <larsga@ontopia.net>
To: xtm-wg@egroups.com
Date: Mon, 20 Nov 2000 16:11:25 +0100


Geir Ove Grønmo, Hans Christian Alsos and I have gone through the
files from the Dallas meeting and produced the attached comments on them.

In general, our opinion is that the syntax looks clear and good, if somewhat
voluminous, but that a large number of issues need to be clearly specified.
These are listed in our comments.

Personally, I would like to provide more feedback and give a fuller impression
of my reaction to the spec, but things are impossibly hectic here at the
moment, and so there is no time.

--Lars M. (off to a national conference to market XTM :)




Title: Ontopia's comments on XTM 0.9

By:	Lars Marius Garshol
Affiliation:	Ontopia
Date:	$Date: 2000/11/20 15:08:51 $
Version:	$Revision: 1.2 $

General

Relationship with ISO 13250

Syntax

mergeMap

topicRef

URI references

subjectRef

referent=isSubject

baseName

variant

variantName

resourceData

occurrence

member

scope/addToScopes

Data model

TopicMap

Abstract

These comments are based on the files from the Dallas meeting, specifically the XTM DTD version 0.9 revision 1.4, the conceptual model document and the reification document.

These comments are the results of internal discussions at Ontopia, and so are the combined opinions of Hans Christian Alsos, Geir Ove Grønmo and Lars Marius Garshol.

General

It seems natural that the specification be written in such a way that it assumes the presence of three distinct software entities: the application (which uses the topic map grove to perform some function), the topic map engine (which holds the parsed topic map grove) and the topic map processor (which reads the serialized topic map document). This is analogous to the application/processor distinction in the XML 1.0 recommendation.

Since the XTM 1.0 syntax specification defines an interchange syntax it seems natural to us that the following principles should guide its design:

The syntax should be easy to comprehend, as for many people it will be the vehicle through which they understand topic maps. In many cases it will also be how they are first exposed to topic maps.
The mapping from the syntax to the underlying data model should be as straightforward as possible, in order to avoid the confusion that is otherwise likely to result for those understanding topic maps through their syntax.
Convenience in the syntactical representation should be considered to be of secondary importance compared to the above goal. To put it another way: when we ask below whether the results of parsing should somehow be normalized in order to yield the final topic map graph, the answer should be no (except in the cases where we are dealing with strings that are to be matched for equivalence).

Relationship with ISO 13250

XTM 1.0 changes ISO 13250 in non-trivial ways. This means several things:

XTM has changed an ISO standard for reasons that may not be immediately obvious to external observers; this may be bad marketing.
Several companies have expended substantial effort on implementing the ISO 13250 model, and these must now spend thousands of dollars on reworking their implementations. Some form of guarantee that these are the last changes, together with a justification for the changes, would be very welcome.
If it is to be possible to build standardized template/schema systems, query languages and textual notations for topic maps, the model must remain stable.

It is our opinion that XTM should seriously consider these issues, even to the point of making guarantees about the future stability of the standard and undoing some of the model changes from ISO 13250 (especially naming).

Syntax

The specification should make it clear that conforming XTM 1.0 implementations must use namespaces and that recognition of names defined by XLink, XBase and XTM must be based on a namespace view of the document rather than an XML 1.0 view. In other words, the namespace prefixes used in the XTM 1.0 DTD are not the only possible namespace prefixes for these namespaces.
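
As a minimal sketch of what namespace-based recognition means in practice, the following Python fragment finds topicRef elements by namespace name rather than by prefix. The XTM namespace URI, the element nesting and the topic IDs are assumptions made for illustration; the draft may define them differently.

```python
import xml.etree.ElementTree as ET

# Assumed XTM namespace URI, for illustration only.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Same content twice: once with a default namespace and the "xlink" prefix,
# once with the arbitrary prefixes "tm" and "xl".
DOC_A = f'''<topicMap xmlns="{XTM_NS}" xmlns:xlink="{XLINK_NS}">
  <topic id="t1"><instanceOf><topicRef xlink:href="#t2"/></instanceOf></topic>
</topicMap>'''

DOC_B = f'''<tm:topicMap xmlns:tm="{XTM_NS}" xmlns:xl="{XLINK_NS}">
  <tm:topic id="t1"><tm:instanceOf><tm:topicRef xl:href="#t2"/></tm:instanceOf></tm:topic>
</tm:topicMap>'''


def topic_ref_targets(xml_text):
    """Return the xlink:href values of all topicRef elements, matched by namespace."""
    root = ET.fromstring(xml_text)
    href = f"{{{XLINK_NS}}}href"
    return [el.get(href) for el in root.iter(f"{{{XTM_NS}}}topicRef")]
```

Both documents yield the same list of targets, which is the behaviour a prefix-based (XML 1.0 view) processor would not give for the second document.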

Error handling must be specified by XTM 1.0. What implementations should do when faced with well-formedness errors, validity errors and violations of the XTM 1.0 syntax specification must be clearly defined.

mergeMap

Should implementations be required to keep track of which topic map documents went into the topic map graph resulting from parsing the starting topic map document and all merged topic map documents? Should they be required to keep enough information to be able to reverse the merging and separate out the original topic map documents?

The specification should make it clear that recursion loops in the mergeMap references are not allowed.
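
One way a processor might detect such loops is a depth-first walk over the mergeMap references. This is only a sketch: the dictionary `merge_refs` (document URI to the URIs it merges in) is a hypothetical stand-in for fetching and parsing each document, and the file names are invented.

```python
def find_merge_cycle(start, merge_refs):
    """Return the first mergeMap loop reachable from start as a list of URIs, or None."""
    path = []        # documents on the current chain of mergeMap references
    on_path = set()  # same documents, for fast membership tests
    done = set()     # documents fully explored without finding a loop

    def visit(uri):
        if uri in on_path:
            # The chain has come back to a document it already contains.
            return path[path.index(uri):] + [uri]
        if uri in done:
            return None
        path.append(uri)
        on_path.add(uri)
        for target in merge_refs.get(uri, []):
            cycle = visit(target)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(uri)
        done.add(uri)
        return None

    return visit(start)
```

A document merged in from two different places is not a loop; only a chain that returns to a document still being processed is reported.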

Should addToScopes be called scope instead? To us this would seem natural, since as in many other cases this element holds a collection of topics defining a scope that is inherited by its contents. That the contents in this case are external seems immaterial to us.

mergeMap can hold multiple addToScopes elements, each of which can hold multiple topicRefs. To us this is confusing, since all topic map constructs only have a single scope. So how can multiple addToScopes elements be added to constructs that can only hold a single scope? To us it seems far better to only allow a single addToScopes child element.

What are the exact rules for computing the scopes of the topic map nodes added to the topic map graph from merged-in topic map documents?

Should fragment identifiers be supported for mergeMap elements? If so, how should they be interpreted?

Does XTM 1.0 disallow the use of XInclude? If XInclude is used, how should XTM 1.0 implementations react? And what about entities?

topicRef

What happens when topicRef refers to something that is not a topic element? When it is a topic element in a different topic map document? Does it make any difference whether the topic map that contains the referenced topic has been merged in with mergeMap or not?

URI references

What fragment identifiers must be supported? ID references? Is support for XPointer required for conformance? How are XPointer spans handled? Are there any differences in what is required for topicRef, mergeMap, subjectRef and resourceRef?

Does XTM require support for any specific URI schemes? Is this left to the implementation or is there a particular subset that must be supported?

Should the DTD define a uri parameter entity in order to make the data type of xlink:href attributes clear?

What should implementations do when the content of the xlink:href attribute is not a syntactically valid URI?

Should XTM implementations perform any kind of normalization or canonicalization of URIs? For example, should the % escape syntax be parsed? (This subject recurs in the discussion of the subjectRef element below.)

Is it an error for URIs in an XTM document to point to non-existent resources? If so, how should implementations react? When should they detect the non-existence of the resources? How should temporary failures be handled?

subjectRef

We assume that subjectRef elements replace the identity attribute of ISO 13250. This raises the question of the significance of the referring URI in itself and implies that it be considered a symbolic identifier for the topic. This further implies that the URIs used in subjectRefs are used as a basis for merging topics. This raises the question of when two topics are considered to reify the same subject and thus cause XTM 1.0 implementations to merge them. Some alternatives:

when the subjectRef URIs are the same string?
when they (upon inspection of the resource) turn out to refer to identical resources?
when they are lexically equivalent under the rules for lexical equivalence for URIs? For example, would http://www.ontopia.net/ and http://www.ontopia.net:80/ be considered equivalent?
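
The difference between the first and third alternatives can be illustrated with a small Python sketch. The normalization rules chosen here (lower-casing scheme and host, dropping default ports, treating an empty path as "/") are one possible reading of lexical equivalence, not something the draft specifies; the `DEFAULT_PORTS` table and the function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative table of default ports; a real implementation would need more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(uri):
    """Return a lexically normalized form of uri (user info is ignored in this sketch)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lower-cases the host name
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path after an authority is equivalent to "/".
    path = parts.path or ("/" if netloc else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


a = "http://www.ontopia.net/"
b = "http://www.ontopia.net:80/"
same_string = a == b                                   # first alternative
lexically_equivalent = normalize(a) == normalize(b)    # third alternative
```

Under plain string comparison the two topics would stay separate; under this normalization they would be merged.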

Should implementations be required to or encouraged to make the resource referred to available to applications?

referent=isSubject

The conceptual model implies that there can only be one element for each topic with this value for referent. In our opinion this makes perfect sense. If this is the case the syntax specification should make this clear and it is an argument for using different element types for these two subjectRef variants, since that allows the DTD to constrain them in the intended way.

baseName

Why the name change from topname to baseName? And why have both baseName and baseNameString? This seems needlessly ugly. Why not simply rename baseName to name and baseNameString to baseName? Especially since baseNames can contain variants, making them seem rather different from the ISO 13250 basename.

variant

The parameter element is confusing. Why not call it scope, which is what it seems to be? Also, what are the rules for combining the scope defined on the parent baseName with this scope? And how are the scopes of nested variants computed?

What is the purpose of allowing nested variant elements? In our opinion this complicates the model to no gain whatsoever. Ontopia's implementation experiences indicate that the naming model of topic maps is more than complex enough already, and in ways that have performance and scalability impact on implementations. Finding the name of topics is an operation performed very often, and for this reason it is important not to make it too complex.

Should variant have instanceOf? This would seem natural, since variants are so generic that they seem like only slightly distinguished occurrences.

XTM 1.0 must specify how to map ISO 13250 name structures into the new model.

variantName

Why does the content model have +? This again seems like a needless complication that is likely to have performance and scalability impact on topic map implementations. We feel very strongly that this should be removed.

The significance of the various data types of resources referred to as variantNames needs to be spelled out. Does XTM require any specific data types to be supported? How should implementations determine the data types of the resources?

How should implementations react if a resourceRef refers to something in a topic map document?

resourceData

We thoroughly approve of the general idea, since resourceData elements make the awkward use of data URLs unnecessary. However, the current version seems to lose something data URLs had: the ability to specify the notation and data type of the resource data. Should resourceData elements support base64-encoded content? Other content transformations? Should they provide information about the notation/data type of their contents? Our opinion is that they should.
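
To make concrete what data URLs carry that plain element content does not, here is a small Python sketch that builds and reads a data URL following the general `data:[<mediatype>][;base64],<data>` form. The helper names are hypothetical, and the parser covers only this simple form.

```python
import base64
from urllib.parse import unquote_to_bytes


def make_data_url(payload: bytes, media_type: str) -> str:
    """Wrap payload in a base64 data URL that states its media type."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def parse_data_url(url: str):
    """Return (media_type, payload_bytes) for a data URL."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, data = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separator")
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")]
        payload = base64.b64decode(data)
    else:
        media_type = header
        payload = unquote_to_bytes(data)
    # When no media type is given, data URLs default to US-ASCII plain text.
    return media_type or "text/plain;charset=US-ASCII", payload
```

A resourceData element holding only character content conveys the payload but neither the media type nor the encoding, which is the information the questions above ask about.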

occurrence

Is it really correct to have an instanceOf child of this element? Should it not rather be role, since it describes the role played by the resource relative to the topic more than it could be said to refer to the class of which the resource is an instance?

We acknowledge that the element could be read as a reference to the class of the occurrence itself, where the occurrence is seen as an association between the topic and the resource. However, we feel that this reading is subtle and difficult to understand, and thus that role might be a better name for this.

member

In revision 1.4 member has a scope child, which seems incorrect to us. Should this rather be on the association element?

Why the name member? In our opinion association role is a very good name for this, as the element represents the role played in the association by the topic. The participating topic could be said to be a member of the association, but the relationship represented by the containing element seems to us more like a situation/role-player relationship than a set/member relationship.

Why allow multiple topicRef children of the member? Again this seems like an unnecessary complication, where the model would be just as well served by having several members each representing a single topic. Our experience with implementing various kinds of systems using topic maps is that the structure of associations has a significant impact on the performance of such systems, and particularly query engines. To simplify the structure of associations in the manner suggested here would make the job of query engines significantly easier.

If multiple topicRef children are allowed in member elements, are multiple member elements that are instances of the same class allowed? If they are allowed, should the resulting topic map graph hold a structure that has somehow been normalized?
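
The normalization question can be made concrete with a short Python sketch. It assumes, purely for illustration, that each member is represented as a pair of a role reference and a list of topic references; the identifiers are invented.

```python
def normalize_members(members):
    """Expand (role, [topic, ...]) pairs into a sorted, duplicate-free list of (role, topic) pairs."""
    return sorted({(role, topic) for role, topics in members for topic in topics})


# One member element with two topicRef children ...
grouped = [("#employee", ["#lmg", "#grove"])]
# ... versus two member elements of the same class with one topicRef each.
split = [("#employee", ["#lmg"]), ("#employee", ["#grove"])]
```

If the graph is normalized in this way, the two serializations become indistinguishable after parsing; if it is not, applications and query engines have to cope with both shapes.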

scope/addToScopes

Do these elements hold a set of topics or a bag of topics? If a set, how should implementations react to instances containing duplicate references?
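
The practical difference shows up only when a scope element repeats a reference. A minimal Python sketch, with invented topic references and a hypothetical helper name:

```python
from collections import Counter


def duplicate_refs(refs):
    """Return the references that occur more than once, in sorted order."""
    return sorted(ref for ref, count in Counter(refs).items() if count > 1)


scope_refs = ["#norwegian", "#english", "#norwegian"]
as_bag = Counter(scope_refs)  # bag semantics: "#norwegian" is counted twice
as_set = set(scope_refs)      # set semantics: the repetition is lost
```

Under set semantics an implementation must decide whether a non-empty `duplicate_refs` result is an error, a warning, or silently ignored.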

Data model

In this section we have put comments that we feel relate to the topic map groves. Some of them may impact the conceptual model, but whether they do or not depends on the perceived purpose of the conceptual model. In our opinion all these issues must be addressed, but it is unclear whether the conceptual model is the right vehicle for doing so.

In the syntax, all elements have id attributes, and on topic it is required. Are any of these IDs part of the conceptual/data model? It would seem natural for the IDs on topics to be. If IDs are part of the data model, what happens when two topic map documents that use the same ID are merged?

What is the character set of abstract topic maps? What characters are allowed in topic map names, and especially topic IDs?

TopicMap

Should the TopicMap class have a base URI property?
