Subject: [xtm-wg] Lars Marius' comments
From: Steve Pepper <pepper@ontopia.net>
To: xtm-wg@yahoogroups.com
Date: Sat, 10 Feb 2001 21:28:27 +0100

Lars Marius,

Thanks for your many comments. We try to address most of them here.

| Throughout the specification the emphasis placed on subject indicators
| and the computable identity of resources seems to indicate that it is
| the resources used as indicators that are the basis for merging, when
| it must in fact be the locators of the indicators.  This must be so
| because it is utterly unreasonable to expect processors to download
| all indicators and compare them to compute identity.  It might be
| useful if the emphasis were shifted onto the locators instead, to make
| this clearer.

F.5.2 describes subject-based merging in detail and makes it very clear
that it is the URI equality principle that determines whether two
resources are deemed to be the same.
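
As a rough illustration only (this is not text from the spec, and F.5.2
remains authoritative), here is a minimal Python sketch of merging driven
purely by locator equality. It assumes that URI equality is plain string
comparison of the subject indicator locators; the function name and the
data layout are hypothetical. Note that no resource is ever downloaded.

```python
from collections import defaultdict


def merge_by_subject_indicator(topics):
    """Group topics that share at least one subject indicator locator.

    topics: dict mapping a topic id to a set of locator strings (URIs).
    Returns a sorted list of sorted lists of topic ids to be merged.
    Assumption: two locators are equal iff their strings are equal.
    """
    parent = {t: t for t in topics}

    def find(t):
        # Follow parent links to the representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # locator string -> first topic id that used it
    for topic_id, uris in topics.items():
        for uri in uris:
            if uri in first_seen:
                union(first_seen[uri], topic_id)
            else:
                first_seen[uri] = topic_id

    groups = defaultdict(set)
    for t in topics:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())
```

Merging is transitive in this sketch: if A and B share one locator and B
and C share another, all three end up in the same group.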

| The terms 'address' and 'locator' are nowhere defined, by the way,
| even though they are absolutely central to the specification.

The definitions in XLink apply.

| 1.2 Goals
| =========
|
| Second para
|
|   There are also references to RFCs 2119 and 1738 in the
|   specification.  Section 4.1 should perhaps be moved into 1.3.

We haven't done this, but we have added all RFCs mentioned in
the spec to the References annex.

| 1.3 Terminology
| ===============
|
| Since this whole section is in any case repeated in a different order
| in section 2 I think it might as well be removed.  If anything should
| happen to not be defined in section 2 the appropriate text could be
| moved there.  Section 2 is far more readable, and so 1.3 does not seem
| to serve any purpose, especially given the frequent hyperlinks from
| the use of terms to their definitions.
|
| Removing it would
|
| - reduce bloat,
| - not change the meaning of the spec (unless there be bugs),
| - make it less forbidding to read (since the spec would start with a
|    gentle introduction, rather than a definition of terms in
|    non-conceptual order), and,
| - reduce the chances of internal contradictions.

While we sympathise with your reasons, even if we were to agree with
your conclusion, we feel it is too late to remove this section.

| consistent topic map
|
|   Defined as a topic map that has "one topic per subject and no
|   further opportunities for merging or duplicate suppression, as
|   defined in Annex F".
|
|   However, as is noted elsewhere, the rules of Annex F are not
|   sufficient to guarantee that there will be one topic per subject.
|   The definition should be weakened to allow for this, especially
|   since F.1 para 1 and 2.2.5.2 require processors to present a
|   consistent topic map to the application, something that the
|   processor under this definition cannot possibly be expected to
|   achieve.
|
|   The term is also used in B.9, but in what seems to be a weaker
|   form (no mention of duplicate suppression):
|
|   "It is possible for more than one Topic in the Topic Map to reify
|   the same Subject. If no Subject is reified by more than one Topic in
|   the Topic Map, then the Topic Map is said to be a Consistent Topic
|   Map."
|
|   I suggest that these two sentences simply be removed, as they do not
|   say anything new in any case.

The reference here is not to "one topic per subject" in the ideal sense,
but in the sense of having performed all required merging and removed
duplicates.

| member
|
|   Defined as "A topic that plays a role in an association.", which in
|   2.2.4.1 becomes "A member is a topic (or set of topics) that plays a
|   particular role in an association."  I believe both definitions
|   should define a member as a non-empty set of topics.

It is a requirement to be able to represent incomplete knowledge (I know
that Mary is married, but not who to). Allowing <member> elements that
do not address any topics is one aspect of this.
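
To make the Mary example concrete, here is a small Python sketch of an
association in which one member addresses no topics. The class and field
names are hypothetical and are not taken from the spec; the only point
is that an empty member can stand for "someone, but we do not know who".

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    role: str                      # the role played in the association
    topics: frozenset = frozenset()  # may be empty: player unknown


@dataclass
class Association:
    type: str
    members: list = field(default_factory=list)

    def unknown_roles(self):
        """Return the roles whose players are not (yet) known."""
        return [m.role for m in self.members if not m.topics]


# "Mary is married, but we do not know who to."
marriage = Association(
    "marriage",
    [Member("wife", frozenset({"mary"})), Member("husband")],
)
```

Here `marriage.unknown_roles()` reports the `husband` role as the one
without a player.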

| role
|
|   The definitions here and in 2.2.4.2 are too brief to provide any
|   real understanding of what a role really is.  The syntax seems to
|   indicate that it is effectively specified by the role type, but this
|   is nowhere stated.

The descriptions of <roleSpec>'s subelements make this clear.

| subject identity
|
|   3. seems to not be a separate sense or meaning of the term, but
|   additional information about 2.

Maybe. But perhaps it doesn't do any harm, and it may aid understanding?

| topic characteristic
|
|   Defined as a topic name, occurrence or role, where topic name is
|   defined as a base name.  So what about variant names?
|
|   Suggestion: mention variants under the definition of topic name.

OK.

| topic map document
|
|   Defined as "A document that contains one or more topic maps that
|   conform to this specification. It may be serialized for the purpose
|   of storage or interchange in a syntax governed by this or some other
|   specification."
|
|   The first sentence seems to contradict the second.  According to the
|   first para of the abstract this specification defines a 'grammar',
|   which is a syntax. (It is certainly not a model!) Hence, something
|   that conforms to this specification must also conform to the syntax.
|
|   The term 'document' to me implies that it must be an XML or SGML
|   document, but this definition seems to imply that it can be a topic
|   map structure in a database or some completely different
|   serialization syntax. To me it would seem sensible to reserve some
|   other term for some of these uses, such as 'topic map graph', which
|   it would make sense for the object model/grove model to define.
|
|   I assume that this term is intended to cover both XTM and 13250, and
|   if so I suggest that it be abandoned in favour of 'XTM document' and
|   that 'topic map document' remain an informal term. There is nothing
|   formal to base it on in any case.
|
|   I suggest that 'XTM document' be defined as "An XML document
|   conforming to the syntax and other requirements in this
|   specification."

We stand by the belief that topic maps can be expressed in multiple
syntaxes (XTM, 13250, LTM, RDF, etc.), not all of which may even be XML
or SGML, and that there is therefore a need to distinguish between
"topic map documents" and "XTM documents". However, the point about the
introduction claiming that the spec (only) provides a grammar is
well-taken and changes have been made accordingly.

| topic map node
|
|   Uses the terms 'object' and 'the system' with no explanation of what
|   these are. Since this specification defines nothing but a grammar
|   the term does not seem to be very useful. My suggestion is that it
|   be left for the object model to define this.
|
|   If so, 2.2.5.1 should also be removed, and the use of the term in
|   2.2.5 para 1.

The specification defines more than just a grammar (see above). We agree
that more work needs to be done on the processing model -- and an object
model. A subgroup has been formed to do just that. Kal Ahmed, as the
chair of that subgroup in Paris, would be the person to contact to take
part in its work.

| unconstrained scope
|
|   A scope is a set of topics (as per the definition of scope), so what
|   are the contents of this set? When filtering characteristics by
|   scope in applications this will determine the outcome. The most
|   reasonable alternative in my view is that it is the empty set.

The exact nature of the unconstrained scope, and how applications should
interpret it, is not defined. It was our feeling that more work needs to
be done on this before casting it in stone. We agree that the empty set
is the most reasonable interpretation in many cases, but arguments have
also been made that (in some situations) it may be best characterized as
the set of all topics in the topic map.
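
To show why the choice matters, here is a Python sketch of the two
interpretations side by side. Everything in it is an assumption made for
illustration: the spec defines neither the policy names nor any
filtering rule, and the subset test used here is just one hypothetical
way an application might filter characteristics by scope.

```python
def effective_scope(declared_scope, all_topics, policy):
    """Resolve a scope, expanding the unconstrained scope by policy.

    policy "empty": the unconstrained scope is the empty set.
    policy "all":   the unconstrained scope is every topic in the map.
    """
    if declared_scope:
        return frozenset(declared_scope)
    if policy == "empty":
        return frozenset()
    if policy == "all":
        return frozenset(all_topics)
    raise ValueError(f"unknown policy: {policy!r}")


def visible(declared_scope, context, all_topics, policy):
    """Hypothetical filter: show a characteristic only if its effective
    scope is a subset of the user's context."""
    scope = effective_scope(declared_scope, all_topics, policy)
    return scope <= frozenset(context)
```

Under this particular filter, an unscoped characteristic is always shown
with the "empty" policy, but is shown with the "all" policy only when
the context contains every topic in the map. A different filter rule
could reverse that outcome, which is one reason the question has been
left open.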

| 2.2.1, para 1
|
|   "(also known as an topic types)"
|                   ^^

Thank you.

| 2.2.1.3, para 1
|
|   Could explain more clearly what happens when two topics are
|   discovered to have the same subject, and also how this is
|   established automatically. Something about what to do when it is not
|   automatically discovered would also be nice.

The former is covered in F.5.2. Not sure what you mean by the second
sentence: If it's not automatically discovered, it's surely an issue for
user-application interaction?

| 2.2.1.3, para 2
|
|   It can also be established through topic names, as per the TNC.

Not in the sense used here, which relates to conscious specification of
a topic's identity other than by naming (i.e. the use of
<subjectIdentity>).

| 2.2.1.4, para 1
|
|   "When two topics use the same resource to indicate their
|   subject...".  How does an application know that the two resources
|   are the same?

At a minimum by applying the URI equality principle, but other,
application-specific methods may also be used, including the use of
built-in dictionaries. See F.5.2.

|   "...and must therefore be merged."  This is a bit vague.  Who must
|   do this, how, and when?

OK. Clarification added.

| 2.2.1.6, para 2
|
|   "Scope is considered to establish a namespace for topics."  It might
|   be better to say 'topic base names' here, since the reader is
|   otherwise lead to wonder about what happens with occurrences and
|   roles.

OK.

|   "implicitly refer to the same subject and therefore should be
|   merged."  In 2.2.1.4 merging 'must' happen, while here it only
|   'should'.  Since this is a normative section, the term 'must' should
|   be used here as well.

2.2.1.4 is about subject-based merging. This is about name-based
merging. The wording here is less strong in order to allow applications
to provide alternate behaviour at the request of the user.
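
A Python sketch of the distinction, for illustration only: topics whose
base names have the same string in the same scope are reported as merge
candidates, and a switch lets the application turn this off. The
function name, the `enabled` flag, and the triple layout are all
hypothetical, and scope equality is assumed to be plain set equality.

```python
from collections import defaultdict


def name_merge_candidates(base_names, enabled=True):
    """Find topics that share a base name string within the same scope.

    base_names: iterable of (topic_id, name_string, scope) triples,
                where scope is an iterable of topic ids.
    enabled:    False models an application that suppresses name-based
                merging (the "should" rather than "must").
    Returns a sorted list of sorted lists of topic ids.
    """
    if not enabled:
        return []
    by_key = defaultdict(set)
    for topic_id, name, scope in base_names:
        by_key[(name, frozenset(scope))].add(topic_id)
    return sorted(sorted(ids) for ids in by_key.values() if len(ids) > 1)
```

Subject-based merging has no such switch in this picture; only the
name-based rule is made optional.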

| 2.2.2
|
|   Uses the term 'name' in several places where the formally defined
|   term 'topic name' is most likely meant.

Yes. We think this is fairly obvious. (Our goal is to achieve
communication. We believe we best do this through striking a balance
between 100% formal precision on the one hand, and readability on the
other. We realise this will not meet with the approval of some people,
but it is the decision we have taken.)

| 2.2.2.3
|
|   The relationship between parameters and scope is left undefined, and
|   the whole concept seems underspecified. What _is_ a processing
|   context, and of what use are parameters for choosing between them?
|
|   Also, is it allowable for an application to have a single 'context'
|   which is used as the basis for choosing both the appropriate base
|   name and the appropriate variant(s) of it?  Should one always choose
|   one variant or many?  An

The whole issue of scope processing is deliberately left undefined. It
is the best that can be done at this stage.

| Throughout the annex the undefined terms 'subject constituting
| resource' and 'subject indicating resource' are used. These seem to be
| the same as 'addressable subject' and 'subject indicator' in the spec
| proper, and so the annex should be made consistent with the spec
| proper.

All such usages have been expunged now.

| F.2.5
|
|   Should mention explicitly that variants are ignored and catered for
|   by F.6.2.
|
|
| F.2.6
|
|   Occurrences are equal if "the resource data values that are the
|   occurrences are equal [Note : equality of the resource data value is
|   determined by string equality.]".
|
|   This is a bug, since it means that all occurrences of all topics
|   that are being merged must be downloaded and compared for equality,
|   which is too costly to even contemplate. Furthermore, the string
|   equality principle as defined cannot be applied to binary data.

A misinterpretation, not a bug. The text has been modified to make this
clearer.

| F.3.3
|
|   This looks like an equality principle to me, and could be used to
|   simplify F.2.4.

This could be regarded as either equality or equivalence. We don't
think the difference is major in this case.

| F.4.2
|
|   This seems to be superfluous, since this constraint is already
|   defined by the DTD.

Right. It's gone now.

| F.5.1
|
|   Perhaps F.4 should be called 'Operations' and this section moved
|   there. F.5 could then be called 'Merge conditions'.

That would have been an alternative, but the change doesn't seem
necessary at this stage.

|   Also, point 2 of the error conditions section is a bug.  To check
|   this a processor will have to download every subject indicator,
|   parse it as an XML document, build a topic map from it, locate the
|   association inside it and do the comparison.  This is, again, too
|   costly to contemplate.
|
|   If this is not what is meant, this should be spelled out.
|
|   Also, there seems to be no good reason to treat associations
|   specially and not do the same for all other topic map constructs
|   that might be reified and for which equality rules are defined.
|
|   I think this should be removed.

So do we! Poof. Gone.

| F.6.3
|
|   What is the para after the postcondition doing there?

It sneaked in! Now it's gone.

Steve
--
Steve Pepper, Chief Technology Officer <pepper@ontopia.net>
Convenor, ISO/IEC JTC1/SC34/WG3  Editor, XTM (XML Topic Maps)
Ontopia AS, Maridalsveien 99B, N-0461 Oslo, Norway.
http://www.ontopia.net/  phone: +47-22805465  GSM: +47-90827246


References:
- [xtm-wg] Spec comments
  - From: Lars Marius Garshol <larsga@garshol.priv.no>