Subject: Re: [xtm-wg] The Topic Naming Constraint
[Geir Ove Gronmo:]

> First of all I'd like to say that I wholeheartedly agree with Lars
> Marius on this matter. The Topic Naming Constraint is [extremely]
> painful and too awkward - in my eyes it is a real show-stopper.

> I'll try to explain what I think is the problem with draconian Topic
> Naming Constraint enforcement, which the standard(s) at this point
> requires from an XTM processor. A conformant XTM processor has to
> enforce these constraint, e.g. there cannot be _any_ exceptions if
> you'd like to claim that your processor is conformant.

> I see the usefulness of being able to do automatic namespace-based
> merges. This is extremely useful, but unfortunately the problem is
> that people (myself included) believe that base names are intended for
> labelling purposes, not for identification.

I'm baffled by your argument, Geir Ove. You seem to be saying that we
should do away with XTM's ability to support string-match-based topic
merging because you don't agree that that is the purpose of baseName
elements.

Would you be happier if we re-named the <baseNameString> element type
to <string-for-string-based-topic-merging-and-for-no-other-purpose>?
If we had not already decided on the interchange syntax for XTM, I
could live with that. (You'll have to forgive me if I just so happen
to put the names of my topics in
<string-for-string-based-topic-merging-and-for-no-other-purpose>
elements.)

If you want to do away with the topic naming constraint, please don't
confuse what's left with the topic map paradigm, and don't imagine
that what's left will allow topics to be addressed by their names.

> Applications need to be able to display labels for topics. The
> obvious way to do this is to use names.

Not true. A name is useful for addressing. A label is not useful for
addressing; it is merely a display convenience. Contrary to what you
seem to believe, labels should not always also be names.

> Occurrences are usually out of the
> question, since _generic_ applications wouldn't be able to know when
> to use a basename and when to display occurrences.

Not true. The processing of XTM <topicMap> elements is exactly what
the XTM Spec says it is, no more and no less. If the XTM Spec defines
a published subject that is "label", then that's all that's necessary.
If an occurrence type is "label" (or any other XTM-defined occurrence
type), generic applications will be required to know what that means,
and to act accordingly. A label is a kind of occurrence, no more and
no less.

> I believe that it is a problem [actually a bug] that base names are
> subject for namespace-merges. As I said, I believe that
> namespace-merges are _extremely_ useful. But we should not use base
> names for this.
>
> Proposal:
>
>   o We need another kind of name, e.g. identifying-name. (I don't
>     think that it makes sense for an identifying name to have variants
>     like basenames do.)

You're proposing to redefine the meaning of the <baseName> generic
identifier, simply because you think it ought to be called something
else? I hope and believe it's too late for that; I think we've already
put the syntax of XTM to bed.

You're also proposing to blur the distinction between names and
labels, so that you can have a <name-that-is-not-really-a-name> and a
<name-that-is-really-a-name>. I think you're swimming upstream, here.
The vocabulary of informatics is already well-established, especially
in the context of the Web, where the word "name" has semi-religious
significance in which a name is an address, or at least part of an
address within some larger context (namespace).

> A name targeted towards namespace-based merges complements subject
> indicators. Instead of just identifying a topic by subject
> indicators (the _meaning_ of the resource content) you're then also
> able to identify a topic by the name resource _content_
> (byte-by-byte).

Right.
But what's wrong with calling the element type that contains the
string to be matched <baseNameString>? It's consistent with years of
popular usage.

> Here are some thoughts on why the TNC doesn't work in real-life in its
> current incarnation.
>
> Problem:
>
>   Topics are merged even though the author(s) didn't intend them to
>   merge because it is known by the authors that they have different
>   subjects. This is what I believe is the main problem with the TNC in
>   real life.

That can only happen if the authors don't know what <baseName> means.
I guess you're saying that authors don't know what <baseName> means.
Whose fault is that? It's not the fault of the XTM Spec, nor of ISO
13250, both of which are crystal-clear about this.

> I see at least four ways that a processor can behave when merging two
> or more topic maps:
>
>   o The merge happens automatically and parts of the topic map no
>     longer makes sense (is inconsistent), because resulting topics
>     now represents more than one subject.
>
>   o The merge is interactive. Unfortunately this requires the
>     author(s) to be present when the merge happens. This is
>     unacceptable in most cases. You cannot expect the authors to be
>     present when a merge happens. Note that there are many [an
>     unlimited number of] reasons why a merge happens.
>
>   o The processor marks the topics as to-be-merged. This prevents the
>     merged topic map to be presented to the user[!]. (A non-consistent
>     topic map cannot be presented to the user by a conforming
>     processor.)
>
>   o The processor doesn't do merging based on the TNC. This is not
>     allowed by the standard.

There is something fundamentally amiss in your underlying assumptions
here. You seem to be saying that merging happens without somebody
taking responsibility for the merge. Neither 13250 nor the XTM Spec
says or implies any such thing as automatic unattended merging of
arbitrary sets of topic maps.
Both of the standards are strictly limited to saying how a *single*
topic map document (or, in the case of XTM, a single <topicMap>
element) should be interpreted.

So, if you want to declare, in an XTM-conforming fashion, that two
topic maps must be merged, the *only* way you can do that is to write
a third <topicMap> that contains a <mergeMap> for the one, and another
<mergeMap> for the other. Please note: when you write this third topic
map, you are, by definition, a topic map author, you are creating a
topic map, and you are responsible for the sensibility of what it
says.

Neither 13250 nor XTM says anything about the methodologies whereby
topic maps should be (or can be) created. Both standards only say what
a topic map must be interpreted to mean after it has been created.

In light of these facts, let's discuss your points one by one:

> I see at least four ways that a processor can behave when merging two
> or more topic maps:

There is only one way an XTM-conforming processor can behave when
merging topic maps. Otherwise, there is no point in having an XTM
Specification.

>   o The merge happens automatically and parts of the topic map no
>     longer makes sense (is inconsistent), because resulting topics
>     now represents more than one subject.

If this happens, it is the fault of the person who wrote the topic map
that caused the inappropriate merging to occur. It is always the
responsibility of the topic map author to write a topic map that makes
sense. (There is an unbounded number of ways in which to create a
nonsensical topic map, and there is no way for the Spec to prevent
that. Indeed, the power to make sense always includes the power to
make nonsense.)

>   o The merge is interactive. Unfortunately this requires the
>     author(s) to be present when the merge happens. This is
>     unacceptable in most cases. You cannot expect the authors to be
>     present when a merge happens. Note that there are many [an
>     unlimited number of] reasons why a merge happens.

A topic map always has an author. It is the author's responsibility
that it makes sense. If the authoring process includes some
interactive procedure, that's just fine, but the Spec does not specify
any such interactive procedure. Vendors like Ontopia can invent and
implement such interactive procedures, of course, and I hope they
will! But the person engaging in such an interactive procedure is, by
definition, a topic map author.

Also, contrary to what you say, according to the XTM Spec, there is an
extremely small and finite number of reasons why merging occurs:

  (a) If by "merge" you mean "the merging of topic maps", there is
      only one reason: the existence of a <mergeMap> element in the
      <topicMap> element being processed.

  (b) If by "merge" you mean "the merging of topics", there are only
      two reasons:

      (1) The topics have the same name in the same topic namespace,
          and/or

      (2) The topics share one or more subject identity points.

>   o The processor marks the topics as to-be-merged. This prevents the
>     merged topic map to be presented to the user[!]. (A non-consistent
>     topic map cannot be presented to the user by a conforming
>     processor.)

Here, you're thinking about the problems of implementing some
software. The Spec does not forbid you to write any software. You are
free to write and license any software you like. If, among the other
features of such software, the software can be used to fully
understand XTM <topicMap> elements in the manner set forth in the XTM
Spec, then that feature is XTM-conforming.

The fundamental purpose of the Spec is to describe a limited,
implementable set of functionalities that all XTM-conforming software
must implement, with respect to XTM-conforming <topicMap> elements. It
is emphatically *not* the purpose of the Spec to limit the set of
functionalities that any software *may* implement. There is no limit
in that regard at all.
There is certainly no limit with respect to what software may do with
information that does *not* happen to be XTM-conforming <topicMap>
elements.

>   o The processor doesn't do merging based on the TNC. This is not
>     allowed by the standard.

You're oversimplifying the case. Let's be very clear about this. The
processor must do everything the Spec requires, including support for
the topic naming constraint, if and only if *all* of the following
things are true:

  (1) the processor claims to have the ability to support
      XTM-conformant processing, and

  (2) the processor is being used in its XTM-conformant mode by its
      user (because it may, of course, have other modes of operation
      that are not claimed to be XTM-conforming and which are
      therefore not constrained by the Spec), and

  (3) the processor is processing a <topicMap> element that claims to
      be XTM-conformant.

> - - -
>
> It has been pointed out that one of the reason why the base name
> constraint exist is to avoid ambiguities when presented with identical
> names. I agree with the usefulness of being able to avoid ambiguities.
>
> Something to be aware of is that a name can be disambiguated by a
> processor even without looking at the name scope:
>
>   o A basename belongs to a topic, which itself represents a
>     subject. The subject and subject descriptors _disambiguates_ the
>     name!

What you're saying here is that if we allow several topics to have the
same name in the same namespace, there's no problem because we can
just go look at all the topics that have the same name in the same
namespace to see which one is the one we're looking for. Of course,
what you're saying is true. On the other hand, this methodology offers
nothing we don't already have in typical search engines. Your proposal
is tantamount to proposing that users, rather than computers, should
be required to sort through the infoglut on their own behalf.

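To picture the difference between a name that addresses a topic and a
name that merely returns candidates, here is a minimal Python sketch.
It is only an illustration under stated assumptions: the functions
`index_by_name` and `address`, the topic ids, and the scope names are
all hypothetical, and a scope is simplified to a plain set of strings
rather than anything defined by the XTM Spec.

```python
def index_by_name(base_names):
    """Map each (scope, name) pair to the set of topic ids that carry it."""
    index = {}
    for topic_id, scope, name in base_names:
        index.setdefault((frozenset(scope), name), set()).add(topic_id)
    return index


def address(index, scope, name):
    """Return the one topic addressed by (scope, name).

    Raises LookupError when the pair matches no topic or several topics,
    i.e. when the name cannot serve as an address.
    """
    candidates = index.get((frozenset(scope), name), set())
    if len(candidates) != 1:
        raise LookupError(
            f"{name!r} in scope {sorted(scope)} matches {len(candidates)} topics"
        )
    return next(iter(candidates))


# Hypothetical data: two distinct subjects, each named in its own scope.
scoped = index_by_name([
    ("city-paris", ["geography"], "Paris"),
    ("hero-paris", ["mythology"], "Paris"),
])
print(address(scoped, ["geography"], "Paris"))  # city-paris
```

If both topics carried "Paris" in the same scope and were left
unmerged, `address` would raise instead of answering, and the choice
between the candidates would fall to the user.
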
>   o The type-hierarchy and the classes of which the topic is an
>     instance describes what the topic is about and that should to some
>     extent disambiguate the name.

This is the same specious argument you've already made; you're just
pointing out some ways that, after we have established a list of
topics to look at, we users can distinguish between them. It's still a
"back to infoglut" argument. I'm not buying it.

>   o Basically all the other characteristics can be used by the XTM
>     processor to further disambiguate a name to the end-user.

No, in this scenario, the XTM processor isn't doing the
disambiguation. The end-user is doing the disambiguation, based on
what the XTM processor is able to tell the user. It's infoglut all
over again, only worse, because now the user has to establish the
relative importance of each aspect of everything when deciding what to
look at. Your proposal would compromise the ability of the topic map
paradigm to enhance the productivity of humanity, by taking away from
the topic map author the ability to precalculate the relevance of
materials on behalf of the end user, in a way that the end-user can
simply rely on and use, without having to understand it.

My own proposal is radically different: I propose that we use the
topic map paradigm as it was designed to be used, as a solid platform
that allows domain experts to objectify their expertise in ways that
can maximally enhance the productivity of non-domain-experts.

> - - -
>
> Why is the TNC awkward?
>
> 1. It is impossible to universally scope basenames at the time of
>    authoring to avoid unintended merges to happen in the future.

Simply not true. When authoring, you can scope any <baseName> any way
you want. There is no such thing as an "unintended" merge. As an
author, you and you alone decide which topics will be merged, and
which will not be merged, and whether the merged topics will be merged
on the basis of common identity points, common names in namespaces, or
both.

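As a rough illustration of how an author's scoping decisions control
merging, here is a minimal Python sketch of the two topic-merging
rules (same name in the same namespace, shared subject identity
point). It is a sketch under stated assumptions, not the processing
model of the XTM Spec: `Topic`, `must_merge`, and `merge_groups` are
hypothetical names, and a scope is simplified to a set of theme ids
compared by exact equality.

```python
from itertools import combinations


class Topic:
    def __init__(self, topic_id, names=(), subject_ids=()):
        self.topic_id = topic_id
        # Each base name is a (scope, string) pair; scope is a set of theme ids.
        self.names = {(frozenset(scope), text) for scope, text in names}
        self.subject_ids = set(subject_ids)


def must_merge(a, b):
    """True if either merging rule applies to the pair of topics."""
    same_name_same_scope = bool(a.names & b.names)
    shared_identity = bool(a.subject_ids & b.subject_ids)
    return same_name_same_scope or shared_identity


def merge_groups(topics):
    """Group topic ids so that topics linked by a rule end up together."""
    parent = {t.topic_id: t.topic_id for t in topics}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(topics, 2):
        if must_merge(a, b):
            parent[find(a.topic_id)] = find(b.topic_id)

    groups = {}
    for t in topics:
        groups.setdefault(find(t.topic_id), []).append(t.topic_id)
    return sorted(sorted(group) for group in groups.values())


# Same string, no scope on either name: the author has asked for a merge.
unscoped = [Topic("a", names=[([], "Paris")]),
            Topic("b", names=[([], "Paris")])]
print(merge_groups(unscoped))  # [['a', 'b']]

# Same string, different scopes: the author has kept the topics apart.
scoped = [Topic("a", names=[(["geography"], "Paris")]),
          Topic("b", names=[(["mythology"], "Paris")])]
print(merge_groups(scoped))  # [['a'], ['b']]
```

In this sketch the only thing that changes between the two runs is the
scope the author put on each name, which is the point being made
above: the merge is a consequence of what the author wrote.
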
>    You cannot know at a given point in time that you'll never have
>    unintended merges caused by the Topic Naming Constraint.

Who is the "you" in the above sentence? If "you" are authoring a topic
map, "you" are responsible for the merging that occurs, because "you"
specified the topic map in such a way that merging occurs. This isn't
"awkward"; it's essential that you say what you mean to say, and that
you can rely on the fact that everyone who uses the topic maps that
you've created will understand them to mean exactly what you said that
they meant, when they are interpreted as the Spec requires them to be
interpreted.

> 2. Most merges will be done automatically by a computer (without user
>    intervention).
>
>    You cannot expect the authors of the two topic maps to be present
>    the merge happens.

There is confusion here, because "merging" means two different things,
and the distinction between them has been blurred.

  (1) "Automatic merging of topics." This is the merging of topics
      that is required to occur during conforming processing of a
      *single* <topicMap> element, which may or may not contain
      <mergeMap>s.

  (2) "The authoring of a <topicMap> that may or may not contain
      <mergeMap>s." This is topic map authoring. When the author
      decides to publish his <topicMap> element, he is taking
      responsibility for the fact that, when the <topicMap> element is
      processed by an XTM-conforming processor, exactly and only the
      merging that he intends and believes to be appropriate and
      correct will occur.

> 3. A computer cannot automatically correctly and sensibly scope names
>    to avoid the TNC.

Right. I agree with you. In general, computers are still lousy
authors.

> - - -
>
> Conclusion:
>
>   o Get rid of the TNC and introduce a separate content-based
>     identifying name.

This is a terrible idea.
I'd put it in the same category as, "We don't want to bother with
topic maps, because Microsoft Help, with its powerful full-text
searching capabilities, already does everything anybody really needs."
(I mention this particular howler because a real customer has actually
said this to me.)

I've just re-read this note, and I've just realized that you seem to
assume that any topic map should merge automatically with any other
topic map, or perhaps that the paradigm is designed to make this
possible. Just in case this is what you're thinking, let me assure you
that this is nonsense. It is also untrue that the topic naming
constraint exists in order to make it possible to merge arbitrary sets
of topic maps automatically.

In my own view, the most persuasive reason for the existence of the
combination of the two merging rules (the Name-based and Subject-based
merging rules) is to make it economically feasible for Party C to
maintain a topic map that merges Party A's and Party B's topic maps,
even though the latter pair of topic maps are evolving separately, in
total ignorance of one another, and even though one or both of them is
not rationally maintaining the syntactic addressability of their XML
element components. The ability to address topics rigorously, by means
of their names, is critical to the economic feasibility of Party C's
business model. Yes, Party C has to work hard to keep up with the
changes to Party A's and Party B's topic maps (it can never be a fully
automatic process) but at least Party C doesn't have to start from
scratch every time Party A or Party B releases a new version of their
topic maps. Party C can address Party A's topics and Party B's topics
by their names, if desired, and/or by their subject identity points.
Using the two kinds of addressing in combination is extremely
powerful.

The economic feasibility of Party C's business model is critical to
the promulgation of global knowledge interchange.
People like Party C have to be able to make a profit from integrating
the knowledge of people like Party A and Party B, if the dream of
global knowledge interchange is to be realized without having us all
drown in global knowledge glut.

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA