Subject: [xtm-wg] lazy processing vs. extended XLink and BOS processing
Peter --

You seem to be saying that it's OK for incredibly time-consuming and
bandwidth-consuming searches to be performed by web crawlers, but not
OK for very similar processing to be performed by topic map engines as
they assemble lookup tables for addressed nodes in the hyperdocuments
that they, together with the resources that contain their topic
occurrences, constitute. There is a huge functional difference between
(a) the indexes that a traditional web crawler assembles and (b) the
lookup table for addressed nodes in a hyperdocument, but the expense of
making and persisting the latter is only marginally greater. If you're
willing to incur the cost of web crawling, why would you object to
making really useful hyperdocument lookup tables while you're at it, so
that all the inverse relationships are also available? It's those
inverse relationships that make topic maps able to offer something
radically better than what's already available. Without that radical
improvement, I fail to see the point of our whole XTM Specification
effort.

[Steve Newcomb:]
> To make a selection from some set, you must first obtain the set
> from which you want to select.

[Peter Jones:]
> [PPJ] Can't I just know the type/properties of the set and some
> location within it. Less overhead.

It's not really so much less overhead, compared with the cost of web
crawling. You have to do a web crawl to get the type/property/location
information you're talking about. The overhead involved in doing that
is enormous. Why not make it really count? If you have to expend all
the processing and bandwidth necessary to obtain, by web crawling, the
limited set of information you seem to want to be satisfied with, it's
wasteful to fail to remember also what is addressed by what, and in
what context(s). This "inverse relationship" or "where referenced"
information is essential to understanding topic map documents.
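The "where referenced" table described above can be accumulated as a
side effect of an ordinary crawl. Here is a minimal sketch of that
idea; the data layout and all names are invented for illustration and
are not part of any topic map or crawler API:

```python
# Sketch: building a "where referenced" (inverse) lookup table as a
# side effect of a crawl. All names here are illustrative.
from collections import defaultdict

def build_inverse_table(docs):
    """docs maps each document URI to the list of URIs it addresses."""
    where_referenced = defaultdict(set)
    for source_uri, targets in docs.items():   # the ordinary crawl pass
        for target_uri in targets:
            # remember the inverse relationship: who addresses target_uri?
            where_referenced[target_uri].add(source_uri)
    return where_referenced

docs = {
    "map.xtm": ["doc-a.html", "doc-b.html"],
    "doc-a.html": ["doc-b.html"],
}
table = build_inverse_table(docs)
```

The point is that the crawler already visits every source document, so
recording the inverse direction costs only one extra table insert per
link traversed.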
The approach you seem to be espousing, if I understand it, is that we
should be able to use existing web crawler technology as a finding aid
for topic links within topic maps that appear on the web. In your
scenario (if I understand it), web crawlers should discover topic map
documents on the web, take people directly to the topic links in those
documents, and let those people use the topic links to get wherever
they're going.

Some of the problems with this approach are:

* Such users won't know about the associations in which such topics
  participate, so the bulk of the usefulness of the topic map will not
  be available to them.

* Topics that don't happen to use your query keyword as a topic name
  won't be found, even if they're exactly the right topic.

* Scopes won't have any utility for information hiding, because the
  application won't know the scopes, much less anything about the
  topics that appear in the scopes. You'll get many irrelevant hits.
  Infoglut redux.

* Public topics won't be useful hubs from which you can get to topic
  maps that contain topics with the same subject identity.

In short, the approach you propose (if I understand it) would turn
topic maps into things that are not materially different, in their
functionality, from ordinary HTML documents in which every link always
takes you from where it is to somewhere else. If that's all you want,
I suggest maybe we should create an XML DTD for documents that only do
what you're interested in doing. But please be informed that I, for
one, will implacably oppose calling such an architecture a "topic
maps" architecture, because it won't support topic maps. None of the
primary design goals of the topic maps paradigm will be met by it.
> [PPJ] In the situation I am describing (I will have another stab at
> improving the communication of this in a forthcoming mail) I
> envisage the creation of the BOS as something that takes place after
> the retrieval (see also later comments about scope and integrity in
> this mail). As I don't yet have an adequate understanding of HyTime,
> I can't judge how flexible a BOS is. How open to revision at
> run-time is it?

It's as open to revision as we want it to be. Indeed, the BOS is
whatever we want it to be; it's declared (or not declared) however we
want to declare (or not declare) it.

The concept of BOS is inescapable, however. The BOS is whatever the
application in fact decides that it is; if the application solicits
user input on the question of what to regard as being "in the BOS",
then the BOS is whatever the user decides it is, whenever the user
makes that decision. The BOS is the de-facto perimeter beyond which
the application does not know what's doing any addressing. It's
axiomatic that all applications are limited in this way. The BOS of an
HTML browser, for example, is the currently-displayed HTML document,
full stop. I say this because HTML browsers do not know what, in the
currently displayed document, is being addressed by links in other
documents.

There is a one-to-one correspondence between BOSs and hyperdocument
lookup tables (in HyTime parlance, such tables are called
"hyperdocument groves"). It is not a problem for a single resource to
participate in any number of hyperdocument lookup tables (i.e., to be
regarded as a member of any number of bounded object sets (BOSs)).
There is no reason (other than tradition) why a web crawler can't
produce lots of hyperdocument lookup tables as a side-effect of its
web crawling activities.

[Steve Newcomb:]
> You also seem to be suggesting that we standardize some algorithm for
> selecting from a topic map only those constructs that are relevant to
> some set of resources.
> I question the general utility/advisability of
> this idea. To make a selection from a topic map may "edit it to
> death" -- e.g. by invalidating the scopes that contain themes that are
> no longer present in the "selected" version.

[Peter Jones:]
> [PPJ] If the addthms are always required to be at the top of an XTM
> doc a suitable compromise can be reached(?).

Sorry, I don't see the connection between your point and mine. What
does addthms have to do with it? Themes can be added (and they *must*
be addable) from within other topic map documents. If we disallow
this, the merging of read-only topic maps becomes insupportable.
Specifically, it becomes impossible for enterprising persons to
provide topic map products that serve to merge (and thereby add value
to) the topic maps of other enterprising persons.

> Even if the selected portion(s) of the topic map do not require any
> other parts of the topic map in order to have integrity, the topic
> map author's conception of the structure of knowledge will still be
> seriously affected; it's just not the same topic map any more.

> [PPJ] Yes. It isn't. (Echoes of Roland Barthes on the "Death of the
> Author".) But does that completely kill its utility?

I'd say, "Yes, it completely kills the marginal utility of topic maps
over vanilla HTML documents." How not? I think you can already do
everything you seem to want to do with plain HTML, or with XML and
"simple" XLink, which is not materially different in its linking
functionality from HTML's <a href="..."> link.

[Steve Newcomb:]
> I'd be happier to leave this whole question (i.e., the question of
> how topic maps can be made from other topic maps, and of how topic
> maps should be presented in particular contexts) to applications.
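The point about merging read-only topic maps can be sketched with a
toy example: a merged view adds themes from a second map without
modifying either source. The dict-based representation below is
invented for this sketch; it is not XTM syntax or any real API:

```python
# Toy illustration of merging read-only topic maps: a merged view
# combines topics and themes from several maps while leaving every
# source map untouched. The representation is invented for the sketch.

def merge_views(*maps):
    """Produce a merged view; the source maps are left unmodified."""
    merged = {"topics": set(), "themes": set()}
    for tm in maps:
        merged["topics"] |= tm["topics"]
        merged["themes"] |= tm["themes"]
    return merged

base = {"topics": {"opera", "composer"}, "themes": {"italian"}}
addon = {"topics": {"opera"}, "themes": {"german"}}  # adds a theme
view = merge_views(base, addon)
```

Because the merge produces a new view rather than editing the sources,
one vendor's read-only map can be enriched by another's without either
party touching the other's document.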
[Peter Jones:]
> [PPJ] It might be smart to specify some sort of default association
> that TM processors must implement, something to the effect that, in
> the absence of any defined associations connecting up topics found
> in the BOS, these will be automatically attached to a
> 'DefaultAssoc_MemberOfThisDocForNow' type assoc.

Leaving aside the question of the purpose or advisability of providing
such a default association, how would a TM processor know whether or
not a topic link was addressed by any association links, without
actually processing all such links?

You seem to be saying that you would prefer that topic maps provide
their own hyperdocument lookup tables syntactically, internally, and
redundantly. There are serious problems with that idea, including:

* We have to wait until the XML Infoset committee completes its work,
  so that there is a formal expression of that model (perhaps as an
  ISO property set). In the absence of this work, there is no
  Recommended way to express addresses, because there's no
  Recommendation regarding exactly what constitutes an addressable
  node in XML.

* Constant effort will be required to maintain each topic map
  document, in order to keep up with changes in the mapped resources.

* We would have to discard the existing ISO 13250 syntax; there would
  be no point in using it, because it's mostly (and could be made
  entirely) redundant, given the information in our syntactic
  representation of the hyperdocument lookup table. (BTW, there is
  already an ISO standard DTD for representing such tables, among many
  others. It's called the "Canonical Grove Representation DTD".)

[Steve Newcomb:]
> Either the anchors are known or they are not known. That means that
> either you have processed the whole bounded object set, or you have
> not. You can't make this computation lazily. If you have not made
> the computation up front, you have no way of knowing, when you're
> looking at something, what may be linked to it.
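The objection above can be made concrete with a small sketch: to know
which topics are untouched by any association, a processor must first
resolve *every* association link. The data layout is invented for
illustration only:

```python
# Sketch of the point above: deciding which topics would need the
# proposed default association requires a full pass over all
# association links -- none can be skipped.

def unassociated_topics(topics, associations):
    """associations: each one lists the topic ids of its members."""
    referenced = set()
    for assoc in associations:          # no way to skip any of these
        referenced.update(assoc["members"])
    return topics - referenced          # only now is "unassociated" known

topics = {"t1", "t2", "t3"}
associations = [{"members": {"t1", "t2"}}]
leftover = unassociated_topics(topics, associations)
```

Skipping even one association risks wrongly attaching a topic to the
default association, which is exactly why the computation cannot be
done lazily.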
[Peter Jones:]
> [PPJ] I don't agree that on the WWWeb things like this cannot be
> done lazily. It would seem to me that in the arena of publicly
> available topic maps on the web it is more like a necessity that we
> be able to do this.

[Steve Newcomb:]
> Either you have processed the pre-existing TM, or you have not. It
> can't be dug out piece by piece, unless the overhead of digging out a
> piece is equal to the overhead of processing the entire TM.

> Within the topic map document itself, we can't know what
> associations a topic participates in without reading and resolving
> *all* of the association links.

> [PPJ] See comments about laziness above. I see no problem with
> iterations with a set crawl depth.

A crawl depth has nothing to do with the question of whether you must
process all the association links in a given topic map document. A
"set crawl depth" is an example of a way of specifying a BOS. In fact,
it's the simplest way of specifying a BOS using the HyTime syntax for
specifying BOSs; it's called "boslevel".

[Steve Newcomb:]
> Within the set of resources mapped by a topic map (the bounded
> object set (BOS) that includes those resources as well as the topic
> map document itself), we can't know which parts of which resources
> are regarded as occurrences without reading and resolving *all* of
> the topic links.

[Peter Jones:]
> [PPJ] If we are assuming that the BOS is something that is indicated
> in a root doc, and that the only access to the contributing docs is
> via that root doc, then I think we are making some grossly
> unrealistic assumptions about the way access to publicly accessible
> topic map docs can be controlled.

I've heard the "grossly unrealistic" charge many times; I
categorically deny it. True, it's grossly unrealistic for anyone who
believes that extended XLink is grossly unrealistic. Many web people
evidently believe that extended XLink is grossly unrealistic.
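A depth-limited BOS in the spirit of HyTime's "boslevel" can be
sketched as a bounded breadth-first walk from a root document: include
everything reachable within N hops of addressing. The link graph and
function names below are invented for illustration:

```python
# Sketch of a "boslevel"-style BOS: everything reachable from the root
# document within a fixed number of addressing hops.
from collections import deque

def bos_by_level(links, root, boslevel):
    """links: doc -> docs it addresses. Returns the BOS as a set."""
    bos, frontier = {root}, deque([(root, 0)])
    while frontier:
        doc, depth = frontier.popleft()
        if depth == boslevel:
            continue                       # perimeter reached
        for target in links.get(doc, ()):
            if target not in bos:
                bos.add(target)
                frontier.append((target, depth + 1))
    return bos

links = {"root.xtm": ["a.html"], "a.html": ["b.html"], "b.html": ["c.html"]}
bos = bos_by_level(links, "root.xtm", 2)
```

Note that the crawl depth only bounds *which documents are in the
BOS*; within each document in the BOS, every link must still be
resolved.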
Existing technology (e.g., X2X, GroveMinder) makes extended XLink
pretty gosh-darn realistic-looking.

> Think about the way existing search engines on the web just index
> the whole shebang and let you dive in at any doc that's indexed.

...and just think about all the irrelevant hits you get with today's
search engines. Infoglut is one of the most important problems that
the topic maps paradigm was designed to solve. I believe you're
proposing to unsolve that problem here, in the name of lazy
processing. I've always thought that computers were supposed to
improve the productivity of humans, not the reverse.

[Steve Newcomb:]
> I realize that some exceptionally simple topic map applications may
> only need to provide traversal service from the map to the
> occurrences, and not from the occurrences to the map. This is like
> the WWW model of <a href="..."> links, in which you can go to the
> other anchor, but you can't start from the other anchor. However,
> this simplifying assumption, if generally applied to topic maps, would
> utterly destroy the significance of the phrase "topic map"; it would
> be a misappropriation of the "map" metaphor. A topic map based on
> this simplifying assumption would be like a road map that wouldn't let
> one find and use any appropriate road near wherever one actually was,
> in order eventually to get to wherever one wanted to go; all roads
> would lead one in the wrong direction -- away from the topic links --
> and they could only be entered at the topic links. If one must first
> be at a topic link in order to get anywhere else, it becomes literally
> true that "one can't get there from here", no matter where "here" is,
> unless "here" happens to be some topic link within the topic map
> document. I therefore claim that "lazy" processing of links and
> anchors is incompatible with the whole idea of topic maps.
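The one-way-links point above can be shown in a toy contrast (invented
data, not any real XLink API): with an <a href>-style simple link you
can only traverse away from the source; it takes an inverse lookup
table to start from the other anchor:

```python
# Toy contrast between one-way (<a href>-style) traversal and the
# two-way traversal an inverse lookup table makes possible.
from collections import defaultdict

links = {"topic.xtm": ["occurrence.html"]}   # simple link: source -> target

def forward(doc):
    return links.get(doc, [])                # always possible

def backward(doc, inverse):
    return sorted(inverse.get(doc, ()))      # possible only with the table

inverse = defaultdict(set)
for src, targets in links.items():
    for target in targets:
        inverse[target].add(src)             # the "where referenced" table

reachable = forward("topic.xtm")
referrers = backward("occurrence.html", inverse)
```

Without `inverse`, the question "who addresses occurrence.html?" has
no answer at all; that is the "can't start from the other anchor"
limitation in miniature.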
[Peter Jones:]
> [PPJ] But you could employ something like what C++ compilers use when
> they give all those 'unresolved external symbol' errors from the
> lookup table (assuming you've got a resource missing). It primes the
> app to return to the location it got the TM segment from to go back
> and look for more if necessary.

You can't generate a report about missing inverse relationships (the
answers to the question, "Who addresses me?") unless you know about
them, and if you know about them, they're not missing. I repeat: lazy
processing of links and their anchors is incompatible with the whole
idea of topic maps. The topic maps paradigm absolutely *requires*
extended XLinks. Simple XLinks simply won't cut it.

[Steve Newcomb:]
> Yes,
> pre-processing of bounded object sets (BOSs) is expensive. Yes, the
> Topic Maps paradigm is not supportable using existing commonplace
> Web-centric applications and processing conventions.

[Peter Jones:]
> [PPJ] Hmm. I sense sponsorship ebbing away in a matter of femtoseconds.

Hmmm. I sense huge commercial opportunities for people who can offer
the next generation of extended-XLink-aware web technologies that can
support web crawlers that create and maintain hyperdocument lookup
tables for inverse relationships within arbitrary bounded object sets.
I sense huge commercial opportunities for online information services.
I sense huge commercial opportunities for all kinds of businesses,
small and large. I sense big changes coming. As usual, there will be
winners and losers.

Let's remember that sponsorship is not required for the technical work
to continue to completion. True, TopicMaps.Org's marketing budget is
not likely to be funded by people whose goals are limited to selling
current search technologies that cannot support the topic maps
paradigm. Speaking only for myself, I am completely comfortable with
that. If you're not comfortable with that, Peter, now would be a good
time to raise the issue.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com
voice: +1 972 359 8160
fax:   +1 972 359 0270
405 Flagler Court, Allen, Texas 75013-2821 USA