tm-pubsubj-comment message

Subject: [tm-pubsubj-comment] human actors and machine agents

From: "Bandholtz, Thomas" <Thomas.Bandholtz@koeln.sema.slb.com>
To: "'tm-pubsubj-comment@lists.oasis-open.org'"<tm-pubsubj-comment@lists.oasis-open.org>
Date: Sat, 18 May 2002 18:13:21 +0100

This is somewhat similar to "doc vs. data", but from a different point of view.

I see human actors who
a) are topic map editors
b) want to read and understand topic maps (or fragments of them)
These people need something "human readable".

I see ("intelligent") machine agents that search topic maps, auto-classify information resources, find occurrences of topics in documents, etc.

Machine agents need something "machine readable", and one of their features should be to convert the "machine readable" information they have found to something "human readable".

Humans generally do not need a formal syntax to understand something, but machines prefer a formal syntax, as they have a very hard time understanding unstructured text.

Humans seem to be somehow more intelligent :-) So I would guess that even humans will be able to read and understand something that has been written in a formal syntax. That's why I think that the "machine-readable" and the "human-readable" resource could generally be one and the same.

It depends (among other things) on the readability of the syntax, and I think XTM is not as readable as a formal syntax for topic maps could be.

I would like to ask everybody:
*** When you try to read the XTM samples around, do you have any problems understanding them? If so, where do these problems start?

I have problems. One problem is that I find it very hard to tell typology apart from topics intuitively, as everything has the form of a topic in XTM. Secondly, there is too much tagging between the pieces of information. If we used attributes instead of elements wherever possible, everything would become much more compact and more human readable.
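To illustrate what I mean by "more compact", here is a minimal sketch in Python. The first sample uses XTM 1.0 elements; the second is a purely hypothetical attribute-based form (the `type` and `name` attributes are my assumption and not part of XTM). The topic "cologne" and the helper names `read_xtm` and `read_compact` are made up for the example.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# A topic written in XTM 1.0 element style.
XTM_TOPIC = """<topic xmlns="http://www.topicmaps.org/xtm/1.0/"
       xmlns:xlink="http://www.w3.org/1999/xlink" id="cologne">
  <instanceOf><topicRef xlink:href="#city"/></instanceOf>
  <baseName><baseNameString>Cologne</baseNameString></baseName>
</topic>"""

# Hypothetical attribute-based form (an assumption, NOT part of XTM).
COMPACT_TOPIC = '<topic id="cologne" type="#city" name="Cologne"/>'


def read_xtm(xml_text):
    """Extract id, type reference and base name from the XTM form."""
    topic = ET.fromstring(xml_text)
    ns = {"xtm": XTM_NS}
    type_ref = topic.find("xtm:instanceOf/xtm:topicRef", ns)
    name = topic.find("xtm:baseName/xtm:baseNameString", ns)
    return {
        "id": topic.get("id"),
        "type": type_ref.get("{%s}href" % XLINK_NS),
        "name": name.text,
    }


def read_compact(xml_text):
    """Extract the same three values from the hypothetical compact form."""
    topic = ET.fromstring(xml_text)
    return {key: topic.get(key) for key in ("id", "type", "name")}


if __name__ == "__main__":
    print(read_xtm(XTM_TOPIC))
    print(read_compact(COMPACT_TOPIC))
```

Both readers return the same three values, while the compact form fits on one line. Of course this only covers the simplest case; scoped names or several types per topic would need more thought.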

> * Lars Marius Garshol
> |
> | Instead, we could define conventions for how HTML/XHTML documents
> | can refer to the XTM/RDF documents that contain the published
> | subject assertions.
>
> * Thomas Bandholtz
> |
> | This will not work for a machine. There must be a clear formal
> | syntax.
>
> There could be. We could define a value for the 'rel' attribute of the
> HTML <link> element that would let HTML documents link to the formal
> syntax in a clearly defined way.

I do not regard HTML documents as (semantically) structured text, so what finally is in the HTML should not be of any importance for the "intelligent" machine agent.

Sample: a machine can enter an HTML document using a fragment identifier, just like a human. While a human can generally understand where the referenced document fragment ends (simply by reading and understanding the text), a machine cannot, as the end of the fragment is not coded using a formal syntax.
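A small sketch of this problem, using Python's standard `html.parser`. The page content, the anchor names and the class `FragmentReader` are invented for the example. The agent can find where the fragment starts, but as nothing marks its end, all it can do is read on to the end of the document.

```python
from html.parser import HTMLParser

# Invented sample page with two named anchors.
PAGE = """<html><body>
<a name="cologne"></a><p>Cologne is a city on the Rhine.</p>
<p>It has a famous cathedral.</p>
<a name="bonn"></a><p>Bonn lies further south.</p>
<p>Footer text that belongs to no subject.</p>
</body></html>"""


class FragmentReader(HTMLParser):
    """Collects all text from the named anchor to the end of the document."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.fragment in (attrs.get("name"), attrs.get("id")):
            self.inside = True  # the start of the fragment is formally marked

    def handle_data(self, data):
        # There is no formal end marker, so nothing ever resets self.inside.
        if self.inside and data.strip():
            self.text.append(data.strip())


def read_fragment(html, fragment):
    reader = FragmentReader(fragment)
    reader.feed(html)
    reader.close()
    return reader.text


if __name__ == "__main__":
    for line in read_fragment(PAGE, "cologne"):
        print(line)
```

Asked for "cologne", this agent also returns the text about Bonn and the footer. A human reader would have stopped after the cathedral.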

> You should also be a little clearer on what you mean: are you talking
> about how to retrieve the <topic> with assertions about a published
> subject, or about how to retrieve the published subject indicator?
> These cases are not at all equivalent.

How to retrieve a published subject indicator? This is a hyperlink. I am only discussing what should happen when you use this hyperlink to understand what the published subject is about, that is, how to "retrieve the <topic> with assertions about a published subject".

> * Bernard Vatant
> |
> | I don't figure what you mean by "when the PSI set moves".

I said "when the PSI set moves from data to doc or vice versa". The URLs must remain persistent when the medium changes. This should be a basic requirement. Remember, *cool URIs don't change*; this is one of the 10 commandments of the Internet, not only of PSI. I want my PSI to be cool. That's why I see a problem with fragment identifiers (#) vs. queries (?).
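To make the "#" vs. "?" problem concrete, a short sketch with Python's `urllib.parse`. The host `psi.example.org` and the helper `request_target` are assumptions for illustration only. The point is that the two forms are different URIs, and that a fragment is resolved by the client after retrieval, whereas a query is sent to the server.

```python
from urllib.parse import urlsplit

# Hypothetical doc-style and data-style identifiers for the same subject.
DOC_STYLE = "http://psi.example.org/places#myTopic"
DATA_STYLE = "http://psi.example.org/places?id=myTopic"


def request_target(uri):
    """Return the part of the URI that an HTTP client sends to the server."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment is never part of the request


if __name__ == "__main__":
    for uri in (DOC_STYLE, DATA_STYLE):
        print(uri, "->", request_target(uri), "| fragment:", urlsplit(uri).fragment)
```

So when a set moves from a document to a database, a server that receives the doc-style request never sees "myTopic" at all, and cannot quietly answer it from the database.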

> | One
> | fundamental and necessary feature of a PSI set is indeed *not to
> | move*. Either your PSI set is doc-based, either it's data-based. Of
> | course you will need different applications to use the former or the
> | latter. That's why the PS Doc should include a description of the
> | application type needed to use it.
>
> I don't see what the application type has to do with it. The nature of
> the published subjects (number, frequency of changes, etc) may affect
> how you choose to publish them, but I don't see how the type of using
> application matters.

PSI should be completely unconcerned about the application type or media type used. You use a URI, and this resolves to a well-structured piece of information. It should not matter at all whether this information is part of a doc or queried from a database. The result should be exactly the same.

(OK, here we have some problems, such as: identified HTML fragments have no formal end but the end of the whole document, so the first anchor in a document includes *all* fragments that follow. Remember: anchors may be nested, so the next anchor need not be the end of the current fragment. And the fragment may end semantically before the next anchor occurs.)
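Leaving the fragment problem aside, the requirement that the result be exactly the same can be sketched as follows. Everything here is an assumption for illustration: the subject "cologne", its fields, the table layout and the function names. One backend stands for a doc (a simple in-memory mapping), the other for a database (SQLite in memory).

```python
import sqlite3

# Backend 1: subjects kept in a "document" (here simply an in-memory mapping).
DOC_SUBJECTS = {"cologne": {"id": "cologne", "name": "Cologne", "type": "city"}}


def make_db():
    """Backend 2: the same subjects kept in a database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE subject (id TEXT PRIMARY KEY, name TEXT, type TEXT)")
    db.execute("INSERT INTO subject VALUES (?, ?, ?)", ("cologne", "Cologne", "city"))
    return db


def resolve_from_doc(subject_id):
    """Look the subject up in the document-style backend."""
    return DOC_SUBJECTS.get(subject_id)


def resolve_from_db(db, subject_id):
    """Look the subject up in the database backend."""
    row = db.execute(
        "SELECT id, name, type FROM subject WHERE id = ?", (subject_id,)
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("id", "name", "type"), row))


if __name__ == "__main__":
    db = make_db()
    print(resolve_from_doc("cologne") == resolve_from_db(db, "cologne"))
```

The user of the identifier sees the same record in both cases; which backend answers is a matter for the publisher only.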

> | If you want both applications to be able to deal for your subjects
> | set, then you'll have to create both doc-based and data-based PSI
> | sets, and maybe declare equivalence in each Subject Indicator, e.g.,
> | the resource you retrieve at #myTopic somehow declares that it
> | represents the same subject than ?id=myTopic.
>
> What's the point of this? Why can't you just have a single URI? I
> don't understand this. Having multiple subject identifiers for a
> single subject is the Wrong Thing.

Completely the Wrong Thing. When PSI are completely unconcerned about the application type or media type used, we have no problem here.

But media types will change, while cool URIs don't.

I am sure that the most relevant folks that I want to use topic maps (and PSI) will never use docs, because they have millions of PS. (The Alexandria Digital Library Gazetteer contains 4.9 million subjects to be published, see http://fat-albert.alexandria.ucsb.edu:8827/gazetteer/. Of course, this is a database application, but currently not a PSD.)

The computing community (those are humans, hopefully) has learned to use databases instead of files when large sets of information have to be handled. This was back in the 80s of the bygone millennium.

You think this is obsolete now?

But when some newcomer starts a new set, he might prefer documents, as he might not have any database server available, and his subjects are not that many in the beginning. That's completely OK. I do not want to discriminate against anybody. But if this guy (or girl, sorry, Mary and others) proves to be successful, the set will grow. There will be a day when she says: "O God, I cannot handle this huge document any longer!" As she has been successful, there will probably be some help from database professionals to overcome this.

Well, to come back to the point, for the third time (sometimes redundancy is the better didactics):

* PSI should be completely unconcerned about the application type or media type used. *

Think about the collaboration of human actors and machine agents, as some kind of value chain.

Thomas Bandholtz
CM / KM Division Manager; XML Network Moderator
Competence Center Content Management
SchlumbergerSema
http://www.schlumbergersema.com

Kaltenbornweg 3
D50679 Köln / Cologne
Germany
+49 221 8299 264
