tm-pubsubj-comment message

Subject: Re: [tm-pubsubj-comment] human actors and machine agents
From: Lars Marius Garshol <larsga@garshol.priv.no>
To: "Bandholtz, Thomas" <Thomas.Bandholtz@koeln.sema.slb.com>
Date: Sun, 26 May 2002 15:02:07 +0200

* Thomas Bandholtz
| 
| It depends (among others) on the readability of the syntax, and I
| think XTM is not as readable as a formal syntax for topic maps could
| be.

I agree. That's why I created LTM. But XTM is not optimized for
humans, nor do I think that is important. 
 
| *** When you try to read the XTM samples around, do you have any
| problems understanding them?

No.
 
| I have problems. One problem is that I find it very hard to sort
| typology from topics intuitively, as anything has the form of a
| topic in XTM.

Then I think you need to train yourself, basically. I don't see that
"Thomas Bandholtz" is a subject, while "person" is not. You can
legitimately assign topic characteristics to both, and therefore it
makes sense that both can be topics in their own right.

Look at

  <URL: http://www.ontopia.net/omnigator/models/topic_complete.jsp?tm=i18n.ltm&id=script >

for example. This is the page for the topic "script", which is a topic
type. Don't you agree that the characteristics assigned to this topic
both make sense and are useful?

| Secondly, there is too much tagging between the information. If we
| used attributes instead of elements wherever possible, everything
| would become much more compact and more human readable.

That's true, but I don't think it is important. XML is inherently
awkward for this, which is why I chose a completely different
syntactical base for LTM. (Note that Robert Barta did the same with
his AsTMa, as did the RDF people with their n3.)
 
* Lars Marius Garshol
|
| There could be. We could define a value for the 'rel' attribute of
| the HTML <link> element that would let HTML documents link to the
| formal syntax in a clearly defined way.
 
* Thomas Bandholtz
|
| I do not regard HTML documents as (semantically) structured text, so
| what finally is in the HTML should not be of any importance for the
| "intelligent" machine agent.

Go back and read what I wrote. We would *not* put the assertions in
the HTML, we would put a link to them. It is not hard to make an agent
able to find that link and follow it to, say, an XTM resource.
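
To illustrate how little such an agent needs, here is a minimal sketch
in Python. It assumes a hypothetical 'rel' value, "subject-assertions";
no such value has been defined, and the function and class names are
invented for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical 'rel' value: nothing like this has been standardized.
ASSERTIONS_REL = "subject-assertions"


class AssertionLinkFinder(HTMLParser):
    """Collect the targets of <link> elements carrying the assumed rel value."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # 'rel' is a space-separated list of link types.
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if ASSERTIONS_REL in rels and href:
            # Resolve relative references against the document's own URL.
            self.links.append(urljoin(self.base_url, href))


def find_assertion_links(html, base_url):
    """Return absolute URLs of the formal-syntax resources linked from html."""
    finder = AssertionLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

The agent would then fetch each returned URL and parse it as, say, XTM.
The HTML itself carries no assertions, only the pointer to them.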
 
* Lars Marius Garshol
|
| You should also be a little clearer on what you mean: are you
| talking about how to retrieve the <topic> with assertions about a
| published subject, or about how to retrieve the published subject
| indicator?  These cases are not at all equivalent.
 
* Thomas Bandholtz
|
| How to retrieve a published subject indicator? This is a hyperlink.
| I am only discussing what should happen when you use this hyperlink
| to understand what the published subject is about, well "retrieve
| the <topic> with assertions about a published subject".

This is where you would need a generic fragment creation algorithm,
which I think is very hard to do. I am also not at all convinced that
you need it.

If you come across a published subject identifier somewhere it should
be in a topic map, in which case you should already have the
necessary topic characteristic assignments. If not, you, as a human,
can follow the link and find something useful.

If you think there is a useful use case to be served by this fragment
retrieval stuff, then please define it and let us consider it as a
possible step 4. I would *not* start there, however, as it would make
published subjects enormously harder to publish and use, and it
would also limit them to topic maps.
  
| (Ok, here we have some problems, such as: identified HTML fragments
| have no formal end but the end of the whole document, so the first
| anchor in a document includes *all* fragments that follow -
| remember: anchors may be nested, so the next anchor need not be
| the end of the current fragment.  And: the fragment may end
| semantically before a next anchor occurs).

Certainly, but this is meant for humans, who should be able to figure
this out. So I don't think it will be a problem in practice, though
the recommendations should perhaps say something about this issue so
that PSI authors do the right thing in this respect.
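
For what it is worth, the obvious machine heuristic is "the fragment
runs from its anchor to the next anchor". The sketch below implements
exactly that, assuming anchors are written as <a name="..."> or as an
'id' attribute; the names are invented for illustration. It shows the
limitation you describe: a nested anchor, or a fragment that ends
semantically before the next anchor, defeats it.

```python
from html.parser import HTMLParser


class FragmentText(HTMLParser):
    """Collect text from one anchor up to the next anchor (a heuristic only)."""

    def __init__(self, fragment_id):
        super().__init__()
        self.fragment_id = fragment_id
        self.collecting = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Treat any 'id', or 'name' on <a>, as an anchor.
        anchor = attrs.get("id") or (attrs.get("name") if tag == "a" else None)
        if anchor is None or self.done:
            return
        if anchor == self.fragment_id:
            self.collecting = True
        elif self.collecting:
            # Any later anchor, nested or not, is taken as the end.
            self.collecting = False
            self.done = True

    def handle_data(self, data):
        if self.collecting:
            self.chunks.append(data)


def fragment_text(html, fragment_id):
    """Return the whitespace-normalized text guessed to belong to the fragment."""
    parser = FragmentText(fragment_id)
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())
```

A human reader does not need any of this, which is why I think the
recommendations only need to guide authors, not define an algorithm.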
 
| I am sure that the most relevant folks that I want to use topic maps
| (and PSI) never will use docs because they have millions of PS (The
| Alexandria Digital Library Gazetteer contains 4.9 million subjects
| to be published, see
| http://fat-albert.alexandria.ucsb.edu:8827/gazetteer/. Of course,
| this is a database application, but not a PSD currently).

So what? They don't have to use documents. Nothing says they should,
nothing requires them to. So what is the problem?
 
| But when some newcomer starts a new set, he might prefer documents,
| as he might not have any database server available, and his subjects
| are not that many in the beginning. That's completely OK. I do not
| want to discriminate against anybody. But if this guy (or girl, sorry,
| Mary and others) proves to be successful, the set will grow. There will
| be a day when she says: "O God, I cannot handle this huge document
| any longer!" As she has been successful, probably there will be some
| help by database professionals to overcome this.

This is not an entirely unrealistic scenario, but I don't see what we
can do about it. HTTP, HTML, and URIs are there, and while they are
certainly anything but perfect we have to use them as they are. Do you
see any way around this problem within that context? 

Certainly, we should document this issue and help publishers avoid
this pitfall, but I don't see what more we can do. Do you?

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >