tm-pubsubj message

Subject: RE: [tm-pubsubj] Subject identification and ontological commitment : a real-world example
From: "Bernard Vatant" <bernard.vatant@mondeca.com>
To: "tm-pubsubj" <tm-pubsubj@lists.oasis-open.org>
Date: Tue, 28 Oct 2003 19:24:37 +0100

Lars Marius

> * Bernard Vatant
> |
> | This question was at the core of my former proposal to use OWL for
> | PSIs.  But when I made that proposal, certainly I pushed too quickly
> | the answer before setting clearly the question - certainly at the
> | time it was not completely clarified in my mind. Moreover the
> | proposal had too much political context to be popular.
>
> Frankly, I still haven't understood what the point of that proposal
> is. Maybe I'd understand if I studied it more carefully.

OK. Forget about that proposal for the moment.

> | So let's forget about any language, technical or process solution
> | for the moment, and focus on the following questions.
>
> That's probably a good idea anyway.

At least we agree on that :)

> | Q1: Is subject identification independent from ontological
> | commitment?
>
> I think it's clear that it is not, but that it's not at all clear,
> even after reading your email (sorry!), how this applies to published
> subjects.

Well, that's quite obvious to me. A published subject provides ways to
identify a subject, by means of a subject identifier and a subject
indicator. A PSI is a tool, not unlike a two-edged sword. This tool, like
any tool, is forged for a specific task. A two-edged sword is forged for
cutting people into two unconnected parts; a published subject is forged
for subject identification: this is the head and that is the beheaded man.
Making published subjects without caring about how they will be used for
subject identification is like forging swords while being agnostic about
how well they chop people's heads off.

> | Q2: If the answer to Q1 is "no", how can we articulate the two
> | concepts in our recommendations?
>
> Dunno. How do *you* think this affects the contents and use of PSIs?

I tried by every means to explain that in my previous message. Apparently
I missed the target as far as you are concerned :(

> | We have addressed in Del 1 the question of subject identifiers and
> | subject indicators, but we have not really addressed the question of
> | *subject identification*.
>
> Well, is that really for us to address at all? We are working on
> helping people assign URIs that identify subjects, and to create
> resources that document these assignments.
> How these URIs are used for identification we've left for those who
> define other models to decide. So in topic maps there is one way to do
> this, in RDF another, in XML no standardized method, but several
> ad-hoc ones, and so on.

But that's EXACTLY where it hurts! If different applications and languages
use the subject identifiers at will, each with its own identification
process, there will be no way for them to interoperate. Unless, as I've
tried to show by the example at http://isbn.nu, the different actors in a
processing context commit to the same semantic properties of the subjects
they identify, whatever their internal language.

Don't get me wrong. I don't care that much about PSIs that would be usable
in a single processing context (like only in TM). I want to be able to say
that the subject in database X is the same as the one in topic map Y,
the same as in ontology Z, the same as in thesaurus T, and be able to
syndicate information from those four sources without semantic clashes.
To be useful, this requires that X, Y, Z and T, whatever their internal
structures and languages, don't attach to this subject any properties
conflicting with the generic ones included in the subject definition
itself. And the best, and certainly the only, way to ensure that is for
those generic properties to be declared somewhere as a reference. And where
and how, if not by the subject indicator?

In the case of books, if the PSI http://isbn.nu/0534949657 declares
explicitly:
Hey, this identifies a single book. But wait, a book is something with a
title, an author, a publisher etc. Here are the identifiers for "title",
"author", "book", "publisher" ... and so on.
Our recommendation could say: if you use a PSI declared in such a way, it
implies you commit to those statements and use the same identifiers for
"author" and "title" and the like, not only the same ISBN for this
specific instance. This is the only way I can find the "author" field in so
many sources.
What is not attached to the PSI is e.g. the book's price today at Amazon.
> In my opinion the very furthest we could possibly go in this area is
> to provide short, simple guidelines on how to apply PSIs in these
> different areas. I'm not sure we should, but maybe that would be good.
> It would probably help RDF users, for example.

I don't know if we have to care about any specific language, as long as it
is able to express things like the following:

This PSI for subject A declares explicitly that A is an instance of the
class B, defined by that other PSI for B. If you use this PSI, you
implicitly agree with that statement, which is a generic part of the
definition of the subject A, and therefore you should not use this PSI to
express anything inconsistent with this definition, like "A is not an
instance of B".
For example, if GeoLang publishes PSIs for "Kurdistan" or "Palestine"
declaring they belong to the class "Country", someone using these PSIs to
assert that Kurdistan or Palestine are *not* countries does not conform to
the recommended use of GeoLang PSIs. At least that's my viewpoint. Of course
one could say that users can use identifiers the way they like. People can
also use words the way they like and are free not to understand each other.
But that is not something we should recommend IMO :))
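
The kind of check this implies could be sketched as follows in Python. This
is an assumption-laden illustration, not a proposal for a format: the URIs
under http://example.org/geolang/ are placeholders (not real GeoLang PSIs),
and the (subject, class, is_member) triple shape and the conflicts()
function are invented for the example.

```python
# Placeholder identifiers, NOT actual GeoLang PSIs.
COUNTRY = "http://example.org/geolang/country"
KURDISTAN = "http://example.org/geolang/kurdistan"

# Class memberships declared by the subject indicators: subject -> classes.
DECLARED = {KURDISTAN: {COUNTRY}}


def conflicts(assertions, declared):
    """Return the user assertions that deny a class membership declared
    by the PSI. Each assertion is a (subject, class, is_member) triple."""
    return [
        (subject, cls, is_member)
        for (subject, cls, is_member) in assertions
        if not is_member and cls in declared.get(subject, set())
    ]


agreeing = [(KURDISTAN, COUNTRY, True)]
denying = [(KURDISTAN, COUNTRY, False)]
```

An application that finds a non-empty result for its own assertions is
using the PSI against its declared definition.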

Now, how any application and language will manage to express and
communicate that semantic commitment is another story, and maybe not our
business, as long as they can do it: RDF can, XTM can, OWL can, and the
list is open. What we can stick to is recommending something like:

- The definition of a subject in a subject indicator should contain
assertions of generic properties of the subject, like attributes and
relationships with other subjects, themselves (the relations and the
subjects) identified by other PSIs. Those generic properties should be
reduced to the minimal set needed to define the subject without ambiguity.
It is recommended that the subject indicator provide accurate and explicit
indications concerning those properties. Use of a PSI entails a commitment
from its user to use it in conformance with those indications.
- A recommended practice is for publishers to deliver sets of PSIs where
those properties are declared using a formal and consistent ontological
schema: classification, thesaurus, formal ontology, in any relevant
standard format.
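
One part of the first recommendation, that the relations and related
subjects are themselves identified by PSIs, lends itself to a simple check.
The Python sketch below is a simplification under stated assumptions: it
treats a single published set as a dictionary from each PSI to its declared
{property: target} pairs, ignores PSIs from other publishers, and all
identifiers and the undeclared_references() function are hypothetical.

```python
# Hypothetical identifiers for a tiny published set.
INSTANCE_OF = "http://example.org/psi/instance-of"
A = "http://example.org/psi/A"
B = "http://example.org/psi/B"


def undeclared_references(psi_set):
    """Return every property or target referenced in a subject definition
    that is not itself identified by a PSI in the same set."""
    known = set(psi_set)
    missing = set()
    for declared in psi_set.values():
        for prop, target in declared.items():
            for ref in (prop, target):
                if ref not in known:
                    missing.add(ref)
    return missing


# A is declared as an instance of B; B and the relation have their own PSIs.
complete_set = {A: {INSTANCE_OF: B}, B: {}, INSTANCE_OF: {}}
# Same set, but the PSI for class B was never published.
incomplete_set = {A: {INSTANCE_OF: B}, INSTANCE_OF: {}}
```

A publisher could run such a check before delivering a set; a real one
would also have to resolve references to PSIs published elsewhere.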

> | Hope that helps to understand what I am about now.
>
> Not really. You've gone through lots of stuff, all of which was clear
> and fine, but what issues it raises for the PubSubj TC I don't really
> understand. Was it only Q1 and Q2? Or was it Q1, Q2, and the issue of
> guidelines for those using PSIs? Or all of those, plus yet more?

Certainly the last one. You know well I need someone like you to help me
stick to a finite set of questions. Nevertheless, my answer here hopefully
contains concrete proposals for future deliverables.

Bernard