tm-pubsubj message

Subject: Re: [tm-pubsubj] Comments from Dave Beckett
From: "Bernard Vatant" <bernard.vatant@mondeca.com>
To: "'tm-pubsubj'" <tm-pubsubj@lists.oasis-open.org>
Date: Tue, 1 Apr 2003 12:44:14 +0200
Mary

Sorry for the late answer. This is an important debate.

> We should *always* modify  *resources* with  *network retrievable* or as
> you mentioned for the other types. This is what Lars Marius said, and I
> agree with it.
>
> If we do this, then there is no conflict with the RFC def of *resource*
> since we do not use the term by itself.
>
> I think we agree on this.

Yes, we do.

> >Does this cover exactly what we mean by "resource" in our document? Not
> >quite sure. It is
> >not consistent with 2.4 "most subjects are not resources".
>
> This is not an exact statement, looking at it again. It is also very
> confusing. I think we have been so used to talking like this that it is
> hard to see where others might not understand.
>
> A subject, as such, may not be a *network retrievable resource*
>
> we should avoid using the word *resource* standalone so to speak since it
> will cause confusion. that's all.

Agreed again. And maybe add an annex-glossary clarifying those correspondences between
various uses of subjects. What Lars Marius has done in his paper for XML Europe is a good
basis.

> >There are some authoritative
> >people in this TC who do not like that much the RFC2396 definition of
> >"resource" (even
> >indirectly referenced by Dublin Core).
>
> OK, I see your point on this. We do not have to be explicit on this one
> and say that our resource is the resource of the DC, but we will not use
> this word standalone as I mentioned before to avoid confusion.

That should do it.

> >Lars Marius has long ago supported that topic maps
> >"subject" and RDF "resource" are equivalent. And if I remember well, in a
> >RDF-TM meeting
> >in Seattle a year ago, this was written down on a board and accepted as a
> >consensus by
> >both TM and RDF folks in the room. So, if we want to be consistent, we
> >should write
> >"information resource" all along instead of "resource".
>
> agreed.

I would be happy to see Eric sign up to that agreement too :)

> > > In addition, we need to look again at using the acronym PSI for both
> > > indicator and identifier.
> >
> >OK. This keeps coming to the surface again and again. I had always mixed
> >feelings about
> >it. My opinion now is that we should come back to the historical
> >extension of the acronym
> >as Published Subject Indicator. See below.
>
> Agreed. Using one acronym for two words will always cause serious
> confusion. I have never changed my position on this one. In my Montreal
> paper, I only used PSI for Published Subject Indicator and did not even
> mention "identifier" in it. I bet nobody missed it. The concept of the
> "identifier" is there but I avoid the term to avoid the two word/one acronym
> phenomenon.

OK for pedagogy's sake ... but we decided at the last meeting to have a Requirement 3:

"Each Published Subject Indicator must have exactly one Published Subject Identifier."

If we want to get past the acronym ambiguity, which I agree is a good idea, maybe we need
another name for "Published Subject Identifier", but we can't get rid of the concept. It's
fundamental. Any suggestion for an alternative name is welcome.

> > ... distinction between "identifier for computers" and "indicator for humans"
> >is central to
> >the recommendation. If we drop that distinction, it is "tabula rasa" and
> >we are back to
> >the starting point of autumn 2001. Is it what you want?
>
> What I mean is, we do not have to state explicitly the term *published
> subject identifier* or *subject identifier*. We reserve using *published
> subject* and *subject* for modifying *indicator* only, to avoid confusion
> and PSI is reserved for this one.
>
> The identifier is the URI of the published subject indicator.

Two remarks here:

1. "URI of the published subject indicator" is something to be clearly stated as unique
(we suggested "canonical" at some point, but it was dropped because of other technical
ambiguities on that term).
This is Requirement 3 mentioned above.

2. What is the identifier meant to identify? Certainly not the indicator itself, but the
subject indicated by the indicator. So how do you express that without any notion of
"subject identifier"? However you name it, you can't escape the concept. It is exactly
what PSIs are about (at least in my mind) ... and it is that identifier that merging will
use, etc ...
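
To make the merging point concrete, here is a minimal sketch in Python. It is only an
illustration, not anything taken from the draft: the topic names, the example.org URIs and
the function merge_by_identifier are all hypothetical. It shows topics being grouped
purely on the identifier string they carry, whatever name we end up giving that string.

```python
from collections import defaultdict


def merge_by_identifier(topics):
    """Group topic names by the subject identifier each topic carries.

    `topics` is an iterable of (identifier, name) pairs. Both the data
    shape and the function name are assumptions made for this sketch.
    """
    merged = defaultdict(set)
    for identifier, name in topics:
        # Two topics are merged when, and only when, their identifiers match.
        merged[identifier].add(name)
    return dict(merged)


# Hypothetical topics coming from two different topic maps.
topics = [
    ("http://psi.example.org/astronomy/moon", "Moon"),
    ("http://psi.example.org/astronomy/moon", "Lune"),
    ("http://psi.example.org/astronomy/sun", "Sun"),
]

merged = merge_by_identifier(topics)
```

The application never looks at what the subject *is*; it only compares identifiers, which
is why the concept cannot be left unnamed.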

> > > dajobe: 2.4.2 " The address of a subject indicator is called a subject
> > > identifier." but there is also something else called a 'subject address' !
> >
> >Some subjects are information resources, and are directly addressable ...
> >If that (2.3)
> >was not understood by Dave, who will understand it? This section was
> >added by Steve, but
> >I'm not sure it brings anything more than confusion. Maybe we should
> >strike any reference
> >to "addressable subjects" since they do not need subject indicators,
> >although they need
> >identifiers, but that is the global URI issue, which is not in our TC
> >scope, fortunately
> >... we have already worms enough in our can, don't need that one too :))
>
> Now I am confused too :)
>
> We need to be simple and pragmatic. I know what you mean and what this
> passage meant since we have been having this discussion for months now,
> but from your statement above, I don't think that Dave would understand
> what you are talking about.

Well. I agree this is the most difficult point, explaining clearly that we need two levels
of indirection.

The first level of indirection is *one-to-one* matching of a URI to a network-retrievable
resource. This is the can of worms we don't open in this TC. There are other battlefields
for that, and it seems the war is far from over in those places. What we want here is
simply to find inside the subject indicator a statement of *the* unique URI that has to
be used as identifier (of the subject) - knowing that the subject indicator can be
retrieved through a variety of paths (including fancy redirections or query strings) ...
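
A rough sketch of that idea in Python, under loudly stated assumptions: the URIs, the
dictionary layout and the function identifier_for are hypothetical, and no real retrieval
is performed. Several retrieval paths lead to the same subject indicator, and the
identifier is read from the statement inside the indicator, never from the path used to
fetch it.

```python
# Hypothetical subject indicator: it carries its own statement of the one
# URI that is to be used as the identifier of the subject.
MOON_INDICATOR = {
    "description": "The Moon, natural satellite of the Earth",
    "subject_identifier": "http://psi.example.org/astronomy/moon",
}

# Hypothetical retrieval paths (a mirror with a query string, a redirection)
# that all end up delivering the same indicator.
RETRIEVAL_PATHS = {
    "http://psi.example.org/astronomy/moon": MOON_INDICATOR,
    "http://mirror.example.net/psi?subject=moon": MOON_INDICATOR,
    "http://psi.example.org/redirect/42": MOON_INDICATOR,
}


def identifier_for(retrieval_uri):
    """Return the identifier stated inside the indicator, not the fetch URI."""
    indicator = RETRIEVAL_PATHS[retrieval_uri]
    return indicator["subject_identifier"]


# Whatever path is used, the same single identifier comes out.
identifiers = {identifier_for(uri) for uri in RETRIEVAL_PATHS}
```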

The second level of indirection is for humans only, understanding through the subject
indicator what the subject *is* - whatever that means.

The subject indicator is a finger pointing at the Moon. Only (intelligent) humans will
look in the proper direction and agree that the subject is the Moon up there; the computer
applications will only agree on the identity of the finger ... Do we have to express it
that way?

> >The identifier is what the TM
> >applications will use to merge topics! So how can you sweep it under the
> >carpet?

>   What I meant is, I find that I do not need the term published subject
> identifier (as I mentioned, I did not use it in my paper and nobody missed
> it or complained about it not being there). Of course I need the *thing* itself.

Personally, in that kind of domain, if I need the thing, I badly need to name it in order
to be able to speak about it. I don't understand how we could avoid speaking about it *at
all* in the specification ...

> > We have to be crystal clear on the fact that
> >the subject
> >identifier is not any damned URI that can retrieve the subject indicator
> >by any means, but
> >*the* URI which is defined and explicitly written down *inside* the
> >subject indicator by
> >the publisher.

> This is important, but I think this explanation needs to be clearer. What
> I am saying is we should avoid using *subject* or *published subject*
> modifying *identifier*

OK, so what do you propose instead? I'm curious to know ...

Bernard