tm-pubsubj-comment message

Subject: [tm-pubsubj-comment] The structure and interpretation of published subject documentation

From: Lars Marius Garshol <larsga@garshol.priv.no>
To: tm-pubsubj-comment@lists.oasis-open.org
Date: Fri, 26 Apr 2002 13:12:56 +0200


My current vision of PSDs is that they may contain the following
information:

 (1) human-readable definitions of each published subject,

 (2) a human-readable description of the PS set,

 (3) formal assertions in machine-readable form about the published
     subjects,

 (4) formal metadata in machine-readable form about the PSD.


The question is: how is this best packaged for distribution? My
current thinking is that a structure like the following might be
appropriate: 

 (a) An entry page for the PSD, which contains (2), and possibly also
     (1) and (4). Recommended format: HTML or XHTML.

 (b) One or more pages with the definitions of each PS (that is (1)),
     but this can be dropped if (1) is included in (a). Recommended
     format: HTML or XHTML.

 (c) A file with the core assertions about the PSs. This would be part
     of (3), but not necessarily all of it. It *could* also include
     (4). The notion of "core assertions" needs refinement, but to me
     this could be the assertions that are considered part of the
     definition of the PS.

     Recommended format: XTM or RDF.

 (d) One or more files with additional assertions about the PSs. This
     would be the rest of (3), but could be skipped if (c) includes
     all of (3). Here would go statements about the PSs that are
     potentially controversial, including mappings to other PS sets.

     Recommended format: XTM or RDF.

 (e) A file with the formal PSD metadata (4), unless that is included
     in (a) or (c).

     Recommended format: XTM or RDF.

 (f) The entire PSD set packaged into a single file for easy download.

     Recommended format: zip or tar.gz. (We could also go the .jar
     route and make a simple manifest proposal that would make these
     files automatically usable by software. That might be over the
     top, but it might also generate a lot of interest.)
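
To make the packaging rules above concrete, here is a minimal Python
sketch that models a PSD as a list of components and checks that every
kind of information (1)-(4) ends up somewhere. The class name, the
function name and the file names are my own placeholders, not part of
any proposal, and the sketch treats (3) as a single unit rather than
splitting it between (c) and (d).

```python
from dataclasses import dataclass, field

# Kinds of information, numbered as in the list (1)-(4) above.
PS_DEFINITIONS = 1      # human-readable definitions of each published subject
SET_DESCRIPTION = 2     # human-readable description of the PS set
FORMAL_ASSERTIONS = 3   # machine-readable assertions about the subjects
PSD_METADATA = 4        # machine-readable metadata about the PSD


@dataclass
class Component:
    """One part of a PSD; 'label' is the letter (a)-(e) used above."""
    label: str
    path: str                                  # hypothetical file name
    carries: set = field(default_factory=set)  # which of (1)-(4) it holds


def missing_information(components):
    """Return the kinds of information (1)-(4) that no component carries."""
    covered = set()
    for component in components:
        covered |= component.carries
    required = {PS_DEFINITIONS, SET_DESCRIPTION,
                FORMAL_ASSERTIONS, PSD_METADATA}
    return required - covered


# A compact layout: (a) carries (1) and (2), (c) carries (3) and (4),
# so (b), (d) and (e) are dropped.
compact = [
    Component("a", "index.html", {PS_DEFINITIONS, SET_DESCRIPTION}),
    Component("c", "core.xtm", {FORMAL_ASSERTIONS, PSD_METADATA}),
]
print(missing_information(compact))   # prints set()
```

A check like this could run before building the single-file package
in (f), so that an incomplete PSD is caught before distribution.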

Several of us seem to be drifting towards the conclusion that the
actual subject identifiers (that is, the subject indicator URIs)
should point to (b) (or (a), if there is no (b)) so that following a
subject identifier would take you to something human-readable. 

This makes things harder for software, however, so we may want to
consider either changing our minds or adding mechanisms to make it
possible to cater to both carbon-based and silicon-based audiences.
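
One mechanism that might serve both audiences is to let the same
subject identifier lead to different components depending on what the
client asks for. The Python sketch below only illustrates that idea
and is not something we have agreed on: the mapping, the function
name and the XTM media type are assumptions, and q-values in the
Accept header are ignored for brevity.

```python
# Hypothetical mapping from requested media type to PSD component.
REPRESENTATIONS = {
    "text/html": "b",             # human-readable definition page
    "application/xtm+xml": "c",   # assumed media type for XTM
    "application/rdf+xml": "c",   # RDF/XML
}


def choose_component(accept_header, default="b"):
    """Pick the component for the first acceptable media type listed.

    Simplification: q-values are ignored and media types are tried in
    the order the client lists them. Unknown types fall back to the
    human-readable default.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return default


print(choose_component("text/html"))                            # prints b
print(choose_component("application/rdf+xml, text/html;q=0.5")) # prints c
```

A browser would then land on (b), while software that asks for XTM or
RDF would be handed (c), with the identifier itself unchanged.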

I see a number of issues lurking here, which are not in Bernard's
issues list. These are (as best I can work it out):

  - what is the recommended structure for PSDs?
  - what should published subject identifier URIs point to?
  - is there a distinction between "core assertions" and
    "additional assertions"? what is it? (this is essentially the
    issue underlying Bernard's number 4, I think)
  - should (e) and (c) be a single file, or more than one?
  - what are the terms for (1)-(4) and (a)-(f)?
  - what are the publishing contexts we are trying to cater for?
    do we expect all PSDs to be published on the web? or do we want to
    cover other media as well? if so, which ones?
  - when should associations be used to express formal assertions, and
    when should occurrences be used? (taken from Bernard's post below)

The analysis above, with definition of new terms, should also let us
reformulate several of the existing issues more precisely.

There are also a heap of more practical issues that could go into the
issues document, but it would be good if we could achieve more overall
clarity before delving into all that detail.

* Bernard Vatant
|
| So back to the frontburner - Say we have a botanical taxonomy for
| trees
| 
| Dicotyledones > Fagales > Fagaceae > Fagus
| 
| The "definition" (description?) of subject "Fagaceae" includes, for
| a botanist, two fundamental kinds of information.
| 
| 1. The relationships of class Fagaceae with its upperclass Fagales
| and subclasses (Fagus, ...)
| 2. The specific characteristics distinguishing Fagaceae among other
| Fagales
| 
| A (PS Doc) topic map representation of this taxonomy could be
| structured as:
| 
| -- either a "flat" set of topics, each one providing all the above
| information in "description" occurrences using <resourceData>.  This
| solution has the advantage to include the whole definition inside a
| <topic> element, making it easy to declare the subject indicator as
| being this <topic> element

Well, this mixes up a number of issues, some of them relating to
packaging, others to preferred formal representation. 

With regard to the first, I would say that both 1. and 2. are part of
the *definition* of the published subjects, and therefore both should
go into (c). They should also be replicated in (b).

As for the second, I would say that anything that can be expressed in
structured form (as opposed to human-readable text) should be. I
don't think we should use <topic> elements as subject indicators, and
even if we did, the lexical representation of the formal assertions
about the published subjects should not concern us. A topic map engine
will handle associations just fine in this case, and humans will find
it too hard to wade through the XTM markup anyway, so I don't think we
need to try to make it easy for them. (We will fail anyway.)
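
To illustrate why structured assertions are easy for software, here is
a small Python sketch that holds the superclass-subclass relationships
from Bernard's taxonomy as plain pairs and walks them. It stands in
for what a topic map engine would do with associations; the names are
my own, and it assumes each class has a single superclass and that
there are no cycles.

```python
# Superclass-subclass assertions from the taxonomy quoted above,
# written as (subclass, superclass) pairs.
SUBCLASS_OF = [
    ("Fagales", "Dicotyledones"),
    ("Fagaceae", "Fagales"),
    ("Fagus", "Fagaceae"),
]


def superclasses(subject, assertions=SUBCLASS_OF):
    """Return all superclasses of 'subject', nearest first."""
    parent_of = dict(assertions)
    chain = []
    while subject in parent_of:
        subject = parent_of[subject]
        chain.append(subject)
    return chain


print(superclasses("Fagus"))
# prints ['Fagaceae', 'Fagales', 'Dicotyledones']
```

The same information buried in a text occurrence would have to be
parsed out of prose before software could answer this question.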
 
| Of course, that would mean split the first deliverable into two
| parts at least:
| 
| 1. Recommendations for classifications, taxonomies, thesaurus and
| similar hierarchical sets of subjects.
| 2, 3, ...n ... everything else.

I don't think this is a good idea. I would expect 'n' to be infinite.
We should try to make our recommendations so general that they can
apply to any PS set.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >

References:
- [tm-pubsubj-comment] ISSUE 4 - Relationships between subjects
  - From: Steve Pepper <pepper@ontopia.net>
- Re: [tm-pubsubj-comment] ISSUE 4 - Relationships between subjects
  - From: Lars Marius Garshol <larsga@garshol.priv.no>
- ISSUE 4 bis - Re: [tm-pubsubj-comment] ISSUE 4 - Relationships between subjects
  - From: Bernard Vatant <bernard.vatant@mondeca.com>