

Subject: re: Classifications - response to Martin Bryan



Registry folks,

Martin responded to a message I sent to this list on 25 June 2001. Herein I 
reply to some of his comments. My comments get a bit long - sorry about 
that, but Martin's topics are important and worthy of further discussion.

-- Len

------------------------------------------------------------------------

At 03:12 AM 6/26/01, Martin Bryan wrote:
>Len
>
>I believe the changes you have requested are much needed, though I have
>reservations about thinking of classification schemes as a single hierarchy.
>I would ask you to support the following additional use cases:
>
>A) A user community wishes to use only a part of a classification, starting
>at a particular node and restricting the number of levels, and/or members of
>the sets at one level, that a user can go down below the start point.
>Example: Only those nodes listed under the heading Europe that were
>registered as member states of the EU in 1995. (This raises the question of
>a) how do you classify/qualify a classification node to indicate its
>membership of another category and b) how do you query the date at which a
>classification was valid?)


This is potentially a very complex refinement of an existing classification 
scheme. To make Martin's example a little more specific, suppose the 
starting point is a 2-level classification scheme where the 1st level 
identifies a continent and the 2nd level identifies a country. Suppose 
continents are identified by English name, i.e. Asia, Africa, Europe, 
NorthAmerica, SouthAmerica, etc. and suppose countries are identified by 
the 2-character internet code and by the spelled out English name of the 
country. Depending on how continents are defined by the scheme, a country 
may be allowed under multiple continents, e.g. Turkey may yield two nodes 
in the scheme, Asia/Turkey and Europe/Turkey, or Australia may be both a 
continent and a country, Australia/Australia and Australia/NewZealand 
(apologies to anyone offended by this contrived example!). Thus the whole 
path would be necessary in order to identify a single node in this scheme.
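
A minimal sketch of why the whole path is needed, in Python. The dictionary 
layout and the function name paths_for are assumptions made for 
illustration only; nothing here is part of RIM.

```python
# Hypothetical 2-level Continent/Country scheme, using only the nodes
# named in the contrived example above.
scheme = {
    "Asia": ["Turkey"],
    "Europe": ["Turkey"],
    "Australia": ["Australia", "NewZealand"],
}

def paths_for(country):
    """Return every full path in the scheme that ends in `country`."""
    return [
        f"{continent}/{name}"
        for continent, countries in scheme.items()
        for name in countries
        if name == country
    ]

# The bare name "Turkey" matches two nodes, so it cannot identify one.
turkey_paths = paths_for("Turkey")
```

Because a bare country name can match more than one node, a 
classification would have to store the full path rather than the leaf 
name alone.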

Now Martin asks how one might create a new classification scheme, EU-1995, 
that isolates the countries of Europe that were registered as member states 
of the European Union (EU) in 1995. In my mind this is a completely 
different situation, possibly addressed with a solution other than 
classification schemes. EU-1995 may use some of the country name values 
under the Europe node in the existing classification scheme, but there is 
no way in either the current RIM specification, or in my proposed 
modification, that would allow this particular subset without creating new 
nodes. This is because the existing scheme has no notion of the European 
Union - instead, it is a Continent/Country scheme. These new nodes might 
use the same 2-character identifiers and the same English spellings of 
names to identify the countries, but they are nodes of a different 
classification scheme, not some revision of the existing 
"Continent/Country" 2-level scheme.

I think a better approach to the EU-1995 problem would be to assume that 
each country has a RegistryEntry. Then use the Slot mechanism to define an 
optional attribute, just for European countries, named AdmissionToEUDate. 
This attribute would record the date of admission to the EU. One might also 
use slots to define a second attribute, ExitFromEUDate, for European 
countries that choose to leave the EU. Then the membership of the EU on a 
specific date could be determined by a query on RegistryEntry using these 
two slots and their values.
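
The query could be sketched roughly as follows, in Python. The entry 
layout, the function name eu_members_on, and the country names and dates 
are all placeholders invented for illustration; only the two slot names 
come from the paragraph above. Treating the exit date as the first day of 
non-membership is also an assumption.

```python
from datetime import date

# Placeholder registry entries; each carries a dict of optional slots.
entries = [
    {"name": "CountryA", "slots": {"AdmissionToEUDate": date(1990, 1, 1)}},
    {"name": "CountryB", "slots": {"AdmissionToEUDate": date(1996, 1, 1)}},
    {"name": "CountryC", "slots": {"AdmissionToEUDate": date(1980, 1, 1),
                                   "ExitFromEUDate": date(1994, 1, 1)}},
    {"name": "CountryD", "slots": {}},  # never admitted: no slots defined
]

def eu_members_on(entries, when):
    """Names of entries admitted on or before `when` and not yet exited."""
    members = []
    for entry in entries:
        admitted = entry["slots"].get("AdmissionToEUDate")
        exited = entry["slots"].get("ExitFromEUDate")
        if admitted is None or admitted > when:
            continue  # not a member yet (or never)
        if exited is not None and exited <= when:
            continue  # already left by this date
        members.append(entry["name"])
    return members

members_mid_1995 = eu_members_on(entries, date(1995, 6, 30))
```

No classification node is touched here; the membership question is 
answered entirely from slot values on the country entries.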

Now return to Martin's first general question: a) how do you 
classify/qualify a classification node to indicate its membership of 
another category.

The assumption here is that a classification node takes on a life of its 
own and can participate in many different categories. This could get very 
complex. Isn't this very much like a "Topic Map" where a bunch of topics 
get registered and then have associations with one another? A Registry 
could support such mechanisms without the need for classification schemes 
at all. Instead, a new objectType value for "topic" could be added to the 
enumeration for RegistryEntry.objectType and new associationType values 
could be added to the enumeration for Association.associationType to record 
the different types of associations allowed between topics.

In summary, I'm not convinced that it's a good idea to try to 
classify/qualify a single node of a classification scheme. Nodes in a 
classification scheme are inter-dependent, e.g. if the scheme has a 
stability attribute of "Static" shouldn't all nodes in that scheme be 
static too? We will have much difficulty writing the inter-dependence 
rules if we allow individual nodes of a classification scheme to have the 
independence of separately registered items.

Now return to Martin's second general question: b) how do you query the 
date at which a classification was valid?

Again, this is a very complex topic, and gets into the notions of 
validDate, changeDate, etc. for temporal databases. A Registry is intended 
to be a very simple database. I'm not sure we're ready yet to embrace the 
complexities associated with maintaining a historical record, as is done 
in temporal databases, of every classification ever held for a registry 
entry. But we can use existing features of the Registry to solve specific 
problems involving dates, as suggested above for entries into and out of 
the EU. Political alliances could be registered (e.g. EU, NATO, SEATO, 
etc.), and country membership in such alliances could be maintained by a 
"Membership" association from the registry entry for Country to a registry 
entry for the relevant PoliticalAlliance.

NOTE: The AuditableEvent class in RIM will keep track of when a 
classification was modified, and by whom, but it won't necessarily remember 
the whole history of previous values.

NOTE: For pointers to research papers on Temporal Databases, see "A 
Glossary of Temporal Database Concepts", ACM SIGMOD Record, 23, No. 1, 
March 1994, or the TSQL2 Language Specification, a working group chaired by 
Richard Snodgrass with a specification published in September 1994.


>B) A user community may need to define a locally significant extension to an
>existing code list. Example: ISO 3166-1 defines the United Kingdom of Great
>Britain and Northern Ireland as a single code point (GB). ISO 3166-2 also
>defines each of the countries of the UK as separate entries using the 3
>digit extensions of  the base code, and each of the counties within each of
>the countries at the level below that. (How these three digit codes would be
>related to the two digit code in a separate list is another challenge to the
>RegRep model!) However, for legal reasons, the classification of UK laws
>requires that there be classifications based on England & Wales, Scotland,
>Northern Ireland, The Channel Islands and Isle of Man and do not (at
>present) apply separately to individual counties. Therefore someone wanting
>to classify these either has to define a proprietary scheme or needs to
>define extensions to the existing scheme, either by redefining the level
>below the UK entry in the 3-letter scheme completely, or by adding a special
>category for the combination of England & Wales to the existing
>classification scheme at the middle level of the larger classification
>scheme. This must be done by someone without the rights to update the ISO
>3166 classification.


I agree with Martin that "local extensions" of a classification scheme are 
very important and must be supported in some manner. Another example of 
local extensions is for Genus/Species, where members of the research 
community are continuously adding new Species and splitting Species into 
Subspecies.

If a classification scheme were treated as a whole object to be registered, 
rather than just having the nodes registered as in the current RIM, then a 
RegistryEntry.stability attribute value of "Dynamic" would allow arbitrary 
changes to the nodes and hierarchical structure of the classification 
scheme; classifications via a dynamic classification scheme could become 
obsolete over time. A stability attribute of "DynamicCompatible" would only 
allow additions to be made to the hierarchy, not changes to the existing 
hierarchical structure, thereby preserving the validity of existing 
classifications. And a stability of "Static" would mean the nodes and 
hierarchical structure are fixed until at least the expirationDate of the 
registry entry.  So a stability value of "DynamicCompatible" would solve 
part of Martin's use case, i.e. the scheme could evolve in an upward 
compatible manner, where the term "upward compatible" would have to be 
defined in the specification. In my mind, "upward compatible" would allow 
the addition of new nodes to the scheme, but would prohibit deletion of 
existing nodes or re-structuring of the existing scheme hierarchy. If 
desired, a revised classification scheme could supersede an existing one, 
thereby maintaining a record of all past versions.
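
One possible reading of "upward compatible" can be sketched in Python. It 
assumes a scheme is represented as the set of full paths to its nodes; 
that representation, the function name is_upward_compatible, and the 
sample node names are hypothetical and not taken from the specification.

```python
def is_upward_compatible(old_paths, new_paths):
    """True if every node path of the old scheme survives unchanged.

    Deleting a node removes its path, and moving a node changes its
    path, so both show up as an old path missing from the new scheme.
    Adding nodes only adds paths and is therefore allowed.
    """
    return set(old_paths) <= set(new_paths)

# Placeholder Genus/Species scheme.
old = {"GenusA", "GenusA/Species1"}

# Splitting a species into subspecies only adds nodes.
extended = old | {"GenusA/Species1/SubspeciesX"}

# Moving Species1 under a different genus restructures the hierarchy.
restructured = {"GenusA", "GenusB", "GenusB/Species1"}

ok_extension = is_upward_compatible(old, extended)
ok_restructure = is_upward_compatible(old, restructured)
```

Under this reading, a "DynamicCompatible" scheme would accept the 
extended version and reject the restructured one, so existing 
classifications that store full paths stay valid.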

But if a scheme is "Static", like most ISO standards are, then we'd have no 
alternative but to define separate classification schemes for the 
extensions beyond each existing node. We could use ISO 3166-1 as a 1-level 
scheme for Country and separate National standards for extensions to leaf 
nodes of that scheme. It would then be up to a user to be aware that two 
separate classification schemes may be required to classify a given 
repository item, e.g. Continent/Country may be one such scheme and 
Country/LocalPoliticalUnit may be another. Or LocalPoliticalUnit may be a 
collection of separate classification schemes, one for each country.

NOTE: However, Martin's question about mapping 3-digit codes for countries 
to 2-character codes for countries remains unsolved by this approach. The 
related schemes would have to handle country identifiers in the same way.



>C) A user community may need to use the union of two classifications.
>Example: Using both ISO 3166 country codes and ISO 639 language codes to
>indicate language variants such as EN-US. (The combination of the two parts
>of ISO 3166 mentioned above is another example.)


If classification schemes were registered as a single unit instead of the 
current RIM's dependence on sets of nodes, then we could address the 
problem of creating new classification schemes from references to parts of 
existing ones. But the topic is complex! It would be nice to have a stable 
and agreed definition of representations and metadata for "classification 
scheme" as if it were a single repository item; then it would be much 
easier to address these kinds of problems.

For example, in the Continent/Country and Country/LocalPoliticalUnit 
schemes discussed above, I would favor the ability to create a new scheme 
Continent/Country/LocalPoliticalUnit that was defined via references to the 
existing schemes without the need for re-creating nodes, even if the 
existing schemes were repository items in separate Registries! Such 
capability would be an upward compatible extension to what I'm proposing.
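
A rough sketch of composition by reference, in Python. The dictionary 
layout and the function name compose are assumptions for illustration, 
and the node values are a small sample drawn from the examples in this 
thread. It also assumes both schemes identify a country in the same way, 
which is exactly the open issue noted earlier.

```python
# Two existing schemes, possibly held in separate registries.
continent_country = {"Europe": ["GB"]}
country_local_unit = {"GB": ["Scotland", "NorthernIreland"]}

def compose(upper, lower):
    """Yield 3-level paths by joining leaves of `upper` to roots of `lower`.

    No nodes are copied: each path is derived on demand from the two
    referenced schemes, so later additions to either one show up here.
    """
    for top, middles in upper.items():
        for middle in middles:
            for leaf in lower.get(middle, []):
                yield f"{top}/{middle}/{leaf}"

three_level_paths = list(compose(continent_country, country_local_unit))
```

The combined Continent/Country/LocalPoliticalUnit scheme is only a view 
over the two originals here, so neither owner needs update rights on the 
other's scheme.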



>Martin Bryan
>Technical Manager, The Diffuse Project
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Diffuse: http://www.diffuse.org, mailto:mtbryan@diffuse.org
>The Diffuse Project is funded under the European Commission's IST programme.
>Diffuse publications are maintained by TIEKE (Finnish IT Development
>Centre),
>IC Focus and The SGML Centre.
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>The SGML Centre, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK
>Phone/Fax: +44 1452 714029  E-mail: mtbryan@sgml.u-net.com
>
>For further details about The SGML Centre visit http://www.sgml.u-net.com

**************************************************************
Len Gallagher                             LGallagher@nist.gov
NIST                                      Work: 301-975-3251
Bldg 820  Room 562                        Home: 301-424-1928
Gaithersburg, MD 20899-8970 USA           Fax: 301-948-6213
**************************************************************


