Subject: re: Classifications - response to Martin Bryan
Registry folks,

Martin responded to a message I sent to this list on 25 June 2001. Herein I reply to some of his comments. My comments get a bit long - sorry about that, but Martin's topics are important and worthy of further discussion.

-- Len

------------------------------------------------------------------------

At 03:12 AM 6/26/01, Martin Bryan wrote:
>Len
>
>I believe the changes you have requested are much needed, though I have
>reservations about thinking of classification schemes as a single hierarchy.
>I would ask you to support the following additional use cases:
>
>A) A user community wishes to use only a part of a classification, starting
>at a particular node and restricting the number of levels, and/or members of
>the sets at one level, that a user can go down below the start point.
>Example: Only those nodes listed under the heading Europe that were
>registered as member states of the EU in 1995. (This raises the question of
>a) how do you classify/qualify a classification node to indicate its
>membership of another category and b) how do you query the date at which a
>classification was valid?)

This is potentially a very complex refinement of an existing classification scheme. To make Martin's example a little more specific, suppose the starting point is a 2-level classification scheme where the 1st level identifies a continent and the 2nd level identifies a country. Suppose continents are identified by English name, i.e. Asia, Africa, Europe, NorthAmerica, SouthAmerica, etc. and suppose countries are identified by the 2-character internet code and by the spelled out English name of the country. Depending on how continents are defined by the scheme, a country may be allowed under multiple continents, e.g. Turkey may yield two nodes in the scheme, Asia/Turkey and Europe/Turkey, or Australia may be both a continent and a country, Australia/Australia and Australia/NewZealand (apologies to anyone offended by this contrived example!).
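
To make the contrived example concrete, here is a minimal Python sketch of such a Continent/Country scheme. The data structure, the function names, and the extra Europe/France node are illustrative assumptions and not part of RIM; the point is only that a lookup by country name alone can return more than one node.

```python
# Hypothetical two-level Continent/Country scheme from the example above.
# Each node is stored as its full path; names and layout are illustrative only.
SCHEME_NODES = {
    ("Asia", "Turkey"),
    ("Europe", "Turkey"),
    ("Europe", "France"),        # extra node added purely for illustration
    ("Australia", "Australia"),
    ("Australia", "NewZealand"),
}


def nodes_for_country(country):
    """Return every full path whose last component is the given country."""
    return sorted(path for path in SCHEME_NODES if path[-1] == country)


def path_string(path):
    """Render a path tuple in the Continent/Country form used in the text."""
    return "/".join(path)


# The country name alone is ambiguous: it matches two distinct nodes.
turkey_paths = [path_string(p) for p in nodes_for_country("Turkey")]
print(turkey_paths)
```
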
Thus the whole path would be necessary in order to identify a single node in this scheme.

Now Martin asks how one might create a new classification scheme, EU-1995, that isolates the countries of Europe that were registered as member states of the European Union (EU) in 1995. In my mind this is a completely different situation, possibly addressed with a solution other than classification schemes. EU-1995 may use some of the country name values under the Europe node in the existing classification scheme, but there is no way in either the current RIM specification, or in my proposed modification, that would allow this particular subset without creating new nodes. This is because the existing scheme has no notion of the European Union - instead, it is a Continent/Country scheme. These new nodes might use the same 2-character identifiers and the same English spellings of names to identify the countries, but they are nodes of a different classification scheme, not some revision of the existing "Continent/Country" 2-level scheme.

I think a better approach to the EU-1995 problem would be to assume that each country has a RegistryEntry. Then use the Slot mechanism to define an optional attribute, just for European countries, named AdmissionToEUDate. This attribute would record the date of admission to the EU. One might also use slots to define a second attribute, ExitFromEUDate, for European countries that choose to leave the EU. Then the membership of the EU on a specific date could be determined by a query on RegistryEntry using these two slots and their values.

Now return to Martin's first general question: a) how do you classify/qualify a classification node to indicate its membership of another category. The assumption here is that a classification node takes on a life of its own and can participate in many different categories. This could get very complex.
Isn't this very much like a "Topic Map" where a bunch of topics get registered and then have associations with one another? A Registry could support such mechanisms without the need for classification schemes at all. Instead, a new objectType value for "topic" could be added to the enumeration for RegistryEntry.objectType and new associationType values could be added to the enumeration for Association.associationType to record the different types of associations allowed between topics.

In summary, I'm not convinced that it's a good idea to try to classify/qualify a single node of a classification scheme. Nodes in a classification scheme are inter-dependent, e.g. if the scheme has a stability attribute of "Static" shouldn't all nodes in that scheme be static too? We will have much difficulty writing the inter-dependence rules if we allow individual nodes of a classification scheme to have the independence of separately registered items.

Now return to Martin's second general question: b) how do you query the date at which a classification was valid? Again, this is a very complex topic, and gets into the notions of validDate, changeDate, etc. for temporal databases. A Registry is intended to be a very simple database. I'm not sure we're ready yet to embrace the complexities associated with maintaining a historical record, as is done in temporal databases, of every classification ever held for a registry entry. But we can use existing features of the Registry to solve specific problems involving dates, as suggested above for entries into and out of the EU. Political alliances could be registered (e.g. EU, NATO, SEATO, etc.), and country membership in such alliances could be maintained by a "Membership" association from the registry entry for Country to a registry entry for the relevant PoliticalAlliance.
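
The slot-based approach to the EU membership question can be sketched roughly as follows. This is only an illustration in Python: the slot names AdmissionToEUDate and ExitFromEUDate come from the suggestion above, but the dict layout, the function names, the placeholder countries and their dates are all invented for the example and do not reflect RIM or any real data. The sketch also assumes that a country is no longer a member on its exit date.

```python
from datetime import date

# Hypothetical registry entries with optional date slots.
# Country names and dates are placeholders, not real membership data.
ENTRIES = [
    {"name": "CountryA", "slots": {"AdmissionToEUDate": date(1990, 1, 1)}},
    {"name": "CountryB", "slots": {"AdmissionToEUDate": date(1996, 1, 1)}},
    {"name": "CountryC", "slots": {"AdmissionToEUDate": date(1980, 1, 1),
                                   "ExitFromEUDate": date(1994, 1, 1)}},
    {"name": "CountryD", "slots": {}},  # never admitted: no slots defined
]


def is_member_on(entry, when):
    """True if the entry's slots show membership on the given date."""
    admitted = entry["slots"].get("AdmissionToEUDate")
    exited = entry["slots"].get("ExitFromEUDate")
    if admitted is None or admitted > when:
        return False
    # Assumption: membership ends on the exit date itself.
    return exited is None or exited > when


def members_on(entries, when):
    """Names of all entries that were members on the given date."""
    return [e["name"] for e in entries if is_member_on(e, when)]


members_1995 = members_on(ENTRIES, date(1995, 6, 30))
print(members_1995)
```

Note that no classification node is created or changed here; the answer for any date comes entirely from the two slot values on each country's entry.
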

NOTE: The AuditableEvent class in RIM will keep track of when a classification was modified, and by whom, but it won't necessarily remember the whole history of previous values.

NOTE: For pointers to research papers on Temporal Databases, see "A Glossary of Temporal Database Concepts", ACM SIGMOD Record, 23, No. 1, March 1994, or the TSQL2 Language Specification, produced by a working group chaired by Richard Snodgrass and published in September 1994.

>B) A user community may need to define a locally significant extension to an
>existing code list. Example: ISO 3166-1 defines the United Kingdom of Great
>Britain and Northern Ireland as a single code point (GB). ISO 3166-2 also
>defines each of the countries of the UK as separate entries using the 3
>digit extensions of the base code, and each of the counties within each of
>the countries at the level below that. (How these three digit codes would be
>related to the two digit code in a separate list is another challenge to the
>RegRep model!) However, for legal reasons, the classification of UK laws
>requires that there be classifications based on England & Wales, Scotland,
>Northern Ireland, The Channel Islands and Isle of Man and do not (at
>present) apply separately to individual counties. Therefore someone wanting
>to classify these either has to define a proprietary scheme or needs to
>define extensions to the existing scheme, either by redefining the level
>below the UK entry in the 3-letter scheme completely, or by adding a special
>category for the combination of England & Wales to the existing
>classification scheme at the middle level of the larger classification
>scheme. This must be done by someone without the rights to update the ISO
>3166 classification.

I agree with Martin that "local extensions" of a classification scheme are very important and must be supported in some manner.
Another example of local extensions is for Genus/Species, where members of the research community are continuously adding new Species and splitting Species into Subspecies. If a classification scheme were treated as a whole object to be registered, rather than just having the nodes registered as in the current RIM, then a RegistryEntry.stability attribute value of "Dynamic" would allow arbitrary changes to the nodes and hierarchical structure of the classification scheme; classifications via a dynamic classification scheme could become obsolete over time. A stability attribute of "DynamicCompatible" would only allow additions to be made to the hierarchy, not changes to the existing hierarchical structure, thereby preserving the validity of existing classifications. And a stability of "Static" would mean the nodes and hierarchical structure are fixed until at least the expirationDate of the registry entry.

So a stability value of "DynamicCompatible" would solve part of Martin's use case, i.e. the scheme could evolve in an upward compatible manner, where the term "upward compatible" would have to be defined in the specification. In my mind, "upward compatible" would allow the addition of new nodes to the scheme, but would prohibit deletion of existing nodes or re-structuring of the existing scheme hierarchy. If desired, a revised classification scheme could supersede an existing one, thereby maintaining a record of all past versions.

But if a scheme is "Static", as most ISO standards are, then we'd have no alternative but to define separate classification schemes for the extensions beyond each existing node. We could use ISO 3166-1 as a 1-level scheme for Country and separate National standards for extensions to leaf nodes of that scheme. It would then be up to a user to be aware that two separate classification schemes may be required to classify a given repository item, e.g. Continent/Country may be one such scheme and Country/LocalPoliticalUnit may be another.
Or LocalPoliticalUnit may be a collection of separate classification schemes, one for each country.

NOTE: However, Martin's question about mapping 3-digit codes for countries to 2-character codes for countries remains unsolved by this approach. The related schemes would have to handle country identifiers in the same way.

>C) A user community may need to use the union of two classifications.
>Example: Using both ISO 3166 country codes and ISO 639 language codes to
>indicate language variants such as EN-US. (The combination of the two parts
>of ISO 3166 mentioned above is another example.)

If classification schemes were registered as a single unit instead of the current RIM's dependence on sets of nodes, then we could address the problem of creating new classification schemes from references to parts of existing ones. But the topic is complex! It would be nice to have a stable and agreed definition of representations and metadata for "classification scheme" as if it were a single repository item; then it would be much easier to address these kinds of problems.

For example, in the Continent/Country and Country/LocalPoliticalUnit schemes discussed above, I would favor the ability to create a new scheme Continent/Country/LocalPoliticalUnit that was defined via references to the existing schemes without the need for re-creating nodes, even if the existing schemes were repository items in separate Registries! Such capability would be an upward compatible extension to what I'm proposing.

>Martin Bryan
>Technical Manager, The Diffuse Project
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Diffuse: http://www.diffuse.org, mailto:mtbryan@diffuse.org
>The Diffuse Project is funded under the European Commission's IST programme.
>Diffuse publications are maintained by TIEKE (Finnish IT Development
>Centre),
>IC Focus and The SGML Centre.
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>The SGML Centre, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK
>Phone/Fax: +44 1452 714029 E-mail: mtbryan@sgml.u-net.com
>
>For further details about The SGML Centre visit http://www.sgml.u-net.com

**************************************************************
Len Gallagher                     LGallagher@nist.gov
NIST                              Work: 301-975-3251
Bldg 820  Room 562                Home: 301-424-1928
Gaithersburg, MD 20899-8970 USA   Fax: 301-948-6213
**************************************************************