
Subject: Re: [ubl] Code list metadata

At 2012-11-17 15:20 +0800, Tim McGrath wrote:
we have to tread carefully here because the maintenance procedures for these inside CEFACT are complex and we need to ensure the correct identification to the canonical versions.

Absolutely! But I wouldn't have to "tread carefully" if someone gave me a clear walking path to follow, which I do not have. I'm walking blindly here without guidance trying to come up with the most appropriate approach.

It seems that UN/CEFACT is not following the same conventions for versioning for all code lists.

Today I will try to mechanically distill some kind of pattern out of the following sources of code lists, but I think a pattern to find "Revision" is impossible:


For example, I suspect the 66411 may reflect the UN/EDIFACT element number 6411 "Measurement unit code", with the extra 6 prepended to denote UNECE as the responsible agency. Following this model, the Packaging type code number would be 7065 "Package type description code" prefixed with 6, so 67065.

I do see "67065" used as part of a namespace prefix, but not as part of the URI ... and I see "2006" as the version at the end of the URI here:


... where this is used:


That URI, then, is very different from the URI found in:


... where this is used:


There is no pattern for "Revision" or for the URI! But I do see "Version" information at the end.

so yes, there is duplication and inconsistency but staying aligned with UN/EDIFACT is a good idea.

How does one stay "aligned" when there is no pattern to follow? Those two URIs are from Rec 20 and Rec 21, so I would have expected them to have the same structure.

This includes using the word 'Revision' to be really clear that Version is the same piece of metadata.

Note there is no revision information in the URI. The web site documentation for Recs 20 and 21 states the revision, and it is reflected somewhat in the name of the file if you know where to look, but such information cannot be found reliably.

I also note that "Revision X" does not appear to be documented as the code list version. Note the following excerpt from the Rec 21 schema documentation:

  Code list name:     Package Type Code
  Code list agency:   UNECE
  Code list version:  2006

Even though there is no such comment in the Rec 20 schema documentation, the "Rev9e" of the Excel file name (!!) that would indicate "Revision 9" appears *not* to be used as the code list version.

So, based on the comment "Code list version:", one pattern that does appear is that the final field of the URI always seems to reflect a version, not the revision. Can we not simply use this last field as the version for the code list? I think that is more reliable and consistent than trying to glean "Revision" information from details that are outside of the file contents.

For Rec 20 this would be (where "CUri" is the Canonical Uri and "CVUri" is the Canonical Version Uri):

Ver:     2001
CUri:    urn:un:unece:uncefact:codelist:specification:66411
CVUri:   urn:un:unece:uncefact:codelist:specification:66411:2001

For Rec 21:

Ver:     2006
CUri:    urn:un:unece:uncefact:codelist:standard:UNECE:PackageTypeCode
CVUri:   urn:un:unece:uncefact:codelist:standard:UNECE:PackageTypeCode:2006

Then, for example, for http://www.unece.org/fileadmin/DAM/uncefact/codelist/standard/UNECE_CargoTypeCode_1996Rev2Final.xsd I could use:

Ver:     1996Rev2Final
CUri:    urn:un:unece:uncefact:codelist:standard:UNECE:CargoTypeCode
CVUri:   urn:un:unece:uncefact:codelist:standard:UNECE:CargoTypeCode:1996Rev2Final
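
As an illustration only (not part of the original message), the mapping from a schema file name to these three values could be sketched in Python as follows. The function name `uris_from_schema_url` and the `BASE` constant are hypothetical, and the sketch assumes the file name has the form Agency_Name_Version.xsd, which is true of the CargoTypeCode example above but is not confirmed for every UN/CEFACT code list file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed URN stem, copied from the Rec 21 and CargoTypeCode examples above.
BASE = "urn:un:unece:uncefact:codelist:standard"


def uris_from_schema_url(url):
    """Derive Ver, CUri and CVUri from a schema URL whose file name is
    assumed to look like Agency_Name_Version.xsd."""
    # File name without the directory part and without the ".xsd" suffix.
    stem = PurePosixPath(urlparse(url).path).stem
    # Split into exactly three fields; raises ValueError if the name
    # does not contain at least two underscores.
    agency, name, version = stem.split("_", 2)
    curi = f"{BASE}:{agency}:{name}"
    return {"Ver": version, "CUri": curi, "CVUri": f"{curi}:{version}"}


if __name__ == "__main__":
    url = ("http://www.unece.org/fileadmin/DAM/uncefact/codelist/standard/"
           "UNECE_CargoTypeCode_1996Rev2Final.xsd")
    for key, value in uris_from_schema_url(url).items():
        print(f"{key}: {value}")
```

A file name that does not follow the assumed three-field form fails loudly rather than producing a guessed URI, which seems the safer behaviour given the inconsistencies noted above.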

That appears to be a consistent pattern that I could follow, and I note in the documentation that that last field *does* appear to be the code list version and not some "Revision" value:

  Schema agency:      UN/CEFACT
  Schema version:     1.0
  Schema date:        18 July 2012

  Code list name:     Cargo Type Code
  Code list agency:   UNECE
  Code list version:  1996 Rev 2 Final

... note that I would not try to preserve the spaces ... I would just use the last field.
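
A minimal Python sketch of the two rules proposed here, offered as an illustration rather than as the actual tooling: the version is taken to be the last colon-separated field of the versioned URI, and the "Code list version" comment text is compared after dropping its spaces. The helper names `split_cvuri` and `version_token` are hypothetical.

```python
def split_cvuri(cvuri):
    """Split a versioned URI into (CUri, Ver) at its last colon."""
    curi, _, version = cvuri.rpartition(":")
    if not curi or not version:
        raise ValueError(f"no version field found in {cvuri!r}")
    return curi, version


def version_token(code_list_version):
    """Remove all whitespace from the 'Code list version' comment text,
    e.g. '1996 Rev 2 Final' becomes '1996Rev2Final'."""
    return "".join(code_list_version.split())


if __name__ == "__main__":
    cvuri = ("urn:un:unece:uncefact:codelist:standard:"
             "UNECE:CargoTypeCode:1996Rev2Final")
    curi, ver = split_cvuri(cvuri)
    print("Ver:  ", ver)
    print("CUri: ", curi)
    # Check that the URI's last field agrees with the schema comment.
    print("Matches comment:", ver == version_token("1996 Rev 2 Final"))
```

Comparing the URI's last field against the space-stripped comment gives a mechanical check that the two sources of version information agree for a given file.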

Then anyone seeing a UN/CEFACT URI would know from the URI and not from some filename or web site documentation exactly which version it is. And I think I am justified because of the word "version", not "revision", found in the comments in those files.

Would this be acceptable?

. . . . . . . . Ken

p.s. I've published the tool that converts CSV files to genericode equivalents here:


On 17/11/12 7:02 AM, G. Ken Holman wrote:
Fellow UBL TC members,

Today I'm struggling with list-level metadata for our code lists for UBL 2.1.

In UBL 2.0, we oriented our list-level metadata around UN/CEFACT for those code lists that matched the enumerations baked into the schemas. Consider, for example, the Units of Measure list, UN/ECE Recommendation 20:

      <LongName xml:lang="en">Unit Of Measure</LongName>
      <LongName Identifier="listID">UN/ECE rec 20</LongName>
      <Version>Revision 4</Version>
      <LongName xml:lang="en">United Nations Economic Commission for Europe</LongName>

I cannot see where the version information came from in the schema.
And I note two concepts of version: "Revision 4" (in <Version>) and "2001-update" (in <CanonicalVersionUri>).

For UN/ECE Recommendation 21, the Packaging Type list, we used UBL metadata:

      <LongName xml:lang="en">Packaging Type</LongName>
      <LongName Identifier="listID">UN/ECE rec 21</LongName>
      <Version>Revision 5</Version>
      <LongName xml:lang="en">United Nations Economic Commission for Europe</LongName>

In both cases I think the version information should not include the word "Revision", so I'm suggesting changing that.

What do we do about identification? Both code lists come from UN/ECE recommendations. One would think their identification would be very similar. I cannot correlate on the UN/ECE web site the UN/CEFACT schema reference to "66411" for the Units of Measure. Is there a similar reference for the Packaging Type?

Should we use the UN/ECE format for both? If so, what about the others that did not have that format in UBL 2.0?

Or should we use the UBL format for all code lists now that we don't have any UN/CEFACT XSD files with enumerations? Then we would change the identification approach for the other code list files that used to come from XSD enumerations.

There are no genericode files yet published on the UN/ECE web site, so we have to create our own.

Thank you for any discussion and guidance on this subject. I've got the mechanics all working, but I need help to know what should go into these files.

. . . . . . . . Ken

p.s. today I put together a utility to convert CSV files into genericode files, so anyone finding code list information should be able to easily create CSV without needing to think about the XML

Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/o/
G. Ken Holman                   mailto:gkholman@CraneSoftwrights.com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal
