Subject: FW: Language codes as an UBL code list

From: "Paul Spencer" <paul.spencer@boynings.co.uk>
To: <ubl@lists.oasis-open.org>
Date: Mon, 19 Jul 2004 11:39:28 +0100

A colleague working for the UK Government has created a list of the ISO 639-2 3-character language codes using the UBL codelist format.

I attach the result of his work, the questions he raised and my comments in case they are of interest. There is one place where the specification asks for a use case and this work can supply it.

Regards

Paul Spencer
Director
Boynings Consulting Ltd
http://www.boynings.co.uk

-----Original Message-----
From: Paul Spencer [mailto:paul.spencer@boynings.co.uk]
Sent: 19 July 2004 11:20
To: Colin Mackenzie
Subject: RE: Language codes as an UBL code list

Hi Colin,

1) Yes, it is allowed to have more than one CodeName per Code. It is shown in section 3.1 of the spec as "0..n".

2) I had wondered what you would do about Welsh! I agree that it causes problems if one language appears several times. Section 2.5.2 shows this as being for future implementation. Unfortunately, quite a few things were pushed out of this release to meet the timescales for UBL 1.0. My view for the moment is that we should just include "wel". After all, German is "ger", not "deu". You could leave "cym" in the document, but commented out.

3) Again, mapping between codelists is a future feature (section 2.5.11). Perhaps this is the use case they are looking for.

4) I guess for the moment, this is a UK GovTalk document, and so we should use that. It is then up to eGU to get someone else to take it on. They can then assign copyright.

5) Since we are following an external specification, this is not a problem.

6) No comment at the moment.

7) I seem to remember that the multilingual requirement was dropped. In general, the UN works in English. If others want to create code lists in other languages, they are welcome to do so. However, I would expect to at least be able to add an xml:lang attribute somewhere. Of course, what is really needed is an internal reference to the correct 3-character language code ...
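
As a rough sketch of points 2 and 7 together, the fragment below keeps "wel" as the single Welsh entry and leaves "cym" in the schema but commented out. It follows the enumeration layout used in the attached schema. The xml:lang attribute on CodeName is an assumption for illustration only; the UBL code list format does not define it, although xsd:documentation itself does accept xml:lang.

```xml
<!-- Sketch only: "wel" kept as the single Welsh entry -->
<xsd:enumeration value="wel">
 <xsd:annotation>
  <xsd:documentation>
   <!-- xml:lang on CodeName is an assumption, not part of the UBL code list format -->
   <CodeName xml:lang="en">Welsh</CodeName>
  </xsd:documentation>
 </xsd:annotation>
</xsd:enumeration>
<!-- Alternative code "cym" for Welsh, commented out for now:
<xsd:enumeration value="cym">
 <xsd:annotation>
  <xsd:documentation>
   <CodeName>Welsh</CodeName>
  </xsd:documentation>
 </xsd:annotation>
</xsd:enumeration>
-->
```

With this layout a validator accepts only "wel", so anyone who needs "cym" later has to uncomment the second enumeration.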

If you are happy, I will forward your work and my comments to the UBL TC list to see if they have any comments.

Regards

Paul

-----Original Message-----
From: Colin Mackenzie [mailto:colin@elecmc.com]
Sent: 16 July 2004 16:24
To: Michael.Andrews@cabinet-office.gsi.gov.uk; Adam.Bailin@cabinet-office.gsi.gov.uk; paul.spencer@boynings.co.uk
Subject: Language codes as an UBL code list

Hi,

At yesterday's meeting I volunteered to knock up a UBL-format codelist schema for the ISO 639-2 three-letter language codes.

If I had realised how much copying and pasting was involved I never would have bothered.

Anyway, please find attached a first cut attempt at producing the list (schema, test schema that imports it, sample XML file).

The list has been taken from http://www.loc.gov/standards/iso639-2/langcodes.html which is pointed to from the ISO site.

Paul, would it be possible for you to cast your eyes over it and also to consider the points below?

Some issues

1/ Some language codes have more than one description, e.g. "chu" is described as
"Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic"

I have represented this as follows:
   <xsd:enumeration value="chu">
    <xsd:annotation>
     <xsd:documentation>
      <CodeName>Church Slavic</CodeName>
      <CodeName>Old Slavonic</CodeName>
      <CodeName>Church Slavonic</CodeName>
      <CodeName>Old Bulgarian</CodeName>
      <CodeName>Old Church Slavonic</CodeName>
     </xsd:documentation>
    </xsd:annotation>
   </xsd:enumeration>

Is it allowed to have more than one CodeName per code? Is this the recommended way? Who knows? That's the trouble with sticking elements inside xsd:documentation and not using a schema for them.

2/ Some CodeNames have two language codes, e.g. Welsh is "cym" AND "wel".

Using the UBL schema, I have created two separate enumerations, which does not seem ideal.

If I were a programmer creating a drop-down list of languages, I would want only one "Welsh" in the list.

3/ Ideally there would be a mapping from the three-letter codes to the two-letter codes. Perhaps this could be added by someone putting another element in the xsd:documentation element.

4/ I do not know who will end up owning the document, so I have kept the OASIS comments and copyright, added no eGIF metadata, and used their style of namespace URNs etc. This means that it does not follow the current guidelines.

5/ The types defined in the schema (which have been adapted from UBL country codes code list) do not follow eGIF guidelines e.g.

a) complexType with name ending in "Type" not "Structure" (I never liked that rule anyway although I do follow it)

b) the use of attributes which are optional and fixed (although the guidelines do mention the UBL as a special case)

6/ As I do not know which agency will take control of this, the codeListAgencyID and codeListAgencyName may be wrong.

7/ I do not know the process of localising the list, e.g. if you want the language names in French.
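
On point 3, a minimal sketch of what such an extra element might look like is given below. The element name Alpha2Code is made up for illustration and is not defined by the UBL code list format. As noted under point 1, there is no schema for the elements inside xsd:documentation, so nothing would check it.

```xml
<xsd:enumeration value="wel">
 <xsd:annotation>
  <xsd:documentation>
   <CodeName>Welsh</CodeName>
   <!-- Alpha2Code is a hypothetical element name; "cy" is the ISO 639-1 code for Welsh -->
   <Alpha2Code>cy</Alpha2Code>
  </xsd:documentation>
 </xsd:annotation>
</xsd:enumeration>
```

Languages with no two-letter code would simply omit the extra element.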

I also did not know that Klingon is an ISO-recognised language.

Thanks

Colin

Colin Mackenzie
XML Consultant/Director
Electronic Media Consultants Ltd
17 North Wall, Cricklade, Wiltshire, SN6 6DU
Tel/Fax: +44 (0)1793 752193
Mobile: +44 (0)7974 422091
E-Mail: colin@elecmc.com
Web: http://www.elecmc.com

Language.zip