ubl-lcsc message

Subject: Re: [ubl-lcsc] Code sets for Document and LineItem Status
From: Chin Chee-Kai <cheekai@softml.net>
To: Tim McGrath <tmcgrath@portcomm.com.au>
Date: Fri, 17 Oct 2003 09:09:05 +0800 (SGT)
I can understand why presentational information should be
deferred until the actual application context and document
presentation time (a lazy formatting approach).  And I agree
with that approach in general;  indeed, that's the essence
of the XML-data + XSL-transformation architecture.

However, the manner in which code values are listed within the
schemas gives them their semantics.  Take, for instance, the UN/ECE
codes for Country Codes (e.g. "JP"), Currency Codes (e.g. "JPY"),
and Location Codes (e.g. "JPNRT"), which are all stated in capital
letters.  I see this as a nice thing they are doing, because by doing
so, the codes are canonicalized to one standard form within a given
character set.  For example, while one might say "Jp" and "jp" or
even "jP" are equally the country code for Japan, the only valid
code is "JP", according to UN/ECE.

This capitalized form also has a quiet advantage: when
the values are mapped to programming languages, they fall
neatly into the usual practice of using capital letters to
represent constants.  Not that it's hard to do with other
non-canonicalized spellings of code values, but the one-to-one
mapping provided by a capitalized form is attractive.
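
As a small illustration of that one-to-one mapping, here is a sketch
in Python.  The names CountryCode and parse_country are only
illustrative, and the three members are a sample, not a complete or
authoritative code list.

```python
from enum import Enum


class CountryCode(Enum):
    # Each member name is spelled exactly like its code value,
    # so no renaming step is needed between schema and program.
    JP = "JP"
    SG = "SG"
    AU = "AU"


def parse_country(value: str) -> CountryCode:
    # Lookup by value is case-sensitive: only the canonical
    # spelling is accepted; "jp" or "Jp" raises ValueError.
    return CountryCode(value)
```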

Also, it should be noted that the code values themselves need 
not necessarily be the presentation form shown to the user, 
data entry person or reader.  This is why UN/ECE has
corresponding description names for each of the code values.  
E.g.
    <Country>
      <CountryCoded>JP</CountryCoded>
      <CountryNamed>JAPAN</CountryNamed>
    </Country>

where the <CountryNamed> value could then be used for human
selection (e.g. in a dropdown list).
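
One way an application might keep the code value and its
presentation form apart is sketched below in Python.  The table
COUNTRY_NAMES and the function dropdown_options are hypothetical
names, and the entries are only a sample.

```python
# Hypothetical sample table: canonical code value -> description name.
COUNTRY_NAMES = {
    "JP": "JAPAN",
    "SG": "SINGAPORE",
    "AU": "AUSTRALIA",
}


def dropdown_options(names: dict[str, str]) -> list[tuple[str, str]]:
    """Return (label shown to the user, code stored in the document)
    pairs, sorted by label."""
    return sorted((name, code) for code, name in names.items())
```

The user picks from the labels, while only the code value is written
into the document.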

With UBL-defined values, we might not have discussed it, but
I've observed a quiet practice of Upper Camel Casing in spelling
the code values.  So far, that quiet rule hasn't been broken.
But given that we have no stated rule on spelling code values,
in time, if even one UBL-defined code list spells its code values
in another manner (such as lower Camel Case, or with
spaces as word separators within a given code value), then
we won't be self-consistent in spelling out the code values
anymore.

Also, code values usually don't have spaces (not that we cannot
do that, but my guess is that in general, it isn't a wide
practice).  We've seen our own code list values defined as
"OrderResponseSimple", "OrderResponseComplex" by compacting
out the spaces.  If we then start using "No Status" as another
code value, we'd be running into (or at least I would ask)
questions like "When do we use space separation, and when not?".
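
A check for this kind of inconsistency could be sketched in Python
as follows.  The regular expression is my assumption of what "Upper
Camel Case" means here (no such rule is stated anywhere in UBL), and
the name nonconforming is hypothetical.

```python
import re

# Assumed pattern for "Upper Camel Case": one or more words, each an
# uppercase letter followed by lowercase letters or digits, with no
# separators between the words.
UPPER_CAMEL = re.compile(r"(?:[A-Z][a-z0-9]*)+")


def nonconforming(values):
    """Return the code values that do not match the assumed pattern."""
    return [v for v in values if not UPPER_CAMEL.fullmatch(v)]


values = ["OrderResponseSimple", "OrderResponseComplex", "No Status"]
print(nonconforming(values))  # prints ['No Status']
```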

With or without explicit consciousness of having code value rules,
a pattern emerges in how we spell the values.  I think we could
make the value listings more consistent across UBL-defined
code lists by following some prescriptive rules for spelling
the values.  And one way of having a prescriptive rule might
be just to follow the UN/ECE way of capitalizing the values,
with some extra transformation rules to canonicalize punctuation,
spaces and hyphens.
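
As a rough sketch, the five transformation steps in the earlier
message quoted below could be chained like this in Python.  The
function name canonicalize is hypothetical, and "punctuation" is
assumed to mean any character that is not a letter, digit,
whitespace or hyphen.

```python
import re


def canonicalize(value: str) -> str:
    # Step 1: drop punctuation except "-" (assumption: anything that
    # is not a letter, digit, whitespace or hyphen), then collapse
    # runs of whitespace and trim both ends.
    value = re.sub(r"[^\w\s-]|_", "", value)
    value = re.sub(r"\s+", " ", value).strip()
    # Step 2: replace multiple consecutive hyphens with one hyphen.
    value = re.sub(r"-{2,}", "-", value)
    # Step 3: replace " -", "- " and " - " with a bare hyphen.
    value = re.sub(r" ?- ?", "-", value)
    # Step 4: replace every remaining space with a hyphen.
    value = value.replace(" ", "-")
    # Step 5: uppercase everything.
    return value.upper()


print(canonicalize("New  York's  Philharmonic Orchestra    ---  Class A"))
# prints NEW-YORKS-PHILHARMONIC-ORCHESTRA-CLASS-A
```

This reproduces the end result of the contrived example in the
quoted message.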



Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/


On Fri, 17 Oct 2003, Tim McGrath wrote:

>>My call is that we do not want to impose any restrictions or
>>presentational rules on the content of any documents.  This includes
>>character case.
>>
>>These are decisions for applications to make.  Bear in mind UBL is not
>>standardising on what goes in documents, only the semantics and
>>structure.  We actually crossed a boundary when we started looking at
>>determining code set values (for good reason) but we don't want to go any
>>further than necessary.
>>
>>You can guarantee that any formatting rules we make would break
>>someone's requirements.
>>
>>PS the same applies to facets like "XXX-XXX-XXX" for account number, etc...
>>
>>
>>Chin Chee-Kai wrote:
>>
>>>Do we need a rule to make all code list values look uniformly
>>>upper-cased?
>>>
>>>It appears that the code list values, when compared
>>>across the various code list schemas, use Upper Camel
>>>Case, with some having spaces between the words of multi-word values.
>>>
>>>We may need to require implementation rules, for instance,
>>>that when code list values are implemented in schemas,
>>>they should go through the following transformations:
>>>
>>>
>>>(Example shown is somewhat contrived to illustrate how the
>>>transformation rule works.)
>>>
>>>1. Remove all punctuation except "-", and compact the
>>>   resulting string: reduce multiple spaces to a
>>>   single space, and remove leading and trailing
>>>   spaces (required by base type xsd:token).
>>>
>>>   E.g.   "New  York's  Philharmonic Orchestra    ---  Class A"
>>>   -->    New Yorks Philharmonic Orchestra --- Class A
>>>
>>>
>>>2. Replace multiple occurrences of "-" with a single hyphen.
>>>
>>>   E.g.   New Yorks Philharmonic Orchestra --- Class A
>>>   -->    New Yorks Philharmonic Orchestra - Class A
>>>
>>>
>>>3. Replace any occurrence of the sequences " -", "- " and
>>>   " - "  (space followed by hyphen, or hyphen followed by
>>>   space, or space followed by hyphen followed by space)
>>>   with just a hyphen "-".
>>>
>>>   E.g.   New Yorks Philharmonic Orchestra - Class A
>>>   -->    New Yorks Philharmonic Orchestra-Class A
>>>
>>>
>>>4. Replace any occurrence of space " " with "-".
>>>
>>>   E.g.   New Yorks Philharmonic Orchestra-Class A
>>>   -->    New-Yorks-Philharmonic-Orchestra-Class-A
>>>
>>>
>>>5. Replace all characters with their uppercase equivalents.
>>>
>>>   E.g.   New-Yorks-Philharmonic-Orchestra-Class-A
>>>   -->    NEW-YORKS-PHILHARMONIC-ORCHESTRA-CLASS-A
>>>
>>>
>>>
>>>
>>>Best Regards,
>>>Chin Chee-Kai
>>>SoftML
>>>Tel: +65-6820-2979
>>>Fax: +65-6743-7875
>>>Email: cheekai@SoftML.Net
>>>http://SoftML.Net/
>>>
>>>
>>>On Wed, 15 Oct 2003, Stephen Green wrote:
>>>
>>>
>>>
>>>>>Hi
>>>>>
>>>>>...
>>>>>
>>>>>I'd suggest just
>>>>>
>>>>>'No Status'
>>>>>'Revised'
>>>>>'Withdrawn'
>>>>>(but I'd rather 'Cancelled' which we don't seem to have here)
>>>>>and
>>>>>'Disputed'
>>>>>
>>>>>
>>>>>
>>>>>from UN/CL 4405
>>>>
>>>>
>>>
>>>
>>>
>>
>>--
>>regards
>>tim mcgrath
>>phone: +618 93352228
>>postal: po box 1289   fremantle    western australia 6160
>>
>>
>>
Follow-Ups:
- Re: [ubl-lcsc] Code sets for Document and LineItem Status
  - From: jon.bosak@sun.com
References:
- Re: [ubl-lcsc] Code sets for Document and LineItem Status
  - From: Tim McGrath <tmcgrath@portcomm.com.au>