ubl-ndrsc message

Subject: RE: [ubl-ndrsc] [Fwd: Fwd: ISO 3166-1 -- Change of Alpha-3 CodeElement for Romania]
From: "Gregory, Arofan" <arofan.gregory@commerceone.com>
To: 'Phil Griffin' <phil.griffin@ASN-1.com>,"Gregory, Arofan" <arofan.gregory@commerceone.com>
Date: Mon, 11 Feb 2002 11:58:30 -0800
Phil:

Something to be aware of generally in this discussion is the way in which
all of our "tag names" are essentially in the same bag as codes: we are
choosing semantic, English-language names for semantic constructs. The
primary need is that they be machine-processable. Human-readability is a very
useful thing, but not a necessary thing in a business document.

Each tag name points to a semantic construct (BIE and/or dictionary entry)
that has an associated UID. My impression - given that we are based on Core
Components work - is that we would use this UID as the unifying construct
for doing cross-language versions of our library. This is the approach that
Core Components is taking, and I think it is a good one.

In this case, building a separate mechanism for handling language
correspondence would be less helpful, not more helpful. Enumerated values
(codes) would become just another type of "tag name" in the UBL library.
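
To make the idea concrete, here is a minimal sketch in Python of a library in which tag names and code values share one UID-keyed table, with a label per language. The UIDs, the French labels, and the function names are all hypothetical, invented for illustration; nothing here is taken from the actual UBL or Core Components libraries.

```python
# Hypothetical entries: UIDs and labels are made up for this sketch.
LIBRARY = {
    "UID000001": {"kind": "tag", "en": "BuyerParty", "fr": "Acheteur"},
    "UID000002": {"kind": "code", "en": "USA", "fr": "Etats-Unis"},
}


def label(uid, lang, fallback="en"):
    """Return the label for a UID in the given language, else the fallback."""
    entry = LIBRARY[uid]
    return entry.get(lang, entry[fallback])


def uid_for(name, lang="en"):
    """Find the UID whose label in `lang` equals `name`."""
    for uid, entry in LIBRARY.items():
        if entry.get(lang) == name:
            return uid
    raise KeyError(name)
```

With this shape, a code such as "USA" is handled by exactly the same lookup as a tag name, so no second mechanism for language correspondence is required.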

As for what the "codes" look like, referring back to your "840" vs. "USA"
example, it is a big convenience if your primary audience can understand the
string and you can display it directly in the user interface. I could do this
with "USA" (although it would have to be case-sensitive if we are relying on
XML tools in most cases) but I probably could not do it as successfully with
"840" (I would argue that most business people would understand "USA" better
than "840", your wife's situation being a minority case).

Nothing stops "USA" being presented as "840", as you know. The requirement
to always provide a presentation value, however, places requirements on
application design, and this can have an impact by complicating the
application. My company faced exactly this situation, where one application
group built a product that used the "codes" in xCBL directly, while the
other always did a mapping from a coded value to a presentation form. They
built very different systems based on their assumptions about how codes
could be handled. The team that used the codes directly built a simpler
application, because it didn't need to have a lot of code in it for
presentation: it left this up to the customers as a customization using
XSLT.

I am certainly not disagreeing with your basic point, but I am trying to say
that if we already have a system for deriving language-versions of our
library, and we can provide a human-readable form of our "codes", to buy
some degree of simplicity in application design, then that is the right way
to go.

I vote for "USA", and let non-English speakers use tools to get the
presentation they need. UBL has chosen Oxford English for a reason, and I
believe we should follow that thinking in anticipating our audience.

Cheers,

arofan



-----Original Message-----
From: Phil Griffin [mailto:phil.griffin@ASN-1.com]
Sent: Monday, February 11, 2002 11:39 AM
To: Gregory, Arofan
Cc: 'Burcham, Bill'; 'Eduardo Gutentag'; NDR SC
Subject: Re: [ubl-ndrsc] [Fwd: Fwd: ISO 3166-1 -- Change of Alpha-3
CodeElement for Romania]




"Gregory, Arofan" wrote:
> 
> Folks:
> 
> I must point out that one of the things that has always struck me about EDI
> was that, because non-intuitive alphanumeric "codes" were used to such a
> great extent, the language became arcane and subject to abuse. This leads to
> expensive implementations.

But I am not calling for alphanumeric codes for
human consumption. What I am calling for is use
of unambiguous, national language independent
numeric codes for use by validation engines.

For human consumption, I would promote the display
and use of meaningful, readable text. But that said,
how can I claim that "USA" is the most meaningful
representation for a given reader? Isn't it true
that a French reader might prefer "Etats-Unis" over
"United States"?

Numeric characters can be placed in a code list 
just as easily as the alphabetic characters being
proposed now. That is, I can use the characters 
"840" in an enumerated type just as easily as I
can use "USA". 
 
> This is *not* a point about the separation of presentation and content,
> which can be achieved in any number of ways (at least two of which underlie
> the current discussion, and I'm not sure they're compatible).
> 
> I think we want to do a few simple things that EDI has failed to do with
> codes:
> 
> (1) Present our semantics clearly: intuitive tag names with absolutely
> clear definitions

Absolutely. Three character code list values
are not suggested for use as tag names (I hope.)

> (2) Make sure that our "presentation" of code values has a simple default
> that doesn't rely on the implementer's interpretation of it

Then "ROM" or "ROU"? Which is more precise and clear?
Neither. It is the numeric identifier that is stable
and best used for validation. But neither "642", "ROM"
nor "ROU" is very good for presentation.

> (3) Make things human-readable where there is no good reason not to

Sure. But the case we discuss of "ROM" or "ROU" provides
a good reason, instability, not to use them. And I can 
hardly believe I hear about the "readability" of "ROU". 
Give me a break. Outside of the present context of this
discussion, which of us, presented with these three 
characters would grok "Romania"? "USA", well sure, we all
know what that is. But "ROU"? Isn't that Kanga's kid?
 
> If we return to using alphanumeric codes to (supposedly) convey complex
> semantics, we have merely added a layer of obscurity that doesn't really
> buy us very much.

I don't think that we ARE conveying anything complex here.
These are not country names, they are country codes. Country
code "B" has no significant semantic value over country code
"2". And a decent user interface would let a user type in
"ROU" and map this value to "642" before actual validation.
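
As a minimal sketch of that mapping step in Python: the table below holds only the values quoted in this thread, and the function name `to_numeric` and the choice to keep accepting the retired "ROM" on input are assumptions made for illustration.

```python
# Tiny excerpt of ISO 3166-1, limited to the codes mentioned in this thread.
ALPHA3_TO_NUMERIC = {
    "USA": "840",
    "ROU": "642",  # current alpha-3 code element for Romania
    "ROM": "642",  # retired alpha-3; accepting it on input is an assumption
}
NUMERIC_CODES = set(ALPHA3_TO_NUMERIC.values())


def to_numeric(user_input):
    """Normalize alpha-3 or numeric input to the numeric country code."""
    text = user_input.strip().upper()  # accept "usa", "Usa", " USA "
    if text in NUMERIC_CODES:
        return text
    if text in ALPHA3_TO_NUMERIC:
        return ALPHA3_TO_NUMERIC[text]
    raise ValueError(f"unknown country code: {user_input!r}")
```

Only the value returned by `to_numeric` would be handed to the validation engine, so a change such as ROM to ROU touches the input table and nothing downstream.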
 
> The point of separating content and presentation reads, to my way of
> thinking: "don't let presentation concerns cloud your semantic definitions".
> I think that this is the heritage of XML, as I understand it. Having
> non-readable alphanumeric codes is a violation of this rule, to my way of
> thinking, and we should not go there unless it buys us something important.

In my opinion, "ROU" is not any better presentation than
displaying a number. Neither should be displayed. The
display should be "Romania" for English speakers and 
"Roumanie" if you prefer French. 
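
A minimal Python sketch of such a local display lookup, keyed on the numeric code: the names are the ones used in this thread, while `display_name` and the fallback to English are assumptions made for illustration.

```python
# Local, per-reader display text keyed by the stable numeric code.
DISPLAY_NAMES = {
    "642": {"en": "Romania", "fr": "Roumanie"},
    "840": {"en": "United States", "fr": "Etats-Unis"},
}


def display_name(numeric_code, lang, default_lang="en"):
    """Return the country name in `lang`, falling back to `default_lang`."""
    names = DISPLAY_NAMES[numeric_code]
    return names.get(lang, names[default_lang])
```

Neither "ROU" nor "642" reaches the reader in this arrangement; the transferred value is used only as a key into text chosen for the reader's language.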
 
> (All right, Phil: let's talk about compactness of expression now! However, I
> will point you to the design principles around tag naming that Eve had us
> prioritize at the F2F...)

Not an issue. Either way, "840" or "USA" is the same
size. Either could be used by a validation tool, but
the first one is a stable value and the second is subject
to change. Neither should ever be displayed to provide
meaning to a human, but should be used locally to 
identify a piece of text suitable to the language of
the reader. 

For data entry? Who knows which is best. My wife's
an accountant and would probably prefer entering 
840 on a numeric key pad. I'd probably opt for
usa and expect it to work regardless of case.

Phil


> Cheers,
> 
> Arofan
> 
> -----Original Message-----
> From: Burcham, Bill [mailto:Bill_Burcham@stercomm.com]
> Sent: Monday, February 11, 2002 9:58 AM
> To: 'Eduardo Gutentag'; Phil Griffin
> Cc: NDR SC
> Subject: RE: [ubl-ndrsc] [Fwd: Fwd: ISO 3166-1 -- Change of Alpha-3 Code
> Element for Romania]
> 
> Point taken Eduardo: there does seem to be wide consensus that separation of
> model from presentation is a Good Thing.  That's why it's ironic that in the
> XML community we're always talking about how our document instances should
> be "readable".  "Readability" is a goodness measurement of "presentation",
> right?  A model wouldn't have to be readable, right?
> 
> There is a whole range of tradeoffs possible.  A presentation-free model
> exists at one end... pure presentation detail is at the other. Inasmuch as
> we've already chosen a midpoint, I see nothing absurd about discussing
> possible surrounding midpoints.  In order from presentation-free to
> presentation-full here are some points in the continuum (choices we could
> make for our document/message meta-structure):
> 
> * a binary structure -- you've got to read the schema (not included) to
> understand the structure
> * a text-encoded structure that's fairly readable but which carries no
> "markup" save positional and nesting markup (think LISP S-EXPR's) -- you've
> still got to read a schema to know what's going on
> * some sub-XML type thing that's e.g. less verbose -- now you have "tags"
> but perhaps structure end is a generic close delimiter -- not a repeated
> tag name.
> * XML -- now you've got tag names, but in what language -- you've already
> picked a presentation right?
> * XML with "human readable" code list values -- again, in what language?
> 
> If we're going to go for a clean separation of "model" from "presentation"
> then I'd guess that we'd want to e.g. use numbers to identify code list
> values in a language-neutral way.  That being said I find it a little bit of
> a double standard, given that we've chosen XML as a basis (what with all
> its redundancy-for-the-sake-of-readability).
> 
> I just re-read Phil's message:
> 
> > > The numeric representation of the country identifier
> > > is the only part that we rely on in canonical XML
> > > markup based on the ASN.1 schema. Only the numeric
> > > portion of an abstract value such as "ROM(642)"
> > > need ever appear in the transfer syntax where there
> > > is typically no human reader involved.
> 
> I think the key word is "rely".  The way I would interpret this idea for UBL
> would be that we would make the "pure identifier" (the numeric one)
> required, and we'd make the textual one ("ROU") optional.
> 
> Seems like we ought to give some guidance that says: "A UBL-compliant
> processor will operate only on the pure identifiers -- not on the textual
> ones".  In what artifact could we make such a statement?  Is that part of
> our charter?  I think it should be.
> 
> > -----Original Message-----
> > From: Eduardo Gutentag [mailto:eduardo.gutentag@sun.com]
> > Sent: Monday, February 11, 2002 11:05 AM
> > To: Phil Griffin
> > Cc: NDR SC
> > Subject: Re: [ubl-ndrsc] [Fwd: Fwd: ISO 3166-1 -- Change of
> > Alpha-3 Code
> > Element for Romania]
> >
> >
> > The separation of content from presentation is fundamental to
> > how XML is supposed to work, either via stylesheets (preferably)
> > or via the applications themselves.  So I'm not sure what
> > novelty you're proposing. I thought this was a given.
> >
> > Phil Griffin wrote:
> >
> > > FYI. This seems relevant to our work. And I believe
> > > illustrates just how subject to change character
> > > string code lists can be.
> > >
> > > I note that while the numbers are not as expressive
> > > perhaps for human readers, they are stable and work
> > > reliably in software.
> > >
> > > The numeric representation of the country identifier
> > > is the only part that we rely on in canonical XML
> > > markup based on the ASN.1 schema. Only the numeric
> > > portion of an abstract value such as "ROM(642)"
> > > need ever appear in the transfer syntax where there
> > > is typically no human reader involved.
> > >
> > > Perhaps we should consider that the characters to be
> > > displayed or read by an application could be disjoint
> > > from the value actually used for validation purposes.
> > >
> > > That is, I might send you 642 and you might choose to
> > > display "ROMANIA" or "ROM" or "ROU" or nothing based on
> > > your own local copy of the characters that map to these
> > > numeric values.
> > >
> > > Not a perfect solution, just an idea for consideration.
> > >
> > > Phil
> > >
> > >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Subject: Fwd: ISO 3166-1 -- Change of Alpha-3 Code Element for Romania
> > > From: Francois Vuilleumier <fvuille@attglobal.net>
> > > Date: Sun, 10 Feb 2002 18:38:08 +0100
> > > To: moumg@ties.itu.int
> > >
> > >
> > >> From: "Wischhoefer Cord" <wischhoefer@iso.org>
> > >> Subject: ISO 3166-1  --  Change of Alpha-3 Code Element for Romania
> > >> Date: Mon, 4 Feb 2002 10:58:28 +0100
> > >>
> > >> Dear All,
> > >>
> > >> For your information here's the latest on ISO 3166-1:
> > >>
> > >> On request of the Romanian Government the ISO 3166/MA decided to
> > >> change the ISO 3166-1 three-letter (alpha-3) code element for Romania
> > >> from ROM to ROU.
> > >>
> > >> The two-letter code element RO remains unchanged.
> > >>
> > >> The change took effect on 1 February 2002.
> > >>
> > >> Below are the URLs of the two language versions of the ISO 3166-1
> > >> Newsletter V-3 announcing the change.
> > >>
> > >> English version of the Newsletter:
> > >> http://www.din.de/gremien/nas/nabd/iso3166ma/nl_pt1/nlv3e_rou.html
> > >> French version of the Newsletter:
> > >> http://www.din.de/gremien/nas/nabd/iso3166ma/nl_pt1/nlv3f_rou.html
> > >>
> > >> Best regards
> > >>
> > >> Cord Wischhöfer
> > >> ISO 3166/MA Secretary
> > >>
> > >> Tel.: +41 22 749 72 33
> > >> Fax: +41 22 749 73 49
> > >> Email: wischhoefer@iso.org
> > >
> >
> >
> > --
> > Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
> > XML Technology Center          |         Phone:  (510) 986-3651 x73651
> > Sun Microsystems Inc.          |
> >
> >
> > ----------------------------------------------------------------
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.oasis-open.org/ob/adm.pl>
> >
> 