Subject: Re: [ubl-lcsc] Code sets for Document and LineItem Status
I think that Chee Kai is right but that we can't touch this right now.

I think that Chee Kai is right because a single code list cannot be straightforwardly represented in XML in a case-insensitive way. XML is inherently case-sensitive. And the reason for that is that the upper and lower case characters are different Unicode code points. The concept of case does not exist in the majority of the world's languages; only in some alphabetic scripts is a semantic identity imagined to inhere in codes that happen to be 32 positions apart. XML does not natively make this assumption. If we were writing the codes in Chinese we would not be talking about this.

So the argument for normalizing case assumes that case is significant. However... it follows a fortiori that we have to respect the case that these things have been given by their maintainers. An example near to hand is the language code list (ISO 639). As I pointed out to Ken a few days ago, in the actual printed legally-paid-for paper version of ISO 639, the codes are in lower case. So if we say that the codes are now case-sensitive, that is, that a difference in case signifies a semantic distinction, then that list is going to have to stay in lower case to maintain the semantics intended by the standard. And if we're going to take the opposite position and make the codes case-insensitive, then case doesn't matter and there's no reason to change anything; users can make the case anything they like, and implementers will just have to bit-mask the difference.

The suggestion to use numeric codes is interesting but doesn't really solve the problem. The problem is that mnemonic codes are troublesome and represent a significant investment of intellectual effort. Numeric codes eliminate this effort by abandoning the goal of easy recognition by some large body of users. Numeric codes are, in short, an admission of defeat.
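
Returning to the case question above, the code-point argument can be made concrete with a minimal Python sketch. It is only an illustration: the helper names (`ascii_fold`, `same_code_case_sensitive`, `same_code_case_insensitive`) are invented for this example and are not part of any UBL schema or tool, and the folding shown assumes codes drawn from ASCII letters only.

```python
# In ASCII (and hence Unicode), "A" is code point 65 and "a" is 97:
# the two differ only in the 0x20 bit, i.e. they are 32 positions apart.
CASE_BIT = 0x20


def ascii_fold(code: str) -> str:
    """Fold ASCII capitals to lower case by setting the 0x20 bit.

    Characters outside A-Z are returned unchanged (illustrative helper).
    """
    return "".join(
        chr(ord(ch) | CASE_BIT) if "A" <= ch <= "Z" else ch
        for ch in code
    )


def same_code_case_sensitive(a: str, b: str) -> bool:
    # What a plain code-point comparison does: different code points,
    # different values.
    return a == b


def same_code_case_insensitive(a: str, b: str) -> bool:
    # What an implementer has to add if the codes are declared
    # case-insensitive: "bit-mask the difference" before comparing.
    return ascii_fold(a) == ascii_fold(b)


if __name__ == "__main__":
    print(ord("a") - ord("A"))                      # distance between the two code points
    print(same_code_case_sensitive("EN", "en"))     # compared as raw code points
    print(same_code_case_insensitive("EN", "en"))   # compared after folding
    # A script without case has nothing to fold:
    print("中".upper() == "中".lower())
```

Either policy is implementable; the sketch only shows that case-insensitivity is extra work layered on top of the code points, not something the comparison gives you for free.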
Maybe in the end numeric codes turn out to be the best we can do, but I'm not yet ready to throw in the towel without considering a couple of obvious alternatives.

I will note in passing that the numeric version of a list is actually an entirely separate code whose members just happen to map to the same referents. I don't see a technical difference between a numeric code list packaged with the alphabetic version by ISO and a numeric code list developed by some separate agency with a mandate to resolve its list to the same values. (Hmmm.) I think this means that the alpha and numeric versions of a standard that provides both have to be modeled as structurally distinct lists, and users are going to have to explicitly agree on which of these they are using, just as they would if the lists were maintained by separate agencies.

Thus it is demonstrated that in talking about alternatives, we are not in danger of losing the purity of a single code list for each application; the standards bodies themselves have already done that by providing officially sanctioned, logically distinct variants. They already require users to choose among alternatives and convey that decision to their trading partners.

Anne says:

| After having spent some time looking for code list values on the
| web recently I would propose that we want to keep a safe distance
| from creating the appearance that we are maintaining code lists in
| any way. Some of the suggested changes below may take us too much
| in that direction - might make it appear as though we are
| maintaining these codes, since it will be obvious looking at the
| values that we have changed them.

I agree. I think that we should, insofar as possible, be ripping the codes right out of the UN files and pasting those code points verbatim into the code schemas. This should also be the easiest thing to do, which I find a significant point in favor of this strategy.

Anne adds:

| Hopefully the TC that Jon mentioned will come about sooner than
| later.

Yeah, but we don't have the resources to be forming another TC right now. We need a set of consistent, royalty-free code lists for basic trade parameters that we can make universally available to support UBL. I can think of several ways to do this legally and relatively painlessly (meaning that someone like me could crank out semantically identical replacements for the lists we've been trying to work with at a rate of 50-100 elements a day) without abandoning mnemonics for a large subset of users. Perhaps something along the lines suggested by Tim:

> Fortunately, I think we have some flexibility here. With these
> EDIFACT codes, we can be 'based upon'. So a middle ground would
> be to use the 'short description' e.g. the words
> "Accepted", "Conditionally Accepted" and "Rejected". I think this
> is better than inventing our own terms.

Maybe not that algorithm, precisely, but something like that. It should be remembered that no one can copyright an idea, only its form of expression. There's nothing private about the semantics of these code lists; their meanings belong to everyone.

Perhaps this is something that someone could take on over the winter break. How many elements, roughly, would we guess are contained in the standard lists that we've identified as basic to UBL operation (aside from the two that we believe we've been licensed to use by ISO)? What's the size of this task, really?

Jon