Subject: Re: Code List Value Validation
Hey Fraser, thanks for picking up on this thread.

At 2006-04-07 13:09 +0100, Fraser Goffin wrote:
>its been a while since I have had a chance to catch up, so hi.

And last week I was on the road consulting, so only now am I getting around to responding. "Hi" back at you! Thanks for your patience.

>I have been re-reading your UBL Code List Value Validation Methodology (v0.4)

I hope to have a slightly modified v0.5 out when I get some new work from Tony, which he said in a UBL teleconference last month he hoped to get to this month.

>again while I've had a few days off (I know I need to get out more
>:-), a few questions if I may be so bold :-

Oh, please do, Fraser. We need feedback from actual users, as I've been addressing the problems from a geek's perspective.

>1. How has this work been received in UBL. Is it proceeding as
>planned. Do you think there will be any statement on adoption of
>this methodology any time soon ?

Thanks to Jon Bosak for addressing this in another post:

http://lists.oasis-open.org/archives/ubl-dev/200604/msg00008.html

>2. A genericode file contains ONE code list at ONE version right ??

Indeed it does. Well, actually, perhaps I see it slightly differently. I see a genericode file as containing a versioned set of codes. The UBL code list context association file will associate a *combination* of a number of sets of codes to make what I see traditionally as a "code list": the list of codes available for an information item. Since the code list association file aggregates sets of codes into a single code list for a given context (which may have many or only a single location in a document), this introduces a distinction between a code list and a set of codes. But that is based on my interpretation of a "code list", which from my outsider (of business) geek role may be incorrect. I had understood a "code list" to be the set of codes applicable for a given information item.
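As a concrete illustration, a minimal genericode file for one such versioned set of codes might look like the following. This is only a sketch: the list name, version, and values are hypothetical, and the namespace shown is from the later genericode 1.0 vocabulary, so adjust it to whichever genericode draft you are actually using.

```xml
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>DocumentStatusCode</ShortName>
    <Version>1.0</Version>
    <CanonicalUri>urn:example:codelist:documentstatuscode</CanonicalUri>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required">
      <ShortName>Code</ShortName>
      <Data Type="normalizedString"/>
    </Column>
  </ColumnSet>
  <SimpleCodeList>
    <Row><Value ColumnRef="code"><SimpleValue>Draft</SimpleValue></Value></Row>
    <Row><Value ColumnRef="code"><SimpleValue>Final</SimpleValue></Value></Row>
  </SimpleCodeList>
</gc:CodeList>
```

Note the version lives in the file's identification metadata, not in the values, which is what lets the context association file aggregate several such versioned sets into one logical code list.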
For users of genericode files who use a single genericode file for all of the coded values for an information item (quite typical, I should think), then yes, a genericode file contains one code list at one version, as you ask. But given that I've proposed an information item's code list is the aggregate of a collection of sets of coded values, each set expressed in a separate genericode file, then a genericode file contains only a portion of a code list (possibly all of it), at a given version for that portion.

>3. In the doc you state that when an enumeration appears WITHIN a
>schema, TPs may legitimately operate a subset since all values are
>valid in the full set, but they may not add new values.

Indeed I am, because of the absolute necessity that in the UBL code list value validation methodology a "first pass" schema validation of the instance is successful before a "second pass" value validation is even attempted. This is based on the nature of XPath-based context testing: an XPath address knows nothing of a schema and works solely on the actual presence of information items in a well-formed instance of a structured document, not on the possible information items available in the creation of a structured document. Thus, the information items absolutely must be ensured to be properly located within a given instance to have confidence that the XPath addresses being used against the instance will not inadvertently pass assertions based on faulty placement. Without a successful first pass, there is no integrity to the second pass. So the first pass must have no errors, and there is no distinction between a schema enumeration error and a schema structural error (nor do I think validators should introduce such distinctions ... an instance is either schema valid or it isn't). Should trading partners attempt to agree upon and express tailored values beyond the schema's embedded enumeration for use in second-pass value validation, those values will prevent first-pass schema validation from being successful.
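The two-pass discipline described above can be sketched in a few lines. This is not the methodology's published implementation, just an illustration of the control flow: the schema validator is passed in as a callable (in practice something like an XSD validator), and the element names, values, and paths are hypothetical.

```python
# Sketch of the two-pass discipline: a structural first pass must
# succeed before the XPath-addressed second pass is attempted.
# The schema validator is any callable here, because the point is the
# control flow, not a particular validation library.
import xml.etree.ElementTree as ET

def two_pass_validate(xml_text, schema_valid, allowed, context_path):
    doc = ET.fromstring(xml_text)

    # First pass: without structural validity there is no integrity to
    # the second pass, so stop on any schema error at all.
    if not schema_valid(xml_text):
        return False, ["first-pass schema validation failed"]

    # Second pass: value validation at the addressed context. The path
    # test is only meaningful once placement is known to be correct.
    errors = [f"disallowed code: {e.text}"
              for e in doc.findall(context_path)
              if e.text not in allowed]
    return (not errors), errors
```

An instance with a tailored value outside the schema's enumeration never reaches the second pass, which is exactly why embedded enumerations cannot be extended by trading partners.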
>Am I correct in assuming that the same is true for a code list which
>is defined by a standards organisation (not UBL) but which is NOT
>embedded within schema ?

That is not something I would assume. That would make UBL too constraining for the real business world should trading partners need to go beyond the "initial set" of values supplied by the committee, with the realization that while things may work fine between them, either partner would have to agree on those same values with any third party wishing to work with their UBL documents. That UBL includes a limited number of enumerated values is a byproduct of the committee's agreed decision to incorporate UN/CEFACT's expression of these values, which happens to have been done through schema validation mechanisms.

>As you know in my situation we operate code-lists that are defined
>by a standards body and in most cases want to use them in all
>situations where there are equivalent semantics (including internal
>app integration) rather than create alternative bespoke lists.

Indeed ... and trading partners who wish to conform with the standards body, perhaps to guarantee blind interchange with another member of the standards body, will therefore be required to limit themselves to the published sets of standardized values. But by using the UBL methodology, that is a business decision and a clearly-documented technical practice, and the standards body can enforce validation with their standardized sets of coded values in the standardized sets of information item contexts without sacrificing flexibility when needed in exceptions. The methodology will not constrain two consensual trading partners from engaging in exceptions while still using other read-only artefacts considered sacrosanct. I see document information item values as being interpretive, while document information item structures are rigid.
Moreover, the methodology will also allow trading partners to subset sets of coded values, or to use different sets of coded values in different information item contexts, without violating schema validation and in ways not offered by schema expressions. Note that I am not advocating that a maverick user attempt to engage in blind interchange with a suite of values beyond the standardized set. A community of users has standardized a set of values because of a community-wide agreement upon the semantics represented by those values. Trading partners can agree upon the semantics represented by extended values. Maverick users cannot impose unknown or unaccepted values upon unsuspecting recipients. Thankfully, recipients who publish their acceptance of standardized values can use the artefacts published by the standards committee as the basis on which their systems validating acceptable input are built.

>Problem is, we do sometimes want/need to extend these lists often to
>provide higher fidelity mapping to our operational systems.

Indeed.

>Another example might be where a code identifies some high level
>semantic, but we want to be able to create a bunch of 'sub' codes to
>provide a more granular view - accepting that in 2-way translation
>there will be data loss.

Absolutely.

>Lobbying the standards body and getting a timely change/addition can
>be problematic ? - anyway - I digress :{)}

Indeed ... but any tardiness on their part will not prevent consensual trading partners from engaging how they wish.

>4a. For code lists where there is no established [complete and/or
>definitive] standard or where the semantics and values are TP
>relationship specific, the set of permissible values can be extended
>and/or restricted from an offered base set (if available - using
>your DocumentStatusCodeCodeType example) or the participating
>organisations can agree the set of values (and presumably the list
>ID to be used in XML instances). Is this correct ?

Absolutely.
If UBL left each and every coded information item totally empty, then there would be no out-of-the-box experience for inexperienced users looking to UBL for a starting point.

>4b. If I have many TPs and each has a slightly different
>relationship, might this cause me to need a separate genericode file
>for EACH code list that differs, however slightly, from another ?
>Is there a suggested low maintenance approach to this problem ?

How about a core genericode file for the common bits, a differential genericode file for the deltas, and multiple IDREF references in the code list context association to pull in the aggregates? This would allow you to version the common core separately from the differential bits.

>4c. Similarly to (4b.), if a custom code list is shared across
>service contracts for multiple TP relationships, but a need arises
>to create a new version with [say] some values added or removed, and
>we need to be able to operate both versions concurrently for some
>period of time, does this require a complete re-statement of the new
>code list (in a new .gc file) with a new version number even if the
>difference is ONE codified value (added/deleted/changed) amongst a
>set of 10,000 values ?

Hmmmmmmmmm ... probably ... I don't believe there are any delta operators expressed in genericode. But what if you were synthesizing your genericode as an XQuery result? You could manage your many values in database tables and have the query pull out what you needed based on your criteria, and the query result would be the XML instance suitable for use in the methodology.

>This also means that the implementation will repeat a lot of code. I
>guess I am wondering whether there is/should be a way of expressing
>a 'delta' of values ?

An interesting issue ... there are XML delta expressions out there ... perhaps one could express the differences as an operation against the XML syntax.
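The core-plus-delta aggregation just described can be sketched as follows. This is only an illustration under assumptions: it presumes simple genericode files whose codes appear as SimpleValue elements, and since genericode itself has no delta operators, the union is computed outside the files by whatever tooling consumes them; the file names are hypothetical.

```python
# Sketch: aggregate a common "core" genericode file with one or more
# per-trading-partner "delta" files into a single set of allowed values,
# so only the small delta file suffers version churn.
import xml.etree.ElementTree as ET

def codes_in(gc_path):
    """Collect every SimpleValue code in a (simple) genericode file."""
    return {sv.text for sv in ET.parse(gc_path).iter("SimpleValue")}

def allowed_values(core_path, *delta_paths):
    """Union the core set of codes with any number of delta sets."""
    allowed = codes_in(core_path)
    for path in delta_paths:
        allowed |= codes_in(path)
    return allowed
```

The same shape works for 4c: keep the 10,000-value list as the core and publish a one-value delta per revision, versioning each file independently.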
But my gut feel is that it would have to be managed somehow at source and emitted as an XML instance. I wonder if Tony can comment on his experiences with user requirements in this area.

>5. Continuing the theme of (4), we have some code lists which are
>both highly volatile (values added and deleted (rarely changed)
>every month) and are very large (e.g. > 50000 entries). An example
>is Vehicle Make/Model. Do you think this approach is suitable for
>this type of reference data (multiple 'active' versions, large
>number of values) ?

Indeed ... such values are now most likely somewhere in a database. Rather than implementing a methodology that incorporates direct database access to get at those 50,000 values, XML makes an ideal bridge as a concrete expression of the values *at a point in time*, and the metadata for that export would express some identifying information for audit and tracking purposes (perhaps a "version" number of the data set?). Then the XML instance evaporates at the end of the process, and the next time you want the values you do your XQuery to get your XML expression of the values, using genericode as the vocabulary.

>6. Can you explain the difference between the UBL 'CodeType' and
>'IdentifierType' in terms of what circumstances you would use either ?

That is an NDR issue and I'll confess that I do not know the nuance off the top of my head.

>If a schema identifies the ListID,

"schema" or "instance"?

>but we want to use a different one (to employ a 'richer set' of
>values), how would an industry standard schema accommodate this
>possibility such that the schema remains a standard and unchanged
>definition of the structural constraints (is this the
>CodeType/IdentifierType approach) ?

Sorry, not sure. Could you elaborate on what you are trying to ask here?

>7. Do you think that a skeleton context association file could be
>auto-generated ?

Absolutely ...
I did (though straightforward, it was a bit of a challenge):

http://www.oasis-open.org/archives/ubl/200602/msg00069.html

The "Garden of Eden" approach to the UBL NDR requires every element and data type to be global ... because of that I was able to use XSLT to process the schema expressions. I've successfully generated the XPath files for all of the document types, and I used the XPath file results to synthesize the above single "default" context association file for UBL 2. BTW, XSLT was so slow at some of the processes that I ended up rewriting some of the steps in Python/SAX.

>8. Changing tack slightly. I am interested in using genericode files
>for a number of purposes including value-based validation, UI
>generation (e.g. to populate UI controls such as list boxes), and
>transcoding between application specific codes. It would appear from
>the genericode materials that this would be feasible, do you agree ? :-

Absolutely! I chair the UBL HISC (Human Interface Subcommittee) and genericode files are absolutely appropriate for defining drop-down lists, etc. But not just in isolation ... context is also important when drop-down lists need to differ for different document contexts. We'll probably still be using the code list context association file and not just raw genericode files.

Sorry, Fraser, I'm not sure what original text you had below before it was mangled by a mail system somewhere:

>Std Code Std Desc Appl'n A Equiv Appl'n B Equiv UI Text
>(key) (key) (key)
>
>abc Std
>Widget def ghi Part No
>3321-7 (small widget)

>9. What is the suggested approach to deal with deprecated code
>values. Is this considered as a versioning issue both for standards
>based code-lists (embedded in schema or not) and custom code lists ?

My gut feel is yes.

>Should code lists include validity date/time values or other
>'active/deprecated' indicators ?

Tony, can you comment on which semantics in genericode might satisfy this requirement?

>10.
>Caller assertion of list version. If there is no matching
>version is it best to flag the validation failure (and possibly
>reject the message) - that is, 'trust' the caller assertion, or
>validate against the un-versioned complete list (similar point to the
>one we discussed earlier about whether to trust an xsi:schemaLocation
>attribute value) ?

I decided not. The way I implemented this is that if the instance doesn't state a version number for the coded value, the version number isn't important to the author of the instance and is therefore not validated. If, however, the instance does state a version number for the coded value, the implementation requires the version number to match. I think this is an acceptable conclusion: if I use a value and I don't care in the instance about which version of the list the value is from, then the version of the list being compared against is ignored. But if I use a value and I declare the value is from a particular version, it might be because the semantics behind that particular value from that particular version are important to me. I'm not sure ... have I answered your question?

>11. Devil's advocate: What's the difference in having to distribute
>the latest .gc file versus having to use the latest XSD with updated
>embedded enums ? (Ok, I think I know the answer to this one, but it
>would be good to have a quote from the 'championing' designer, for
>the benefit of my peer group and sceptical and untrusting bosses :-)

That the structural integrity of UBL isn't being changed by changing a bunch of allowable values, and therefore the redistribution of schemas that dictate the allowed structures shouldn't be required. Programs are built assuming both structural expectations on information and value expectations on content. Making a change to recognize new structures is more difficult, time consuming and error prone than making a change to recognize a new value in a given structure.
While you implied there wouldn't be structural changes in a new schema with updated embedded enumerations, the version of the schema would be new. If I claim my software supports a given version of the schema, I would probably have testing and other issues for new schemas to be installed in my system. My gut feel is that I could more easily mitigate the impact on my system by only needing to accommodate a new version of a set of values than by having to prove my system can handle a new version of a schema.

>Anthony: So that you are aware, I am attempting to stimulate
>interest in the use of genericode within the organisation that I
>work with (a large UK financial services company) from a number of
>potential perspectives. One of these is value-based validation,
>hence discussions with Ken i.r.o his work with UBL, but also for a
>more broadly accessible resource for reference data used for a
>variety of purposes such as UI generation, transcoding, etc..

Kewl!

I hope this has helped, Fraser. Thanks for all the feedback ... keep it coming! We need it! If you need anything I've said above explained, please ask ... it is late and I may not have caught everything.

. . . . . . . . . . . Ken

--
Registration open for XSLT/XSL-FO training: Wash.,DC 2006-06-12/16
Also for XML/XSLT/XSL-FO training: Birmingham,England 2006-05-22/25
Also for XSLT/XSL-FO training: Copenhagen,Denmark 2006-05-08/11
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.        http://www.CraneSoftwrights.com/u/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/u/bc
Legal business disclaimers:   http://www.CraneSoftwrights.com/legal