OASIS Mailing List Archives

 



ubl-dev message



Subject: Moving to a vocabulary / single-structure approach with COI type registry/ontology?


Are we just all moving forward, back to where we were before?

Flexibility was a key factor promoted originally for the adoption of XML ten years ago. 

Today - driven by the focus on SOA - I'm seeing people rediscovering this in its simplest form - returning to POX - plain old XML.

Paradoxically - the lessons from EDI-past may well be set to drive the next iteration of XML format practices.

Having learned the painful lesson of payload bloat - many people are turning
back again to simple XML - POX transaction structures.

On my current project for DHS - one such approach is UMF - Universal Markup Format - which provides the data warehouse feeds into and out of IBM's EAS (Entity Analytic Solutions) engine.  We also see this approach echoed in Genericode and in the
latest CIQ V3.0, where attributes are used to denote labels for content values.

So from the EDI perspective this is very reminiscent - you have a single
universal structure layout - into which the actual content values are laid.
Of course this makes it easier to write processing engines that
can accept any transactional content.  Compare this with schema, where each
transaction has its own bespoke content model - rather than one schema sufficing for any transaction - and strict content validation relies on import statements that redefine and restrict type values.  The trade-off is that the single structure is more limited in expressing hierarchical relations.

What makes this simple POX work, however, is the availability of CAM template validation - which excels at such cross-field validations and structure rules.  Again - the CAM template needs just one universal structure layout - but can apply rules contextually, based on the actual transaction type, to verify content.

So is it time to marry proven EDI techniques to new XML approaches?

IBM's UMF has several severe issues - which is good news - because lessons
learned from such practical project work are always invaluable.  So OASIS
has the opportunity to define a new UMF that provides quality integration
exchanges with simple adoption and maintenance.  Of course such a UMF syntax
is an 80:20 solution targeting typical supply chain use patterns.  It's
not intended to replace traditional schema techniques everywhere - and
especially not where the rich hierarchical tools in full XML are needed.

Mostly, however, in eBusiness people want simple, reliable and quick.  Not
having to re-invent XML structures and handling is a huge plus.  Instead people can
take a standard UMF and simply apply their information content to that. And know the code they have already written will simply process it - in and out.

Again this is nothing new - I wrote recursive COBOL for EDI for Sealand and they
marvelled at the fact that two control files would format any message
they wanted automatically - no need to change the COBOL code - reducing weeks of software development to hours.  And that was 15 years ago - so doing this with XML and Java is if anything even easier.  Or using JavaScript and HTML in a RIA.

So what does UMF look like?  Here's a strawman to work from.

Conceptual level:

 UMF
  Header
  Package (Entity)*
   Entity ( (Number_Content)* | (Code_Content)* | (Attribute_Content)* | Entity ) *
  Footer (Entity)*

and then 

 Number_Content ( (label, value)* ) // where value is a number e.g. integer, decimal, date, time, credit card, insurance number, etc

 Attribute_Content ( (label, value)* ) // where value is string text, address, name, product

 Code_Content ( (label, value)* ) // where value is code - enumerated lists, tokens, etc. 
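
To make the strawman concrete, here is a minimal Python sketch of that conceptual model and of the "out" direction (building UMF XML from it). The class and function names are illustrative assumptions, and the NUMBER / ATTRIB / CODE tag names are borrowed from the sample instance in this message - nothing here is a defined UMF API.

```python
from dataclasses import dataclass, field
from typing import List, Union
import xml.etree.ElementTree as ET


@dataclass
class Content:
    """One (label, value) pair; kind is "NUMBER", "ATTRIB" or "CODE" (assumed tag names)."""
    kind: str
    label: str
    value: str


@dataclass
class Entity:
    """An entity holds content items and, recursively, other entities."""
    type: str
    items: List[Union[Content, "Entity"]] = field(default_factory=list)


def entity_to_xml(entity):
    """Build an ENTITY element; the same code serves any entity type."""
    node = ET.Element("ENTITY", {"TYPE": entity.type})
    for item in entity.items:
        if isinstance(item, Entity):
            node.append(entity_to_xml(item))
        else:
            ET.SubElement(node, item.kind, {"LABEL": item.label, "VALUE": item.value})
    return node


product = Entity("PRODUCT", [
    Content("NUMBER", "QTY", "1"),
    Content("CODE", "PRDT", "17-A19-04"),
])
xml_text = ET.tostring(entity_to_xml(product), encoding="unicode")
print(xml_text)
```

The point of the sketch is that nothing in the code knows what a PRODUCT is - a new entity type is just new data.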

Of course none of this is anything radically new - just a case of formalizing this with a minimum set of prescribed content - such as what constitutes a valid header -
and then letting people provide their own patterns from there.

This definitely is way more flexible than old style EDI - more akin to what
the original IMPDEF work in EDI was attempting to achieve with self-defining transactions.  And of course the power here is that any standard XML tooling will find this very easy to process automatically.

Here's a sample of a simple UMF XML instance:

<UMF>
 <HEADER>
   <ATTRIB LABEL="MSGTYPE" VALUE="UBL-PO"/>
   <NUMBER LABEL="VERSION" VALUE="3.5"/>
   <NUMBER LABEL="DATE" VALUE="2007/11/27"/>
   <CODE   LABEL="DATAMODEL" VALUE="US"/>
 </HEADER>
 <PACKAGE>
   <ENTITY TYPE="PRODUCT">
    <NUMBER LABEL="QTY"  VALUE="1"/>
    <NUMBER LABEL="WGHT" VALUE="12.75"/>
    <ATTRIB LABEL="DESC" VALUE="Fuel Pump"/>
    <CODE   LABEL="PRDT" VALUE="17-A19-04"/>
   </ENTITY>
   <ENTITY TYPE="DELIVERY ADDR">
    <ATTRIB LABEL="STREET" VALUE="1920 High Street"/>
    <ATTRIB LABEL="CITY"   VALUE="PARIS"/>
    <CODE   LABEL="STATE"  VALUE="OH"/>
    <CODE   LABEL="ZIP"    VALUE="30781"/>
   </ENTITY>
   <ENTITY TYPE="DELIVERY">
    <NUMBER LABEL="SHIP_BY"   VALUE="2007/11/29"/>
    <CODE   LABEL="DELV_MODE" VALUE="OVERNIGHT"/>
   </ENTITY>
 </PACKAGE>
 <FOOTER/>
</UMF>
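
As a rough illustration of the "one processing engine for any transaction" point, the following Python sketch reads an instance like the one above into plain dictionaries using only the standard library. The function names and the dictionary layout are my own assumptions, and the embedded XML is an abbreviated copy of the sample.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample instance.
UMF_SAMPLE = """\
<UMF>
 <HEADER>
   <ATTRIB LABEL="MSGTYPE" VALUE="UBL-PO"/>
   <NUMBER LABEL="VERSION" VALUE="3.5"/>
 </HEADER>
 <PACKAGE>
   <ENTITY TYPE="PRODUCT">
    <NUMBER LABEL="QTY"  VALUE="1"/>
    <ATTRIB LABEL="DESC" VALUE="Fuel Pump"/>
    <CODE   LABEL="PRDT" VALUE="17-A19-04"/>
   </ENTITY>
   <ENTITY TYPE="DELIVERY">
    <NUMBER LABEL="SHIP_BY"   VALUE="2007/11/29"/>
    <CODE   LABEL="DELV_MODE" VALUE="OVERNIGHT"/>
   </ENTITY>
 </PACKAGE>
 <FOOTER/>
</UMF>
"""

CONTENT_TAGS = ("NUMBER", "ATTRIB", "CODE")


def read_fields(element):
    """Collect LABEL -> (kind, value) for content items directly under element."""
    return {
        child.get("LABEL"): (child.tag, child.get("VALUE"))
        for child in element
        if child.tag in CONTENT_TAGS
    }


def read_entity(entity):
    """Read one ENTITY, recursing into any nested ENTITY children."""
    return {
        "type": entity.get("TYPE"),
        "fields": read_fields(entity),
        "children": [read_entity(e) for e in entity.findall("ENTITY")],
    }


def read_umf(xml_text):
    """Parse any UMF instance; no per-transaction code is needed."""
    root = ET.fromstring(xml_text)
    header = root.find("HEADER")
    package = root.find("PACKAGE")
    return {
        "header": read_fields(header) if header is not None else {},
        "entities": (
            [read_entity(e) for e in package.findall("ENTITY")]
            if package is not None else []
        ),
    }


message = read_umf(UMF_SAMPLE)
print(message["header"]["MSGTYPE"])            # ('ATTRIB', 'UBL-PO')
print([e["type"] for e in message["entities"]])  # ['PRODUCT', 'DELIVERY']
```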

Of course the other thing about this is flexibility because it is
self-describing and predictable.

Apply an xslt to this - and you can morph it to/from existing UBL with
ease... or a form layout, or HTML and so on.
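
The same kind of morphing can be sketched in Python rather than XSLT. The target shape below (element names taken from TYPE and LABEL) is purely an assumption for illustration - it is not UBL - and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_named_elements(umf_element):
    """Turn <ENTITY TYPE="X"><CODE LABEL="Y" VALUE="v"/></ENTITY> into <X><Y>v</Y></X>."""
    # Spaces are not legal in XML element names, so "DELIVERY ADDR" becomes DELIVERY_ADDR.
    name = umf_element.get("TYPE", umf_element.tag).replace(" ", "_")
    out = ET.Element(name)
    for child in umf_element:
        if child.tag == "ENTITY":
            out.append(to_named_elements(child))
        else:
            item = ET.SubElement(out, child.get("LABEL"))
            item.text = child.get("VALUE")
    return out


entity = ET.fromstring(
    '<ENTITY TYPE="DELIVERY ADDR">'
    '<ATTRIB LABEL="CITY" VALUE="PARIS"/>'
    '<CODE LABEL="STATE" VALUE="OH"/>'
    '</ENTITY>'
)
morphed = ET.tostring(to_named_elements(entity), encoding="unicode")
print(morphed)
# <DELIVERY_ADDR><CITY>PARIS</CITY><STATE>OH</STATE></DELIVERY_ADDR>
```

Because the input layout is always the same, one small generic transform like this covers every entity type; a real mapping to UBL would add a lookup from labels to UBL element names.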

Also - notice - no need for typing information - that is handled by
your vocabulary definitions - and your external templates - such 
as CAM template and COI Registry definitions of your DRM.  This 
minimizes the "throw-weight" of the markup in the transactional XML itself.

Then also the CAM template contains the XPath rules that ensure the
required tags that each ENTITY type needs are there correctly.  Those CAM rules in XPath can be applied by any implementation - such as JavaScript or Ruby or xslt, et al.
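
To show the idea (not actual CAM template syntax), here is a minimal Python sketch that applies per-ENTITY-type XPath checks using the XPath subset in the standard library. The rule table and function name are hypothetical; a real CAM template would carry these rules externally.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: ENTITY type -> XPath expressions that must each match.
REQUIRED = {
    "PRODUCT": ["NUMBER[@LABEL='QTY']", "CODE[@LABEL='PRDT']"],
    "DELIVERY": ["NUMBER[@LABEL='SHIP_BY']"],
}


def check_entities(xml_text, rules=REQUIRED):
    """Return one message per required item that is missing from an entity."""
    root = ET.fromstring(xml_text)
    problems = []
    for entity in root.iter("ENTITY"):
        etype = entity.get("TYPE")
        # Entity types with no rules are accepted as-is.
        for path in rules.get(etype, []):
            if entity.find(path) is None:
                problems.append(f"{etype}: missing {path}")
    return problems


GOOD = """<UMF><HEADER/><PACKAGE>
 <ENTITY TYPE="PRODUCT">
  <NUMBER LABEL="QTY" VALUE="1"/>
  <CODE LABEL="PRDT" VALUE="17-A19-04"/>
 </ENTITY>
</PACKAGE><FOOTER/></UMF>"""

BAD = """<UMF><HEADER/><PACKAGE>
 <ENTITY TYPE="PRODUCT">
  <NUMBER LABEL="QTY" VALUE="1"/>
 </ENTITY>
 <ENTITY TYPE="DELIVERY"/>
</PACKAGE><FOOTER/></UMF>"""

print(check_entities(GOOD))  # []
print(check_entities(BAD))
```

The rules are selected by the TYPE attribute, so the structure stays universal while the validation is contextual.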

As a social comment on where we've got to today - too often we see XML transactions of 3500 bytes that contain only 10 or 20 bytes of actual data!!! It's hard to promote efficient exchange paradigms when your throw-weight to content ratio is as high as 350 to 1!

For example you've probably seen the sort of XML with 1,000 bytes of namespaces and URLs in the XML prolog and root tag, then 2,000 bytes of structural markup of tags within tags within tags, with empty tags (but gotta be there to pass schema validation), then another 500 bytes of hdr/body tags for the transport level packaging for good measure to keep the WSDL happy...
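
The arithmetic behind those figures, plus a way to measure the same ratio on a UMF-style instance, can be sketched as follows. The function names are hypothetical, and counting only VALUE attributes as "actual data" is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def throw_weight_ratio(total_bytes, data_bytes):
    """Bytes on the wire per byte of actual data."""
    return total_bytes / data_bytes


def umf_ratio(xml_text):
    """Ratio for a UMF-style instance, treating only VALUE attributes as data.

    Assumes at least one non-empty VALUE is present.
    """
    root = ET.fromstring(xml_text)
    data = sum(len(e.get("VALUE", "").encode("utf-8")) for e in root.iter())
    return throw_weight_ratio(len(xml_text.encode("utf-8")), data)


# namespaces/prolog + structural markup + transport wrapper
bloated_total = 1000 + 2000 + 500
print(throw_weight_ratio(bloated_total, 10))  # 350.0
print(throw_weight_ratio(bloated_total, 20))  # 175.0
```

So 350 to 1 is the worst case of the example (10 bytes of data); with 20 bytes of data the same message is still 175 to 1.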

Is there a simpler direct way here for everyone to embrace?

DW

"The way to be is to do" - Confucius (551-479 B.C.)


