Subject: Re: [ubl-lcsc] [QA Team] Feedback on Methodology Paper
Please find attached the proposed draft for our imminent release. Can I get a quick QA on it so far? Specifically the methodology section. NB: the links won't work yet!

Lisa-Aeon wrote:
> Marion,
> Here is the feedback from Matt on the Methodology paper. Matt brings up some interesting thoughts.
>
> I would like to put this on the agenda for the next QA Team Meeting.
>
> Lisa
>
> ----- Original Message -----
> From: "Matthew Gertner" <matthew.gertner@acepoint.cz>
> To: "Lisa-Aeon" <lseaburg@aeon-llc.com>
> Sent: Monday, January 06, 2003 8:54 AM
> Subject: QA Feedback
>
> Lisa,
>
> I am attaching a slightly marked-up version of Tim's methodology paper. I made a few editorial changes. In general it was unclear to me who the intended audience of this document is and what its current stage of development is. It appeared to me that the entire text is right now only an introduction to some exposition to come of the actual UBL methodology. Is this correct?
>
> I would almost tend to say that the text is too long and not entirely focused on the problem at hand. It is an interesting and enlightened discussion of what is meant by document engineering, but do we need this as a UBL deliverable? Certainly I would personally be more interested in understanding the exact structure of the LC SC spreadsheet, what fields mean what, examples of particularly tricky cases and how they are solved, reasons why a move to a more database-oriented format is felt necessary, etc.
>
> Cheers,
> Matt

--
regards
tim mcgrath
fremantle western australia 6160
phone: +618 93352228 fax: +618 93352142

Title: Universal Business Language Part 2: Library Content
This document has been prepared to assist parties wishing to comment on the UBL Library. It attempts to explain the various components of the UBL Library Content release and how they fit together to form part of the overall architecture for UBL.
The UBL Library is…
The Library has been designed as a collection of object classes and associations expressed as a conceptual model. Each document type is then assembled from this common model as a collection of business information entities. These hierarchical models are then transformed using the UBL Naming and Design Rules (Ref:) into XML Schema syntax. The analysis and design processes developed by the UBL Library Content team are described in Appendix A.
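As a rough illustration of the transformation step, the sketch below derives an XML element name from a CCTS-style dictionary entry name ("Object Class. Property Term. Representation Term"). The truncation rule shown (dropping a representation term that repeats the property term) is a simplified stand-in for the actual UBL Naming and Design Rules, not a faithful implementation of them.

```python
# Hypothetical sketch: mapping a dictionary entry name to an XML element
# name. The truncation logic is simplified for illustration and does not
# reproduce the full UBL Naming and Design Rules.
def xml_element_name(dictionary_entry_name: str) -> str:
    object_class, property_term, representation_term = [
        part.strip() for part in dictionary_entry_name.split(".")
    ]
    terms = [object_class, property_term]
    # Drop the representation term when it merely repeats the property term.
    if representation_term.lower() not in property_term.lower():
        terms.append(representation_term)
    # Concatenate in UpperCamelCase with internal spaces removed.
    return "".join(t.replace(" ", "") for t in terms)

print(xml_element_name("Delivery. Date. Date"))  # DeliveryDate
print(xml_element_name("Party. Name. Text"))     # PartyNameText
```

A generator script of this kind is what allows the spreadsheet model to be turned into schemas mechanically rather than by hand.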
The UBL Library and its design approach have value to both UBL implementers and the broader community. Adopting a formal approach will enable a broader range of interested parties to understand, refine and extend the UBL Library and to develop models for contextualized situations. For example, the document Order Response may have a limited audience, but the re-usable components Party or Item will have relevance to many applications. We are keen to hear from experts who can suggest supplementary components used in the context of their industry or geopolitical environment.
UBL establishes a system for the concrete representation of documents to be used in electronic commerce.
The Library Content part of UBL specifies a library of business information entities to be used in the construction of business documents together with a set of common XML business documents assembled from entities in the library.
XML
XML Schema
CCTS
Normalization: a formal technique for identifying and defining functional dependencies.
Containership: aggregating components (e.g. nested elements in an XML schema).
Normalized Model: a representation of normalized data components.
Document Assembly: a description of a hierarchical pathway through a normalized model.
Hierarchical Model: a 'tree' structured model that can be implemented as a document schema.
Context: the circumstance or events that form the environment within which something exists or takes place.
Class Diagram
Business Information Entity
Core Component
Object Class
Property
Representation Term
Type
BIE,BBIE, ABIE, ASBIE
CC, BCC, ACC, ASCC
XSD
UML
[The former “Scope Document” and the 'implementation guidelines' go here. Note that we cannot use “Scope” for the name of this section; that's reserved for the scope statement at number 1 above.]
The current spreadsheet matrix used by UBL has proven the most versatile and manageable in developing a logical model of the UBL Library. However, we have also found it useful to have a view that encapsulates the big picture of the structure of UBL. Therefore, we have included a graphical notation in the form of UML Class Diagrams. Such a notation provides a top-level, exploded view.
The UBL logical model contains enough meta-data to allow the automatic generation of XML Schemas based on the rules of the UBL Naming and Design Rules sub-committee. This process is managed by an automated Perl script also included in this release (Ref: ).
The final artifacts for the UBL Library are the XML Schemas themselves. These represent the physical implementation of the logical UBL models.
Business Information Entities are Core Components of information used in a specific context. UBL assumes that a core component is simply a BIE in a neutral context; put the other way around, a core component is a BIE without any context. For example, when we identified the BIEs ShippingContact and BillingContact, we also identified that these were two different contexts for a Contact. This meant that we had also identified a de-contextualized BIE called Contact. By doing this we avoid the need to define the 'core' components separately; they are just BIEs that can be used without any context. In this way, we can still claim that the UBL Library is a set of ebXML-compliant Business Information Entities expressed in XML syntax. It is our intention to submit all de-contextualized BIEs as candidate Core Components to the relevant UN/CEFACT group as soon as possible.
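The Contact example can be sketched in code. In this illustrative (non-normative) model, a contextualized BIE is just the de-contextualized component paired with a context label; the field names and the `ContextualizedBIE` wrapper are invented for the sketch, not taken from the UBL library.

```python
# Illustrative sketch: ShippingContact and BillingContact are two contexts
# for the same underlying, de-contextualized Contact component.
from dataclasses import dataclass

@dataclass
class Contact:               # de-contextualized BIE (candidate core component)
    name: str
    phone: str

@dataclass
class ContextualizedBIE:     # a BIE = core component + a context
    context: str             # e.g. the role in the business process
    component: Contact

shipping = ContextualizedBIE("Shipping", Contact("F. Smith", "+61 8 9335 2228"))
billing = ContextualizedBIE("Billing", Contact("A. Jones", "+61 8 9335 2142"))

# Both contextualized entities share a single underlying definition,
# so Contact never needs to be defined twice.
assert type(shipping.component) is type(billing.component)
```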
[Explanatory text goes here.]
Normative XSD schemas for the UBL documents and core component types are referenced through the identifiers below.
This document describes a methodology for identifying and defining business document library content.
The UBL Library is a collection of business information entities expressed in a conceptual model. These are then transformed using the UBL Naming and Design rules into XML Schema syntax.
The UBL model helps analysts, modelers, domain experts, and others better understand the Library. Any such business data model is developed using some form of methodology. A methodology defines the processes, notations and possible software tools used to populate a meta-model which in turn defines our data model.
This document describes a formal and pragmatic approach to library development based on analysis and design techniques we call “Document Engineering".
If it is to become an international standard for electronic commerce, UBL needs to achieve a critical mass of adoption. To promote rapid and widespread adoption, UBL must accelerate its own Library development work and allow its workload to be distributed to sub-groups and industry verticals. This requires a formalization of the approach UBL takes to identifying and describing the content of its library. We have defined a set of processes, notations and a meta-model document in such a way that they can be used by both UBL members and the broader community. This will enable a broader range of interested parties to understand, refine and extend the UBL Library and to develop models for contextualized situations.
In addition, because of UBL’s objective to synthesize a range of established vocabularies in both the XML and EDI worlds, this approach also includes explicit steps to identify and reuse design patterns and other artifacts of prior modeling efforts.
A methodology defines the processes, notations and possible software tools used to populate a meta-model which in turn defines the artifacts we call our data model. For this reason the Unified Modeling Language (UML) is not a methodology – it defines no processes. However, the UML can be used for its notation and meta-model as part of the methodologies that do define processes, such as the UN/CEFACT Modeling Methodology (UMM).
Methodologies exist whether we define them explicitly or not. An explicit, formal methodology establishes a consistent meta-model – how the information about the model is to be presented. Some formal methodologies may be strict about their process and notation whilst others are best described as providing a do-it-yourself tool-kit approach using optional sets of processes, notations and tools. Each approach has its merits.
The strict-process methodology yields consistent and interchangeable models produced from independent sources. These may be useful in situations where there is a requirement for exchangeable business process models. Such methodologies are often described as having a 'top-down' approach.
The tool-kit approach is more concerned with pragmatic tools for specific uses, for example where a common model is being developed for use by an entire community. These are sometimes referred to as 'bottom-up' or artifact driven methodologies. It is our belief that this approach is more suited to the situation of UBL – rapidly developing a single universal model of a business language.
The essence of Document Engineering is the analysis and design methods that yield formal models to describe the information these processes require. The methods of Document Engineering are practical and effective for both transactional and non-transactional document types. The resulting models must be carefully designed to contain enough structure and semantics to convey meaning while not being so general (as are relational data models) that they allow too many interpretations. These models must also find a balance between the optimal document designs for a business's internal processes and the need for those documents to be understood by other businesses. This tension induces document designers to reuse existing models wherever possible and rewards them for describing any new models they create in ways that encourage reuse by others.
Almost every book about systems design introduces some variation of an Analyze/Design/Refine methodology. According to this classical approach, the artifacts of the "real world" are analyzed and the results of this analysis are represented in a physical model that captures the characteristics of the artifacts as they exist in some context and are expressed in a particular technology. Then the model is progressively refined into a more conceptual, logical model by identifying recurring structures, removing redundancies and technology constraints, and otherwise creating a more abstract, concise, and context-free representation of the essential characteristics. Finally, the refined model is implemented "in the real world" by expressing it in technology appropriate for the contexts in which it will be used.
This classical approach is familiar to data modelers but can seem somewhat alien to document analysts. The traditional subject domain for data modelers consists of large numbers of identical instances, so the analysis activity is less document-driven than it is when there are fewer and more heterogeneous instances to study. It is harder to "let go" of the artifacts and create logical models when the artifacts are more salient. In addition, document analysts, especially those who learned the skill when only DTDs were available to encode models, are more likely to think in terms of the modeling restrictions imposed by this syntax and are thus less inclined to spend significant time refining models at a conceptual level.
What we call Document Engineering is at its core a "document-centric" version of the Analyze/Design/Refine methodology.
Documents contain three kinds of information components: content, structure, and presentation. Content components are always the most fundamental, but it is usually important to analyze the associations of the other types of components as well. Structural components like chapters, sections, tables, headers and summaries, usually have some implicit semantic value because of their conventional use to reinforce a logical content hierarchy. A critical task in document analysis is determining the rules by which presentational information identifies or signals components of the other two types, because the visual design of printed or rendered instances can be highly complex.
Fig. 1 illustrates how analysis is the process of identifying and separating these three types of components. It also suggests how the idiosyncratic characteristics of document instances need to be carefully analyzed in order to identify "good" logical components for potential reuse.
In data modeling, because the structural organization of content is more regular and because the binding of presentational information to structure and content is less intrinsic for data-centric document types, analyzing structure and presentation is often seen as an afterthought. This contrast makes the vocabulary and methods of document analysis and data modeling seem more different than they actually are.
Each of the three component types has a different set of principles for achieving a quality design. We focus here on those for content components, and explain concepts and methods that are applicable to a broad range of document types.
Content components can be identified at three levels:
The hardest level at which to identify good components is at the aggregate level. There is little doubt that we need some grouping of elements at the sub-document level both in our logical models and our schemas, but if we do this on an ad hoc and intuitive basis we might not identify the optimal patterns for re-use. For example, it might "sound right" to group Name, Address and DateOfBirth into an aggregate component of Person. But what is it about the associations among these three components that makes them into a good aggregate?
The answer comes from conventional data modeling practice, which includes formal rules for designing logical structures and establishing what data analysts call functional dependencies in order to create modular and self-contained groups that lend themselves naturally to re-use. Much of what document analysts have done in the past, albeit informally, is applying similar principles to identify reusable components in logical models of documents. In Document Engineering we make this practice explicit so as to apply the same rigor to document schema design that we have customarily applied to data modeling.
Whilst there is little doubt that we need some grouping of elements (i.e. containers) in our schemas, we have attempted to formalize the identification and design of these groups. Formalization is important to allow consistency and replication of the UBL Library development work. This will enable a broader range of interested parties to understand, refine and extend the UBL Library and to develop content for contextualized situations. Most importantly, correctly formed containers add semantic value to our Library and promote re-usable components.
Our initial discussions identified three types of containers that occur in XML schemas:
These containers provide a wrapper around sets of repeated data structures with differing values. That is, "containers of a series of like elements". For example, the Line Items on an Order: each Line Item has the same structure (item number, description, quantity, etc.), and there may be many of them per Order.
The container serves to signal the bounds of the list for processing and display purposes. Whenever a data element is defined as repeatable in the logical model, it is possible to wrap it in such a container. This suggests that they are technical, rather than semantic, considerations.
We refer to these lists of repeated elements as ‘List’ containers.
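A minimal sketch of a 'List' container is shown below, using Python's standard XML library to wrap repeated LineItem structures in an enclosing element that signals the bounds of the list. The element names (`LineItemList`, `ItemNumber`, `Quantity`) are illustrative, not taken from the UBL schemas.

```python
# Sketch: a 'List' container wrapping a series of like elements.
# Element names are invented for illustration.
import xml.etree.ElementTree as ET

order = ET.Element("Order")
line_item_list = ET.SubElement(order, "LineItemList")  # the List container
for item_id, qty in [("A1", 2), ("B7", 5)]:
    line = ET.SubElement(line_item_list, "LineItem")
    ET.SubElement(line, "ItemNumber").text = item_id
    ET.SubElement(line, "Quantity").text = str(qty)

print(ET.tostring(order, encoding="unicode"))
```

Note that the wrapper adds no new semantics; a consuming application could process the repeated LineItem elements equally well without it, which is what marks it as a technical rather than semantic container.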
Other containers preserve the historical formatting of a document. For example, a Header and Summary wrapper may be used to replicate the layout of many common printed Order documents.
We shall refer to these as ‘Presentational’ containers.
Most common in any document are containers that wrap elements having an apparent logical connection to each other. We shall call these ‘Grouped Element’ containers.
Identifying these logical groups allows us to minimize redundancy, localize dependencies and ensure that information can be maintained in logical sets that reflect the constraints of the real world.
Defining the logical grouping of elements in documents is something that can be done intuitively. It might sound right to group Name, Address and DateOfBirth into a Person container. However, if we want to have strongly re-usable structures we need a more formal and consistent approach for grouping elements.
Conventional data modeling practices include formal rules for designing logical structures. In fact, much of what document analysts have done in the past, albeit informally, is establishing what data analysts call functional dependencies – which we will refer to as simply, dependencies. Using these principles we can apply the same rigor to document schema design that we have customarily applied to database design.
Dependency means that if the value of one component changes whenever another component's value changes, then the former is functionally dependent on the latter. For each Person we identify, there is a different Address and DateOfBirth component, because the values of each of these components functionally depend on the identity of the Person in question.
Technically, this can be defined as:
"Given an ABIE, called E (e.g. Person), the BIE called Y (e.g. DateOfBirth) of E is functionally dependent on the BIE called X (e.g. Name) of E if and only if, whenever two instances of E agree on their X-value, they also agree on their Y-value."
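The definition above translates directly into a check over a set of instances. The sketch below (instances modeled simply as dictionaries, names invented for illustration) returns False as soon as two instances agree on their X-value but disagree on their Y-value.

```python
# Sketch: testing the functional dependency X -> Y over instances of an
# ABIE E, per the definition above. Instances are plain dictionaries.
def functionally_dependent(instances, x, y):
    seen = {}
    for inst in instances:
        x_value, y_value = inst[x], inst[y]
        if x_value in seen and seen[x_value] != y_value:
            return False  # two instances agree on X but disagree on Y
        seen[x_value] = y_value
    return True

persons = [
    {"Name": "Ann", "DateOfBirth": "1970-01-01"},
    {"Name": "Bob", "DateOfBirth": "1982-06-15"},
    {"Name": "Ann", "DateOfBirth": "1970-01-01"},
]
assert functionally_dependent(persons, "Name", "DateOfBirth")      # holds

persons.append({"Name": "Ann", "DateOfBirth": "1999-12-31"})
assert not functionally_dependent(persons, "Name", "DateOfBirth")  # violated
```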
Data analysts use a formal technique for identifying and defining these dependencies, known as normalization. Normalization is a series of analytic steps that:
Normalization yields models that describe the network of associations between logical groups of components in optimal ways that minimize redundancy and prevent inadvertent errors or information loss when components are added or deleted. These are sometimes referred to as Entity-Attribute-Relationship (EAR) models.
For example, an Order may contain many Products (such as seen on a PurchaseOrder document) or a Product may be on many Orders (such as seen on a SalesReport). Normalization would introduce a LineItem component to reconcile these two views.
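The LineItem resolution can be sketched with toy data (identifiers invented for illustration): each LineItem pairs exactly one Order with one Product, so both directions of the original many-to-many association can be recovered from it.

```python
# Sketch of the normalized structures: LineItem resolves the many-to-many
# association between Order and Product. All identifiers are illustrative.
orders = {"PO-1": {"buyer": "Acme"}, "PO-2": {"buyer": "Globex"}}
products = {"P-9": {"description": "Widget"}, "P-3": {"description": "Bolt"}}

# Each LineItem pairs exactly one Order with one Product.
line_items = [
    {"order": "PO-1", "product": "P-9", "quantity": 10},
    {"order": "PO-1", "product": "P-3", "quantity": 4},
    {"order": "PO-2", "product": "P-9", "quantity": 7},
]

# An Order contains many Products ...
po1_products = [li["product"] for li in line_items if li["order"] == "PO-1"]
# ... and a Product appears on many Orders.
p9_orders = [li["order"] for li in line_items if li["product"] == "P-9"]

print(po1_products, p9_orders)  # ['P-9', 'P-3'] ['PO-1', 'PO-2']
```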
The UBL Library Content team developed a normalized model for objects in the trade procurement business context. This is represented by both a spreadsheet and UML Class and Dependency Diagrams. We have found that the combination of these presentation forms is necessary to give a complete functional view of the model.
Two-way association patterns like these are common in normalized data models and they provide great flexibility in the way we can maintain our information. They are an attempt to reflect the complex network or web of associations that exist in the real world.
However, when we want to exchange information with others, this flexibility amounts to ambiguity. We do not want to show all the associations among the information components, only those that are relevant to the business context we are in. This context-specificity is best achieved by creating (or assembling) a hierarchical view out of the relational representation. Hierarchical views introduce container structures to impose a particular interpretation on the information we want to exchange.
Of course, we can assemble several alternate hierarchical views of the same relational model, as we saw with the Order and Product (PurchaseOrder vs SalesReport). If we need to create a schema for a PurchaseOrder document type we would start at Order and list all LineItems and their associated Products. If we wanted a SalesReport document type schema we would start at Product and list all LineItems and their associated Order. The contrasting document schemas reuse the same components but assemble them in two different container structures, one the inverse of the other.
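The two contrasting assemblies can be sketched as follows: starting from the same normalized LineItem components, one function builds the PurchaseOrder hierarchy and the other the SalesReport hierarchy, each the inverse of the other. Identifiers and field names are invented for illustration, not taken from the UBL spreadsheets.

```python
# Sketch: two alternate hierarchical views assembled from one normalized
# model. All names are illustrative.
line_items = [
    {"order": "PO-1", "product": "P-9", "quantity": 10},
    {"order": "PO-1", "product": "P-3", "quantity": 4},
    {"order": "PO-2", "product": "P-9", "quantity": 7},
]

def purchase_order_view(order_id):
    # Start at Order; list its LineItems and their associated Products.
    return {"Order": order_id,
            "LineItem": [{"Product": li["product"], "Quantity": li["quantity"]}
                         for li in line_items if li["order"] == order_id]}

def sales_report_view(product_id):
    # Start at Product; list its LineItems and their associated Orders.
    return {"Product": product_id,
            "LineItem": [{"Order": li["order"], "Quantity": li["quantity"]}
                         for li in line_items if li["product"] == product_id]}

print(purchase_order_view("PO-1"))
print(sales_report_view("P-9"))
```

Each view fixes a single one-way pathway through the associations, which is precisely the reconciliation of many-to-many associations into one-to-many, single-directional pathways described above.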
In data modeling terms, it is at this stage we reconcile the many-to-many and bi-directional associations of our normalized model into one-to-many, single directional pathways.
In this way, the hierarchical view both enforces integrity rules and prevents ambiguity in the meaning of the data. What we are saying when we assemble a hierarchical view is "we want to emphasize one context in which you are to understand the data this way." This additional step of assembling relational components into hierarchical documents to establish context is what makes Document Engineering a distinctive methodology and not just a style of data modeling. Figure 2 illustrates the roles of analysis, model refinement, and assembly in the methodology of Document Engineering.
Once it is assembled by following a one-way path through the normalized model, the hierarchical document model can be directly implemented as an XML schema. This document schema need not show all components and their possible associations as described in the normalized model, only the ones pertinent to our business context. Put another way, what this means is that logical components are patterns that can be re-used by assembling them into document schemas based on the context of their use.
The UBL Library Content team have assembled document models for several documents used in the trade procurement business context. These are each represented by both a spreadsheet and UML Class and Dependency Diagrams. We have found that the combination of these presentation forms is necessary to give a complete functional view of the model. In addition, we have consolidated all sub-components of each document type into a shared library. This is to facilitate re-use of common patterns.
Reuse of patterns has the immediate benefit of reduced design and maintenance effort, encouraging and reinforcing consistency and standardization. Effective analysis enables us to recognize when a pattern can be reused, when a new pattern should be created, and what contexts distinguish one pattern from another.
Patterns useful in UBL are found at both the implementation level in the form of XML schema libraries and also at the conceptual level in terms of libraries of models that describe common business components. Re-using patterns at these more abstract levels facilitates interoperability between different technology implementations.
Context is the circumstance or events that form the environment within which something exists or takes place. Recognition of context is an important factor to promote re-use of common patterns using customized refinements. Where we have similar circumstances or events we can use similar patterns of components.
Within UBL it is the context that determines the rules for assembly of various document types. At present we rely on narrative descriptions of context. However, the precise context of a Business Information Entity can be defined by a formal set of context drivers and associated values.
UBL is following the set of eight context drivers identified by the Core Component Technical Specification. These drivers are currently known as:
Part of the analysis and design of UBL Library components will include the identification and classification of contexts to which the component applies. The UBL meta-model must accommodate values for each of these drivers. The actual values themselves are currently being developed by the UBL Context Drivers sub committee. These context values will be used by the UBL Context Methodology engine to produce customized schemas for specific implementations. Currently, this is scheduled for the next phase of UBL development.
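One way such a context engine might work is sketched below: each library component declares the driver values under which it applies, and a customized schema selects only the components matching a given context. The driver name (`Role`) and all values are invented placeholders, not the drivers or values being defined by the UBL Context Drivers sub-committee.

```python
# Hedged sketch of context-driven component selection. Driver names and
# values are invented placeholders, not UBL-defined ones.
library = [
    {"bie": "Contact",         "context": {}},                   # context-neutral
    {"bie": "ShippingContact", "context": {"Role": "Shipping"}},
    {"bie": "BillingContact",  "context": {"Role": "Billing"}},
]

def applicable(component, context):
    # A component applies when every driver value it declares matches
    # the target context; a context-neutral component always applies.
    return all(context.get(k) == v for k, v in component["context"].items())

selected = [c["bie"] for c in library if applicable(c, {"Role": "Shipping"})]
print(selected)  # ['Contact', 'ShippingContact']
```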
Figure 3 shows how the various UBL models describe various levels of context.
The result of identifying and grouping the information components of our business domain was a normalized data model. This describes the Object Classes, Properties and Associations involved in a general trade procurement process (as defined by our business rules and context statement).
This is presented in both spreadsheet form and graphically as both a Class diagram and a Dependency diagram.
Please note this does not represent any specific document type. It is the conceptual view of all the necessary information components involved in any of our business documents. All document types were derived using object classes and associations taken from this model.
The spreadsheet for the normalized data model is referenced through
http://oasis-open.org/committees/ubl/lcsc/0p70/xls/UBL_NormComp.xls
[An explanation of these pieces goes here, including caveats about their non-normative status.]
Spreadsheets for each of the UBL document types are referenced through the identifiers below.
Class diagrams for the UBL documents are referenced through the identifiers below.
Note: This section is a placeholder for materials that will be supplied in the final specification. In the current review cycle, they are scheduled for release after the ASN.1 team processes the normative materials given above. When available, those materials will be found in a supplementary package linked from the UBL Library Content Subcommittee portal at http://oasis-open.org/committees/ubl/lcsc/.
XSL-FO stylesheets for the UBL documents are referenced through the identifiers below.
ASN.1 schemas for the UBL documents are referenced through the identifiers below.
RELAX NG schemas for the UBL documents are referenced through the identifiers below.