ubl-ssc message



Subject: Re: [ubl-ssc] Complete summary of SSC F2F discussion on EF/SS/NDR/Schema alignment


Hi,

I'd like to get this out to the TC by the end of the day Monday so any TC level items can be presented/discussed on the Wednesday call.  Please let me know if you have any comments on this summary by noon tomorrow San Francisco time (about 18 hours from now) so I can incorporate and send out by the end of the day.

Thanks,
Anne

Anne Hendry wrote:
Hi SSC-ers,

Here is a summary of our discussions in last week's UBL F2F meetings around alignment of the SS models, EF capabilities and requirements, schemas, and NDRs.  These discussions happened over the course of the 4 days, so were fairly in-depth.  This is the first pass summary, and some of the summary material overlaps as it was raised from several different angles over the course of the discussions.  After review from those who participated I will consolidate the overlapping areas into a more succinct set of conclusions, decisions, and action items.

Note that some of the decisions regarding changes to Spreadsheets or schemas have already been implemented.  We will review those change logs next to make sure we have a set of new ss and ef changes for schemas that we are agreed on so we can do the check of round-tripping.

If you have comments on the material please include a reference to the section and subheading with your comments.


Thanks,

-Anne

EF - Schema - SS alignment
--------------------------

5 parts to this document:

A. Preliminary minor changes to begin EF/SS alignment test (5)
B. Changes to spreadsheets requested by EF (13)
C. Additional items raised in above two discussions (2)
D. SS/NDR/EF alignment changes (25)
E. A few follow-up questions (3)

-----------------------------------------------------------------
A. Preliminary minor changes to set base for EF/SS alignment test
-----------------------------------------------------------------

Changes have been made to both 1.0 schemas and 1.0 spreadsheets. In order to do an alignment test we should begin with a known state, namely the schemas and ss that were released with 1.0. However, there are some known minor issues which affect instances that can be taken care of straight away:

1. In the SDT spreadsheet there is a trailing space in the value for the codeListURI fixed attribute of the Country codelist. This was carried forward into the schemas by EF and is present in 1.0 schemas. It would be preferable for EF to add a feature to do general trimming of white space (it doesn't do this right now) and we must fix the SS. There was discussion of having EF replace this one trailing space (in the codeListURI only), perhaps by escaping it with something like '%SPACE%' in order to maintain backwards compatibility for the instances. However, on the final day of the Plenary the decision was to issue another CD called 1.01 with the space removed. A contributing factor to this decision is that the RFC that covers URIs says ALL spaces/tab/newline etc should be removed, so properly constructed URIs should not have an issue with this.

   AI: Remove trailing space from SDT spreadsheet for 1.01
   AI: Perhaps it should also be a 1.0 release notes matter - since it affects validity of instances people should know about it. Add to 1.0 errata.
   AI: Add space trimming in EF (for 1.01, or 1.1?)

2. The Numeric Type does not have the "Format" attribute in the CCT or UDT schemas, but it does appear in the spreadsheets. [8.11.04 Anne: This is also the case for DateTime and Indicator.]

   NOTE: Reintroducing format attribute to Schemas, say in 1.1, would require a change from simpleType to complexType. Change for 1.01? Will require tools and schema changes.

   AI: Remove format attribute from spreadsheets (1.01).
   AI: Add errata for 1.0
   AI: Consider the impact of removing 'format' and using simple types.

3. Three attributes for each of the code types used in the SDT SS (codeListNamespacePrefixID, codeListDescription, CodeListCredits) are not represented in the schemas.

   AI: Decide if we need attribution (and other header comment items).
   AI: If not, remove attributes from SS for testing.
   AI: Figure out a better way to represent these going forward (1.1). EF suggests columns for these.

4. In the SDT spreadsheet, both the 'content' row and the 'name' row contain the filename of a possible codelist text file in the 'Values' column. This appears as though we are asking for it to be a fixed value of these (content/name) attributes. This is not incorporated as such into the schemas, but since it was not intended to be used as a literal value, but rather as a filename for further processing, this creates an inconsistency in how the spreadsheets should be interpreted. The SDT spreadsheet should have the codelist text file filename removed from the 'name' attribute row, at least. Schema generation for the 'content' row can possibly use this value. At this time, however, EF doesn't use the SDT spreadsheet at all. Work is underway in GEFEG to be able to use these SDT SS values. There is work to be done in improving the algorithm for importing this. Will be done in a couple of weeks.

   AI: Need to align SDT SS, Codelist model and EF import functionality. Need further consultation between EF and UBL to get this done. Dependency on completion of Codelist model.

5. For Unspecialized Datatype "Binary Object", and all objects based on it ("GraphicType", "PictureType", "SoundType", and "VideoType") the schemas are missing attributes "format" and "mimeCode". Instead they have "characterSetCode". Format and mimeCode are present in the corresponding spreadsheet models, although filename, encodingcode, and uri are missing in both udt ss and schema.

   AI: The spreadsheets could be corrected for the alignment test, but this should be sorted out for 1.1 or even 1.01.

---------------------------------------------
B. Changes to spreadsheets requested by GEFEG
---------------------------------------------

GEFEG would like to see the following changes to the UBL spreadsheets to make them more importable and usable to EF.

UBL-CoreComponentTypes-1.0.xls
------------------------------

1. The spreadsheet's 'Component Type' column (column X) uses only 'DT' or 'Supplementary Component'. EF also needs to distinguish content components from supplementary components so would like to add 'Content' to the possible values for the 'Component Type' column. So possible values for this column would then be one of: "CCT", "Content", or "Supplementary"

   AI: Agreed to change CCT spreadsheet to use 'Content' value.

2. The order of the supplementary components for "Code. Type" and "Identifier. Type" should be aligned with the one used in the CCT schema and in CCTS 2.01.

   AI: Agree to fix in SS. However, will not formalize sequence of bbies in ss model as those are ordered for human readability.
   [8.11.04 Anne: What about BinaryObjectType - neither the schema nor ss are in CCTS order.]

3. Because CCTS 2.01 doesn't know a "Property Term Possessive Noun" or "Property Term Primary Noun" EF is only able to store these entries as user notes. The reason behind the distinction of 'Possessive' and 'Primary' is that some names are two part names (two nouns) but one does not qualify the other. We use many two-noun terms (eg. StreetName, IssueDate, and BuildingNumber).
Mike Adcock recognized a pattern in this and wanted to ensure we consistently used both names, so this was a useful tool to ensure a more consistent pattern of naming and terminology, to show explicitly when a PT was qualified vs. when the additional noun was just part of a two-part name. This has always been useful - to have as extra info for modeling even though these two pieces of information do not appear in the schemas. And this is the biggest benefit of using SS for modeling. Another example is UBL name and DEN name in the SS - even though they are not canonical, they are a really good indicator to the modeler.

The problem comes when we ask EF to store and regenerate info that is not needed by EF, but is needed by our SS model. We might use the Options file. Then EF could regenerate the info, but still not the formulas, which is the biggest downfall of an EF-generated SS. GEFEG has offered a solution to put these into the comment area, but then they lose their real value because we'd never look at them. We could remove the content from these columns and then add more columns at the end. However, the columns themselves definitely need to remain in the current place, even if they are empty.

   AI: Sylvia find out if EF can preserve those columns (even if there are values in them, and even if it can't preserve the values). Then if EF can regenerate those columns on output (even without the values) we'll just regenerate the values manually (by adding the formula). Regardless of how EF manages those columns internally, the output UBL SS format (columns and column order) needs to remain as it is in 1.0.

UBL-UnspecializedDatatypes-1.0.xls
----------------------------------

1. In order to distinguish content components from supplementary components, "Supplementary" needs to be changed to "Content" in column "Component Type" if it is a content component.
So possible values for this column would then be one of: "DT", "Content", or "Supplementary"

   AI: Agreed to change CCT spreadsheet to use 'Content' value.

2. The order of the supplementary components for "Code. Type" and "Identifier. Type" should be aligned with the one used in the CCT schema and in CCTS 2.01.

   AI: See #2 above.

3. Because CCTS 2.01 doesn't know a "Property Term Possessive Noun" and "Property Term Primary Noun" we are only able to store these entries as user notes.

   AI: See #3 above.

4. UBL models Secondary Representation Terms (Graphic, Video, Date, Time, etc.) as being of the same Object Class as their respective Primary Representation Term, but with the Object Class qualified by their respective Secondary RT name. Example: Graphic type is of object class 'binary object' with an object class qualifier of 'graphic'. EF handles unspecialized DTs as unqualified DTs and so doesn't expect an object class 'qualifier' for these DTs. In EF, unqualified DT components should not have an Object Class Qualifier, and EF has no way to store qualifiers of 'unqualified' DTs. So that information is not being used in the schemas.

   AI: See #5 below and #4 under 'UBL-SpecializedDatatypes-1.0.xls'.

5. Because of #4, EF has problems creating the Dictionary Entry Names for Supplementary Components of unspecialized DTs that represent a Secondary Representation Term. CCTS 2.01 doesn't specify how to generate DENs for SCs or Secondary RTs. Therefore EF can't build up these names from CCTS rules. EF disregards the qualifier and uses the Secondary RT type name as the name, as it would for a Primary RT (Graphic SCs would be prefixed with 'Graphic'). UBL has implemented the Supplementary Components for both Primary and Secondary Representation Terms as components of Unspecialized Datatypes following the same naming rules as for CCTs (and CCs and BIEs). That is the ISO 11179 ObjectClass+PropertyTerm+RepTerm and seems a logical approach.
So UBL models the DEN of Graphic SCs as 'Graphic_BinaryObject', 'Graphic' being the Object Class Qualifier of 'BinaryObject'.

   AI: We need to resolve this difference in naming of SCs of Secondary RTs. If we determine we need DENs for these Supplementary Components then we should agree on how best to model these and what rules should be applied for their naming, as there are currently no rules for DEN creation of Content Components and Supplementary Components of Secondary RTs in CCTS or UBL.
   AI: David explain how the names for CCs and SCs for secondary RTs are currently generated for EF.
   AI: EF suggest other way to model secondary RTs. The DENs do need to be in the EF internal model for the CCTS (and by inheritance for UBL). The UBL approach is in alignment with other CCT implementations so is not a requirement for UBL only.

UBL-SpecializedDatatypes-1.0.xls
--------------------------------

1. In order to distinguish content components from supplementary components, "Supplementary" needs to be changed to "Content" in column "Component Type" if it is a content component.

   AI: See #1 above.

2. The order of the supplementary components for specialized DTs that are based on "Code. Type" should be aligned with the one used in the CCT schema and in CCTS 2.01.

   AI: See #2 above.

3. Because CCTS 2.01 doesn't know a "Property Term Possessive Noun" and "Property Term Primary Noun" we are only able to store these entries as user notes.

   AI: See #3 above.

4. In order to handle the Dictionary Entry Names for Content Components and Supplementary Components as in the Spreadsheet some work in EF is needed because so far there are no rules for DENs of CCs and SCs. UBL uses DENs right now because it's part of compliance with CCTS - every component needs a DEN. The Supplementary Components for CCTs and Primary RTs are defined in CCTS, but others are not. UBL has applied the same principle of naming Primary RTs to the naming of Supplementary Components of Secondary RTs.
By implication, this is required by CCTS, otherwise you could argue there are no rules for Supplementary Components. So UBL has made the decision to apply the same rules to SCs and CCs. EF has made a different decision.

   AI: Need UBL rule (not NDR) which would be an implementation of the CCTS naming of Secondary Representation Terms. We should try to align this with the ATG2 expression of what they call the UDTs. Tim will look at this.

UBL-Reusable and MainDoc
------------------------

1. Because CCTS 2.01 doesn't know a "Property Term Possessive Noun" and "Property Term Primary Noun" we are only able to store these entries as user notes.

   AI: See #3 above.

-----------------------------------------------------
C. Additional items raised in above two discussions
-----------------------------------------------------

1. GEFEG would like the UBL SS at minimum to have the same columns as the TBG17 SS.

   [9.11.04 Anne: We need to get from GEFEG a list of the columns from the TBG17 SS they want to see in the UBL SS.]

2. The ss and schemas of cctypes are currently quite different because EF is not reading the CCT spreadsheet - the CCT schema is generated manually. It's not clear whether or not we will continue to provide a cc types schema. This needs to be decided going forward.

   AI: Decide on continued UBL distribution of CCTypes.

--------------------------------------------------------------------
D. SS/NDR/EF alignment changes - based on NDR V1.0 Draft Candidate 1
--------------------------------------------------------------------

[GXS1] UBL Schema MUST conform to the following physical layout ...
------

UBL schema organization is different than GXS1:
 - short copyright not the same
 - full copyright should be at end of document
 - need to align order for declaration of namespaces and order of imports and follow structure outlined in GXS1
 - include section head comment lines, except when section is empty

GXS1 doesn't include, but UBL comment header currently includes:
 - "Universal Business Language (UBL) Schema 1.0"
 - URLs to UBL and OASIS web sites
 - "Document Type"
 - "Generated On" (date)
 - tribute to Mike
 - additional comment lines for additional clarity

   AI: This needs to be reviewed by TC and aligned for 1.1.

[GXS6] The xsd:final attribute MUST be used to control extensions
------

   AI: Recommend to remove as this is already an xsd tenet. No need to restate here and confusing where to apply. Or possibly move to CM document.

[NMC1] Each dictionary entry name MUST define one and only one fully
------ qualified path (FQP) for an element or attribute.

EF doesn't explicitly check this. The corollary, which is possible duplicate DENs (or UBL Names) for some objects, is also a concern. Should EF explicitly check these things?

   AI: Clarify whether there's need for EF to check this.

[MDC1] UBL Libraries and Schemas MUST only use ebXML Core Component
------ approved ccts:CoreComponentTypes.

The UBL CCT schema implements ebXML approved cctypes according to CCTS Table 8-1, with three exceptions: numeric, datetime, and indicator. The UBL CCT schemas do not contain the 'format' attribute for these three types. These have been cast as 'simple' types (which precludes adding more attributes).

   AI: Mark: We should feed this back to CCTS (the fact that we're not using CCTs as they were designed).

[VER1] - [VER7] Relating to use of major/minor version numbers.
---------------

There is nothing in EF to automatically create version numbers. Now it is done manually; should EF consider automating this?

   AI: Raise to larger group.
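
To make the [NMC1] question above more concrete, here is a minimal Python sketch of the kind of check being discussed. It is not how EF works; the (DEN, path) pair layout, the function name, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict


def find_duplicate_dens(rows):
    """Return {den: [paths]} for every DEN that maps to more than one path.

    `rows` is an iterable of (dictionary_entry_name, fully_qualified_path)
    pairs. This layout is an assumption, not the UBL spreadsheet format.
    """
    paths_by_den = defaultdict(set)
    for den, path in rows:
        # Collapse runs of white space so trivially different spellings match.
        paths_by_den[" ".join(den.split())].add(path)
    return {
        den: sorted(paths)
        for den, paths in paths_by_den.items()
        if len(paths) > 1
    }


# Hypothetical sample data, for illustration only.
rows = [
    ("Address. Street Name. Name", "/Order/Address/StreetName"),
    ("Address. Street Name. Name", "/Invoice/Address/StreetName"),
    ("Address. Building Number. Text", "/Order/Address/BuildingNumber"),
]
duplicates = find_duplicate_dens(rows)
```

With the sample rows, only the first DEN is reported, because it is bound to two different paths. Whether a real check should compare DENs, UBL Names, or both is exactly the open point in the AI above.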
[SSM10] The ubl:CommonAggregateComponents schema module MUST be named
------- “ubl:CommonAggregateComponents Schema Module”

[SSM12] The ubl:CommonBasicComponents schema module MUST be named
------- “ubl:CommonBasicComponents Schema Module”

[SSM14] The ccts:CoreComponentType schema module MUST be named
------- “ccts:CoreComponentType Schema Module”

[SSM17] The ccts:UnspecialisedDatatype schema module MUST be named
------- “ccts:UnspecialisedDatatype Schema Module”

[SSM19] The ubl:SpecialisedDatatypes schema module MUST be named
------- “ubl:SpecialisedDatatypes schema module”

 - Need clarification on where these terms are to be used.
 - The plurality of the word 'Type' in the module name for SSM19 doesn't agree with that of SSM14 and SSM17. UBL implements this word as a plural for all 3 cases (agrees with SSM19, but not SSM14 or SSM17).
 - Should there be rules for the CCP also?

   AI: Submit comment to NDR to align, then follow rule(s). Impact on implementations?

[DOC1] The xsd:documentation element for every Datatype MUST contain
------ a structured set of annotations in the following sequence and pattern:

 • ComponentType (mandatory): The type of component to which the object belongs. For Datatypes this must be “DT”.
 • DictionaryEntryName (mandatory): The official name of a Datatype.
 • Version (optional): An indication of the evolution over time of the Datatype.
 • Definition (mandatory): The semantic meaning of a Datatype.
 • ObjectClassQualifier (optional): The qualifier for the object class.
 • ObjectClass (optional): The Object Class represented by the Datatype.
 • RepresentationTerm (mandatory): A Representation Term is an element of the name which describes the form in which the property is represented.
 • DataTypeQualifier (optional): semantically meaningful name that differentiates the Datatype from its underlying Core Component Type.
 • DataType (optional): Defines the underlying Core Component Type.

UBL supplies only the mandatory set (ComponentType, DEN, Definition and RepresentationTerm). Even though the SS have Object Class and Object Class Qualifier, EF can't create these optional information items because there is no definition for what is the Object Class and Object Class Qualifier for a datatype in CCTS.

   -> David check above statement. See related discussion in EF/SS summary.

[10.11.04 Anne] Rule S28 of CCTS says that DTs must include Qualifier Term (mandatory), but DOC1 has it as optional.

EF doesn't manage Version numbers. Should it?

   AI: Take to NDR for clarification/resolution.

[DOC2] A Datatype definition MAY contain one or more Content
------ Component Restrictions to provide additional information on the relationship between the Datatype and its corresponding Core Component Type. If used, the Content Component Restrictions must contain a structured set of annotations in the following patterns:

 • RestrictionType (mandatory): Defines the type of format restriction that applies to the Content Component.
 • RestrictionValue (mandatory): The actual value of the format restriction that applies to the Content Component.
 • ExpressionType (optional): Defines the type of the regular expression of the restriction value.

See Table 7-1 of CCTS. An example of a CC RestrictionType for, say, 'String' type would be 'minimum length'. The RestrictionValue would be the actual value. There must be the above structured set of annotations for each restriction. Currently UBL has no documentation for Content Components or Supplementary Components.

   AI: Review implementation to see if we need to add anything.

[DOC3] A Datatype definition MAY contain one or more Supplementary
------ Component Restrictions to provide additional information on the relationship between the Datatype and its corresponding Core Component Type.
If used, the Supplementary Component Restrictions must contain a structured set of annotations in the following patterns:

 • SupplementaryComponentName (mandatory): Identifies the Supplementary Component on which the restriction applies.
 • RestrictionValue (mandatory, repetitive): The actual value(s) that is (are) valid for the Supplementary Component.

Don't know where to find this information. Not in CCP.

   AI: Take to NDR for clarification/resolution.

[DOC*] Eventually registration of constructs in schemas should
----- be automated so they can be submitted to the registration authority and metadata will automatically go into the registration process for the schemas.

   AI: Follow up on registration requirements (CCTS Section 7).

[GNR1] UBL XML element, attribute and type names MUST be in the English
------ language, using the primary English spellings provided in the Oxford English Dictionary.

   AI: Terms come from SS, so need to check SS(s).

[GNR4] - [GNR6] Acronyms and Abbreviations
---------------

EF checks against NDR, but if an Acronym is in the SS already then it is left alone. DUNS not used. Acronym for DUNS not completely specified.

   AI: Resolve A&A list, usage, ownership, and maintenance.
   AI: Align SS and Schemas with final list and rules.

[GNR7] UBL XML element, attribute and type names MUST be in singular
------ form unless the concept itself is plural.

   AI: SS issue - check SS.

[ELN4] A UBL global element name based on a qualified ccts:BBIEProperty
------ MUST be the same as the name of the corresponding xsd:complexType to which it is bound, with the qualifier prefixed and with the word "Type" removed.

It could be that there are elements whose names consist of qualifier property term, property term, and representation terms that refer to a complex type with the name having only the property term and representation term.

   AI: David check correctness of above statement of current situation.
   AI: Check that SS/EF follow rule.
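
As a rough aid for the "Check that SS/EF follow rule" item, the [ELN4] wording can be sketched in a few lines of Python. This is only one reading of the rule text quoted above; the function names and the sample element/type names are hypothetical and not taken from the UBL schemas.

```python
def element_name(qualifier, type_name):
    """Build the element name ELN4 appears to require.

    Takes the bound complexType name, strips the trailing word "Type",
    and prefixes the qualifier. One reading of the rule text only.
    """
    suffix = "Type"
    if not type_name.endswith(suffix):
        raise ValueError(f"expected a name ending in 'Type': {type_name!r}")
    return qualifier + type_name[: -len(suffix)]


def follows_eln4(element, qualifier, type_name):
    """True if `element` matches the name derived from its bound type."""
    return element == element_name(qualifier, type_name)


# Hypothetical names, for illustration only.
ok = follows_eln4("AdditionalStreetName", "Additional", "StreetNameType")
bad = follows_eln4("AdditionalStreetName", "Additional", "NameType")
```

Here `ok` is true and `bad` is false. A real check would read the qualifier and type binding from the SS or the generated schema rather than from hand-written arguments.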
[ATN1] Each CCT:SupplementaryComponent xsd:attribute "name" MUST be
------ the Dictionary Entry Name object class, property term and representation term of the ccts:SupplementaryComponent with the separators removed.

Examples:
   Amount Currency.Identifier -> amountCurrencyID
   Measure Unit.Code -> measureUnitCode

If the object class is identical to the RT of the data type (or cct or whatever) then Object Class is removed from name. EF and SS do the same thing, which is not as it says in this rule. Suggest changing this rule to say: "If the Object Class of the Supplementary Component is identical to the Primary Representation Term of the datatype of the cctype then the Object Class will be removed." This is how cct ss, sdt and udt are probably done. What do rules for SS use for UBL name? ELN3 covers elements, but not attributes.

   AI: Review for SS and take to NDR for resolution.

[STD1] For every ccts:CCT whose supplementary components map
------ directly onto the properties of a built-in xsd:Datatype, the ccts:CCT MUST be defined as a named xsd:simpleType in the ccts:CCT schema module.

Need rule to say which cct should use this rule. Needs analysis - originally done by Gunther and Garret and reflected in cct and is now further refined. UBL will have more ccts and representation terms that will drive the need for more dts. As those ccts and dts are defined, someone will have to do that analysis of taking a new cct and looking at available built-in dts to see if one meets the requirements. This is the responsibility of ATG in CEFACT. If UBL eliminates our udt and cct, then that analysis would be done exclusively in ATG.

   AI: TC issue.

[CTD1] For every class identified in the UBL model, a named
------ xsd:complexType MUST be defined.

Example: <xsd:complexType name="BuildingNameType">

What is a 'class' as used in this rule? Should say ABIE?

   AI: Send to NDR.
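
For the [ATN1] review above, the two examples given with the rule can be reproduced with a short Python sketch. It covers only "separators removed" plus lower camel case; the Identifier -> ID abbreviation is inferred from the first example, and the Object Class removal discussed above is deliberately not modelled. The function name and abbreviation table are assumptions.

```python
import re

# Assumption: inferred from "Amount Currency.Identifier -> amountCurrencyID".
ABBREVIATIONS = {"Identifier": "ID"}


def attribute_name(den):
    """Turn a Supplementary Component DEN into a lowerCamelCase name.

    Splits on periods, underscores and white space, applies the
    abbreviation table, lower-cases the first letter, and joins the rest.
    """
    words = [w for w in re.split(r"[.\s_]+", den) if w]
    if not words:
        raise ValueError("empty dictionary entry name")
    words = [ABBREVIATIONS.get(w, w) for w in words]
    first = words[0]
    # Keep an all-caps leading word such as "ID" unchanged.
    if not first.isupper():
        first = first[0].lower() + first[1:]
    return first + "".join(words[1:])


currency = attribute_name("Amount Currency.Identifier")
unit = attribute_name("Measure Unit.Code")
```

Both calls give the names listed in the examples for this rule. Adding the proposed Object Class removal would need the Primary Representation Term of the datatype as a second input, which is the part still to be settled with NDR.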
[CTD7] Every unspecialised Datatype must be based on a ccts:CCT
------ represented in the CCT schema module, and must represent an approved primary or secondary representation term identified in the CCTS.

Need clarification of what is meant by 'must be based on'. Simple type doesn't restrict underlying cct, but restricts directly the built-in xsd types. Should be a rule covering this. Not all udts are direct restrictions. There is a difference in approach between two groups (ATG, UBL). In UBL, the udt schema module directly imports the cct schema module and every dt has a direct 1:1 relationship with its corresponding cct. In ATG, to make tool development easier, every cct is defined as a complex type and every sc is present as an attribute of that cct. Then, also to leverage built-in dts, there had to be a break in the direct link between cct and udt (because you can't turn a complex type into a simple type). So that is what is meant by saying the constructs in udt are 'based on' cct. Some are simple types whereas others are facets of built-in xsd dt representations.

   AI: Need resolution for longer term (1.1, 2.0)

[CTD17] Each ccts:SupplementaryComponent xsd:attribute
------- user-defined xsd:simpleType MUST only be used when the ccts:SupplementaryComponent is based on a standardized code list for which a UBL conformant code list schema module has been created.

[CDL5] The name of each UBL Code List Schema Module MUST be
------ of the form: {Owning Organization}{Code List Name}{Code List Schema Module}

This syntax is very strange. Where is this name used? UBL uses a completely different naming convention. Both the code list declaration and data types are in the code list schema files now. What should be in a CL file? There was intended to be a section in the schema format (as per GXS1) for code lists but this is not there right now - somehow gone.
The CDL5 name relates to any time where you must refer to the code list, such as in the header or comments of the Code List or other schema files or documentation. It probably would be best to use this for the 'filename' part of the urn as well, but haven't gone there yet. Will have to look into this later.

[CDL*] Code List rules.
------

   AI: Revisit CTD17, CDL5 and all other Code List rules with new code list model.

-----------------------------------------------------------------------
E. [9.11.04 Anne] This text appears in the NDR after SSM10:

   "By design, ccts:CoreComponentTypes are generic in nature. As such, restrictions are not appropriate. Such restrictions will be applied through the application of Datatypes. Accordingly, the xsd:facet feature must not be used in the ccts:CCT schema module."

But it seems we do restrict (and extend) our cc types in the UBL-CoreComponentTypes-1.0.xsd.

[10.11.04 Anne] Regarding

[DOC4] The xsd:documentation element for every Basic Business Information Entity MUST contain a structured set of annotations in the following patterns:

 • ComponentType (mandatory): The type of component to which the object belongs. For Basic Business Information Entities this must be “BBIE”.
 • DictionaryEntryName (mandatory): The official name of a Basic Business Information Entity.
 • Version (optional): An indication of the evolution over time of the Basic Business Information Entity.
 • Definition (mandatory): The semantic meaning of a Basic Business Information Entity.

I don't see that we have any documentation for our BBIEs, at least not in the CBC schema. Is this where it would be?

[10.11.04 Anne] Regarding [DOC2], why can't we use content component restrictions to limit the allowed values of a code list?


