ubl-hisc message

Subject: Re: Fwd: [ubl-hisc] Proposed XPath information instance
From: "Stephen Green" <stephen_green@bristol-city.gov.uk>
To: "UBL HISC" <ubl-hisc@lists.oasis-open.org>
Date: Tue, 01 Mar 2005 14:00:38 +0000
Ken

Hi. Please find my further responses inline.

> I've been working on exposing cardinality in elements and attributes, and I 
> decided it would not be proper to add information to the former ".xml" 
> versions of the "complete" instances.  Then, there would be XPath 
> attributes in the instance that are not part of the actual document type.
> 
> So, I've added all of the cardinality information from the model to an 
> instance I'm calling the "XPath instance".  Below you will see the first 50 
> lines of the Order XPath instance and a sample stylesheet for Micah to 
> extract the same kind of XPath report that he requested.
> 
> Now the question for the committee:  should I reflect cardinality in the 
> text and HTML presentations?  For example, here are the first 10 lines of 
> Order.txt:


> 
> Z:\data\KenData\dev\xsd\UBL-1.0-SBS-0.5\xpaths>head Order.txt
> 2   /po:Order/po:BuyersID
> 2.1 /po:Order/po:BuyersID/@identificationSchemeAgencyID
> 2.2 /po:Order/po:BuyersID/@identificationSchemeAgencyName
> 2.3 /po:Order/po:BuyersID/@identificationSchemeDataURI
> 2.4 /po:Order/po:BuyersID/@identificationSchemeID
> 2.5 /po:Order/po:BuyersID/@identificationSchemeName
> 2.6 /po:Order/po:BuyersID/@identificationSchemeURI
> 2.7 /po:Order/po:BuyersID/@identificationSchemeVersionID
> 3   /po:Order/po:SellersID
> 3.1 /po:Order/po:SellersID/@identificationSchemeAgencyID
> 
> To introduce cardinality I could add a column of Kleene operators between 
> the reference key number and the XPath address:
> 
> 2     /po:Order/po:BuyersID
> 2.1 ? /po:Order/po:BuyersID/@identificationSchemeAgencyID
> 2.2 ? /po:Order/po:BuyersID/@identificationSchemeAgencyName
> 2.3 ? /po:Order/po:BuyersID/@identificationSchemeDataURI
> 2.4 ? /po:Order/po:BuyersID/@identificationSchemeID
> 2.5 ? /po:Order/po:BuyersID/@identificationSchemeName
> 2.6 ? /po:Order/po:BuyersID/@identificationSchemeURI
> 2.7 ? /po:Order/po:BuyersID/@identificationSchemeVersionID
> 3   ? /po:Order/po:SellersID
> 3.1 ? /po:Order/po:SellersID/@identificationSchemeAgencyID
> 
> The above doesn't look like much since the attributes are all optional and 
> the SellersID element is optional as well.  Note how the document element 
> is not optional, so no Kleene operator shows, just a space.  For required 
> attributes I would also leave it blank.  For "one or more" elements I would 
> use "+" and for zero or more elements I would use "*".  If the maxOccurs is 
> a hard number rather than "unbounded" I would use "#" which is not a Kleene 
> operator, but gives the indication that the reader should check out the 
> actual XPath instance where the real number is found.
> 
> ACTION: please offer your opinion if I should leave text and HTML files the 
> way they are already, without cardinality, and just leave cardinality in 
> these new XPath files ... or should I add the visual indication of 
> cardinality into the text and HTML files?

...

> 
> Okay I've just added attribute type information to the new XPath detail 
> instance.  An example is below.
> 
> At 2005-02-26 21:41 -0500, I wrote:
> >Now the question for the committee:  should I reflect cardinality in the 
> >text and HTML presentations?
> >...
> >ACTION: please offer your opinion if I should leave text and HTML files 
> >the way they are already, without cardinality, and just leave cardinality 
> >in these new XPath files ... or should I add the visual indication of 
> >cardinality into the text and HTML files?
> 
> I'm leaning towards "yes" on the above action.
> 


I think it is sufficient if the cardinality is placed in the XML detail files, but
there is one reason I can see for reflecting cardinality here: it seems possible
that the 2.1, 2.2, etc. might be mistaken for cardinality, so adding cardinality
here with an appropriate notation might help clarify things. However, using Kleene
single-character notation rather than the more familiar 1..n, 0..1, 0..n
(and in one case, AddressLine/Line, 0..7 or 1..7) seems less clear, and
I'd favour 1..n, etc. (for the sake of the less technical readers of the specs).
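
To make the two notations concrete, here is a minimal Python sketch that
derives both from a minOccurs/maxOccurs pair. The function names are
hypothetical, and writing a required item as "1..1" is my own assumption;
the Kleene column follows the rules described in the quoted text above.

```python
def kleene_marker(min_occurs, max_occurs):
    """Single-character column as proposed in the quoted text."""
    if max_occurs == "unbounded":
        return "*" if min_occurs == 0 else "+"
    if int(max_occurs) == 1:
        # Optional items get "?", required items are left blank.
        return "?" if min_occurs == 0 else " "
    # A hard upper bound: the reader has to look up the real number.
    return "#"


def range_notation(min_occurs, max_occurs):
    """The more familiar 0..1, 1..n, 0..7 style."""
    upper = "n" if max_occurs == "unbounded" else str(max_occurs)
    return f"{min_occurs}..{upper}"


# Compare the two notations side by side for a few typical cases.
for lo, hi in [(1, 1), (0, 1), (1, "unbounded"), (0, "unbounded"), (0, 7)]:
    print(repr(kleene_marker(lo, hi)), range_notation(lo, hi))
```

Note that the range form keeps the actual upper bound (0..7), whereas the
single-character form can only flag it with "#".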




> Oh, note below that I've documented the vocabulary as a comment at the 
> start of the detail instance.  I can easily replace the one-character names 
> with more meaningful names, because the reports are short (totalling less 
> than 300K) for all of the SBS subsets.  However, these eight reports for 
> UBL CD2 total 17Mb in size, so I thought it important to be parsimonious 
> ... then I thought when the files are that big, who cares that they are bigger?
> 
> So I'm leaning towards changing the vocabulary names to be more 
> verbose.  What do you think?

Naming, whether verbose or not, is still, I think, likely to need to be flexible
and changeable. That said, verbose, human-readable names are usually considered
the way to go with XML, and I'd keep to this trend. In practice the names
should be hidden, along with the XML, behind a transformed output for human
consumption, but technical architects and designers may have to read the XML
and shouldn't be frightened off by having to refer to a separate key in order to
understand it minimally. My assumption is that there will be developers, etc.
reading this stuff who may not have time to read long extra documentation to
find out what the 'a' and 'e' mean. It seems obvious to us in context that they
are attribute and element, but out of context we might stumble a bit, so
'attribute' and 'element' would probably be better. I'd keep a little of the UBL
NDR where appropriate, like having lower camel case for attribute names
and upper camel case for element names. That just helps consistency with
UBL. (Another has noted that we're perhaps not in a context where the UBL
NDR is appropriate, but some of its rules are very generally acceptable guiding
principles, like avoidance of acronyms and abbreviations and use of Oxford
English Dictionary names.)
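
As a rough illustration of how cheap the renaming is, here is a minimal
Python sketch that expands the one-character names. Only the "e" and "a"
names come from the detail file as described; the "name" attribute and the
sample fragment are hypothetical, and I have used upper camel case because
both names are XML elements in the detail vocabulary.

```python
import xml.etree.ElementTree as ET

# Terse name -> verbose name (upper camel case, as for UBL element names).
VERBOSE_NAMES = {"e": "Element", "a": "Attribute"}


def expand_names(root):
    """Rename every terse element in place; other names are left alone."""
    for node in root.iter():
        node.tag = VERBOSE_NAMES.get(node.tag, node.tag)
    return root


# Hypothetical fragment; the real detail file carries more information.
terse = '<e name="Order"><e name="BuyersID"><a name="identificationSchemeID"/></e></e>'
root = expand_names(ET.fromstring(terse))
print(ET.tostring(root, encoding="unicode"))
```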

Another way to improve extensibility and ease future changes to the file format
might be to create a Schema for the detail files and make it easily customizable,
but that could perhaps come later.

> 
> Next ... what to call these "detail" files.  The text file and the HTML 
> file are obviously "XPath files" because they enumerate all of the possible 
> XPath paths.  The XML instance file is an instance of the vocabulary that, 
> while not valid, instantiates one of everything, every element and every 
> attribute, without any other elements or attributes so is somewhat 
> "pure".  We can call these "instance files".
> 
> The naming convention for the above three files follows: Order.txt, 
> Order.htm, and Order.xml.
> 
> For now I'm calling the detail files along the lines of Order-xpath.xml ... 
> but now we have two files using the XML extension ... which makes sense 
> because they are both XML.  But, would it help if we called the file 
> Order-detail.xml since there is only the raw material for an XPath file, 
> not an enumeration of the XPaths themselves.  This instance is the 
> intermediate instance used to produce the three other output files.  Until 
> now I've not been publishing it, but it now has the cardinality and type 
> information not found in the XPath reports.
> 
> I'm leaning towa
As I said earlier, I consider that there is a very happy overlap here with
the forming Small Business SC, in that the Small Business Subset needs
an appropriate developer/machine-readable normative format for defining
the subset. In that case I'd tend to call the file termed here a 'detail file'
a 'subset definition file', but of course from HISC's point of view it isn't
just a file for subsets; it could be used for the superset too. In that case
'data definition file' is what I might call it, or more specifically 'subset data
definition file' (this is because it is similar to a 'schema definition language'
file but not quite the same, and should not be confused with one).
This would lead me towards a name closer to the above term when
naming the actual individual file. I'd keep to the .xml extension to help
tools know how to open it. Then I'd think of something like, let me think...

say, for an Order subset data definition file, maybe OrderSubsetDefinition.xml
when it is the subset (SBS, say) that is used, and then, for correctness,
OrderDefinition.xml when it is the superset.
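
The naming idea can be sketched in a few lines of Python. The function name
and its parameters are hypothetical; only the two resulting file names come
from the suggestion above.

```python
def definition_filename(document_type, is_subset):
    """Build the suggested file name for a document type such as 'Order'."""
    suffix = "SubsetDefinition" if is_subset else "Definition"
    return f"{document_type}{suffix}.xml"


print(definition_filename("Order", is_subset=True))
print(definition_filename("Order", is_subset=False))
```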


> 
> The detail file has an "e" element (for an element) and child "a" elements 
> (for attributes) in context of other "e" elements.  Lots of repetition, but 
> once you navigate down to an element in context you can determine all of 
> the information offered about that element without going 
> anywhere.  Information on every element and attribute in every context is 
> available in the file.  From this I produce the other reports.
> 
> Should we use a different extension and call the file "Order.???" so that 
> we have four unique extensions?  Should we call it "Order-detail.xml" or 
> "Order-xpath-detail.xml" as better names?

See my previous answer above.
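
For readers unfamiliar with the structure described in the quoted text just
above ("e" elements with child "a" elements, nested in context), here is a
minimal Python sketch of how an XPath report might be produced from it.
This is not Ken's stylesheet: the "name" attribute is hypothetical, and the
numbering scheme (elements numbered in document order, attributes as N.k in
alphabetical order) is only inferred from the Order.txt sample.

```python
import xml.etree.ElementTree as ET


def xpath_report(root):
    """Return numbered XPath lines for every element and attribute."""
    lines = []
    counter = 0

    def visit(elem, parent_path):
        nonlocal counter
        counter += 1
        number = str(counter)
        path = f"{parent_path}/{elem.get('name')}"
        lines.append(f"{number} {path}")
        # Attributes of this element, listed alphabetically as N.1, N.2, ...
        attrs = sorted(a.get("name") for a in elem.findall("a"))
        for k, attr in enumerate(attrs, start=1):
            lines.append(f"{number}.{k} {path}/@{attr}")
        # Child elements in context, in document order.
        for child in elem.findall("e"):
            visit(child, path)

    visit(root, "")
    return lines


# Hypothetical, heavily trimmed detail instance.
detail = ET.fromstring(
    '<e name="po:Order">'
    '<e name="po:BuyersID">'
    '<a name="identificationSchemeID"/>'
    '<a name="identificationSchemeAgencyID"/>'
    '</e>'
    '<e name="po:SellersID"><a name="identificationSchemeAgencyID"/></e>'
    '</e>'
)
report = xpath_report(detail)
print("\n".join(report))
```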

> 
> I'm not sure where I lean ... though I like the idea of four unique 
> extensions, I hate to join the legions who have dreamed up new TLAs for 
> XML-oriented vocabularies.  Anyone care for "XDI" for "XPath Detail 
> Instance"?  "XPD" for "XPath Detail" See?  These don't really do it for 
> me.  And, anyway, Google reveals that both TLAs are in use for XML 
> vocabularies.  Perhaps by now all TLAs with an X are in use for XML somewhere.
> 
> Though I think it "proper" that ".xml" is suitable because it's how you use 
> it, not what it says, I've been having file management issues moving files 
> around when two types of files have the same extension.
> 
> Please offer your opinions on these simple housekeeping questions.  I don't 
> want people to live with what I've decided if they can think of better ways 
> to express concepts I'm trying to express.
> 
> Thanks!
> 
> .................... Ken
> 
> Z:\data\KenData\dev\xsd\UBL-1.0-SBS-0.5>head -50 xpaths\Order-xpath.xml
> 
> [first 50 lines of Order-xpath.xml; the XML markup was not preserved in this archived copy]
> 
> Z:\data\KenData\dev\xsd\UBL-1.0-SBS-0.5>
> 
> --
> World-wide on-site corporate, govt. & user group XML/XSL training.
> G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com 
> Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/ 
> Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
> Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/o/bc 
> Legal business disclaimers:  http://www.CraneSoftwrights.com/legal