regrep message

Subject: RE: XML.org implementation (Packages)
From: "Kearns, Una" <una.kearns@documentum.com>
To: regrep@lists.oasis-open.org
Date: Fri, 7 Apr 2000 18:12:15 +0100
Hi,

I have been spending time today reviewing what's already included in the
XML.org Catalog and have spent time looking over EBXML again.

A few areas I would like to address:

I'll start with easiest first!

1.  Classifications

We had a conference call today going back to Taxonomies for XML.org.   We
had been looking at NAICS for classifying registered items but NAICS appears
more suitable for classifying the SO.   

An axis for retrieving/browsing based on SO is often useful - and hence
being able to classify the SO is quite important for this.  This might be
more suitable for XML.org than in general for regrep - I'll leave that up
to the list to determine.
Is there any problem in having extensions for collecting information
regarding the SO?

We also looked again at classifications for registered items --- this I
will address again in a follow-on topic regarding packaging, etc.
The general conclusion at the end of the conversation was to use the
existing homegrown XML.org "application type" as another axis for
classification, with the knowledge that this will need to be expanded or
replaced over time.

In reviewing the regrep list --- I came across the link to xmltree.com
again --- they use the Dewey system as one classification axis --- do we
know whether they licensed this or are they just using it?  I thought the
problem with us using Dewey was in part because its owners didn't want
people just copying it on the Web?

2.  Classification & Packaging

Below is a partial list of some of the schemas on the XML Catalog today.

The pattern I see across these is very familiar.

I am going to take a subset and organize them along the axis of organization:
	Organization

	OASIS
		Docbook
			Docbook 3.1
			Docbook 3.0
		CALS

	CommerceOne
		CBL
			CBL 2.0
			CBL 1.0

	DataChannel
		Portal Markup Language PML
			PML 0.9   (XML DTD)

	Visa
		Visa Invoice Specification
			Visa Invoice 1.0
				Example Car Rental Invoice
				Example Hotel Visa Invoice
				Example Airline Invoice
				Style-Sheet
				VISA Invoice DTD version 1.0
				Visa Invoice Implementation Guide  --- PDF
				Visa General Implementation Guide ---  PDF

	Voice forum
		VoiceXML
			VoiceXML .9
				VoiceXML  PDF Specification

	FPML Forum
		FPML ...
			FPML Working Draft 1.0b2
				Overview in PDF
				FPML Samples in PDF
				FPML components and DTDs in PDF
				Zip file of DTDs and Samples
				Bunches of DTDs and Samples
				HTML of working draft -


  
In all cases there is really the notion of the literary work, as Terry calls
it.

In my opinion this is the classification level, for XML.org anyway
--- in a taxonomy view you are going to want to get to FPML or VoiceXML and
not to an overview document in PDF.  At this level you would also want to
store information.

	Example:
		Home Page ---- for Docbook, for VoiceXML, etc.
		You might also want to store general information about
		this, for example an overview document, etc.

		What does this map to in the current regrep spec?
		data-dictionary-set?

		Issues here:

			1.  What is the representation at this level?  It does
			not make sense to say it is an XML-DTD, for example:
			if you look at Docbook or CBL, within them there are
			registered items of many types, i.e. XML-DTD, SOX, etc.

			2.  What happens if you do a retirement or a retraction
			at this level in the tree --- does this mean that the
			instruction applies to everything below it?  I think it
			must; the other way to do it is not to allow a
			retirement or retraction if something has siblings ---
			i.e. DEDs or data elements in a "contained" relationship.

			3.  I also think these nodes in the tree, for example
			Docbook and Docbook 3.1, are really just containers or
			packages, i.e. they do not have contents directly; they
			contain data-dictionaries or data-elements and have
			related items.  In all these cases it will be very
			important to be able to add, retract, and update
			components in these containers and also to update the
			information on these containers.
			I also feel that when you submit a data-dictionary
			you must specify at least one "container" or
			data-dictionary-set that it belongs to.

			4.  What would be the representation for Docbook 3.1?
			Again, this is a container that has documentation,
			style-sheets, SGML-DTDs, and XML-DTDs.  A better example
			is probably the Visa example --- the implementation
			guides are almost as important as the DTDs.
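
To make issue 2 concrete, here is a rough Python sketch of the two
retirement policies. It is only an illustration under my own assumptions:
the Node class, the status strings and the two function names are
hypothetical and are not taken from the regrep spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A registered item or a container of items (hypothetical model)."""
    name: str
    kind: str = "container"       # e.g. "container", "XML-DTD", "PDF"
    status: str = "registered"    # hypothetical status values
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def retire_cascade(node: Node) -> None:
    """Policy 1: retiring a container retires everything below it."""
    node.status = "retired"
    for child in node.children:
        retire_cascade(child)


def retire_strict(node: Node) -> None:
    """Policy 2: refuse to retire a container that still holds live items."""
    live = [c.name for c in node.children if c.status != "retired"]
    if live:
        raise ValueError(f"cannot retire {node.name!r}: still contains {live}")
    node.status = "retired"


visa = Node("Visa Invoice 1.0")
dtd = visa.add(Node("VISA Invoice DTD version 1.0", kind="XML-DTD"))
guide = visa.add(Node("Visa Invoice Implementation Guide", kind="PDF"))

# Under the strict policy the non-empty container cannot be retired.
try:
    retire_strict(visa)
    strict_refused = False
except ValueError:
    strict_refused = True

# Under the cascading policy the container and its contents all retire.
retire_cascade(visa)
```

Either policy can be made to work; the point is only that the registry has
to pick one and apply it consistently at the container level.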


If we are saying that DTDs, implementation guides, etc. map to
data-dictionaries then we need one other level, which may be the already
existing data-dictionary-set, that contains data-dictionaries and also
data-dictionary-sets.
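
As a sketch of what that extra level could look like (assuming, and this is
only my assumption, that a data-dictionary-set may hold both
data-dictionaries and other sets), the nesting is a simple recursive
structure. The class and field names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class DataDictionary:
    name: str
    representation: str           # e.g. "XML-DTD", "PDF"


@dataclass
class DataDictionarySet:
    name: str
    members: List[Union["DataDictionarySet", DataDictionary]] = field(
        default_factory=list
    )

    def dictionaries(self) -> Iterator[DataDictionary]:
        """Yield every data-dictionary below this set, at any depth."""
        for member in self.members:
            if isinstance(member, DataDictionarySet):
                yield from member.dictionaries()
            else:
                yield member


visa_spec = DataDictionarySet("Visa Invoice Specification", [
    DataDictionarySet("Visa Invoice 1.0", [
        DataDictionary("VISA Invoice DTD version 1.0", "XML-DTD"),
        DataDictionary("Visa Invoice Implementation Guide", "PDF"),
    ]),
])

names = [d.name for d in visa_spec.dictionaries()]
```

Note that the sets themselves carry no representation; only the leaves do,
which is one possible answer to issues 1 and 4 above.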

Thanks,
Una

-----Original Message-----
From: Terry Allen [mailto:tallen@sonic.net]
Sent: Tuesday, April 04, 2000 1:03 PM
To: regrep@lists.oasis-open.org
Subject: Re: XML.org implementation (Packages)


Len wrote:
| All,
| 
| I'm confused both by Terry's example and by Nagw's response to it.  The
| entire correspondence is included below.
| 
| First - take Terry's example:
| 
| > Please refer to my initial post on "the simple case".  The fact
| > that several things happened to be submitted together doesn't
| > mean that it's useful to maintain that association.  In other
| > words, if my DTD has parts A, B, C, D, and E and I happen to submit
| > A and B together, then later C, D, and E, the useful set is A-E,
| > not (A, B) and (C-E).
| 
| It's not clear if Terry intended to say DED for <data-element-dictionary>
| or if he really means DTD for <data-type-definition>.  But I'm having

DTD.  Docbook comes in half a dozen modules.

| trouble with either interpretation.  Let's assume DTD for now.  Suppose he
| submits the <data-element>s for parts A and B together as a
| <data-element-dictionary>, call it DED1.  Then all three items A, B, and

Each module is a d.e.d.; they aren't submitted together as a d.e.d.

| ...  At this point the DTD that uses all 5 parts is still not
| registered.  If the DTD "uses" all 5 parts, it couldn't be registered with
| parts A and B because its "uses" associations with C, D, and E wouldn't be
| valid (unless they were registered in some other registry/repository).

The DTD is *composed of* all five parts.  Now, in my Docbook examples,
I also showed the whole set of Docbook parts registered together,
db7.txt.  I was simplifying in the example that confused you.

| However, its <data-element> with the DTD metadata could have been submitted
| along with parts C,D, and E in the <data-element-dictionary> DED2, or it
| could be submitted separately.  In each case the <data-element> for the DTD
| would be assigned an identifier and its metadata would declare a "uses"
| association with the five parts A-E, or it could declare a "uses"
| association with the package DED1 and items C,D, and E, or if submitted
| separately, it could declare a "uses" association with packages DED1 and
| DED2.

The essential points here are:  1) as Len indicates, the declared
associations show what goes together with what (the accident of
submitting things together does not), and 2) we need some notion
of a DTD apart from its modules.  In my message "Literary Work,"
I gave this view:

| REGISTRY
| 	List of Subject Areas
| 		Aircraft
| 		(etc.)
| 		Computer Documentation
| 			IBMIDDOC
| 				IBMIDDOC 1.0 and related data
| 				IBMIDDOC 1.1 and related data
| 			(etc.)
| 			Docbook
| 				Docbook 3.1 and related data
| 				Docbook 4.0 and related data

and asserted:

| The List of Subject Areas is clearly a classification scheme, it clearly
| isn't owned by the SO for Docbook, and it may or may not be owned by
| the RA.  I'd call it a taxonomy.  Computer Documentation is a node in that
| taxonomy.  IBMIDDOC 1.0 (for those who've never heard of it, it's a 
| documentation DTD from IBM) is a d.e. dictionary, as is Docbook 3.1.  
| (In both cases, the related data isn't part of the dictionary.)

In the case of Docbook 3.1, the entire distribution could be considered
the storage entity most closely associated with the line

| 				Docbook 3.1 and related data

or you could take the view that docbook.dtd, which is the "driver"
file that calls all the other modules, is the storage entity most
closely associated.  But actually, I think, you'd want to see under 
that line,

					entire distribution
					docbook.dtd
					dbpool.mod
					dbhier.mod
					related data

So maybe "entire distribution" should be regarded as a data-element-
dictionary-set.  

| A request to retrieve the registered item DTD would get a single document
| type definition that references 5 other registered items, but it would not
| receive the other 5 items themselves.

That would depend on whether you wanted to retrieve the entire 
distribution (as you probably would when browsing) or the docbook.dtd
file (as a result of a call from a parser).  Both choices are 
reasonable under different circumstances.

My point in the example that confused you was that it should not
matter if I submit docbook.dtd and dbpool.mod one day, and dbhier.mod
and other modules the next:  remember, I'm not taking "submission
package" as meaning the same as "set of all associated registered
items" (I had previously used "set of all related data", but that's
something different still).

An RA *could as a matter of policy* require a set of all associated
items to be submitted as a single submission package, but that would
not extinguish the distinction between submission package and set
of all associated items, it would only treat them as functionally
equivalent for the purpose of that RA's workflow.

That this policy would be difficult to live with can be seen more
clearly when one considers related data.  If I submit a DTD and
documentation, then add related data, such as a FAQ, examples,
and so on as they're created later - perhaps even new DTD modules -
I have no particular interest in dealing with the original
submission package boundaries.
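
A rough Python sketch of the distinction, under assumptions of my own: the
package ids, the association type names "part-of" and "related-to", and
the file docbook-faq.txt are all hypothetical and only there for
illustration.

```python
from collections import defaultdict

# Submission packages are workflow records: what arrived together, nothing more.
submissions = [
    ("pkg-1", ["docbook.dtd", "dbpool.mod"]),
    ("pkg-2", ["dbhier.mod", "docbook-faq.txt"]),
]

# Declared associations: (item, association type, target).
# These, not the package boundaries, say what goes together with what.
associations = [
    ("docbook.dtd", "part-of", "Docbook 3.1"),
    ("dbpool.mod", "part-of", "Docbook 3.1"),
    ("dbhier.mod", "part-of", "Docbook 3.1"),
    ("docbook-faq.txt", "related-to", "Docbook 3.1"),
]


def associated_items(target):
    """Group every item associated with `target` by association type."""
    groups = defaultdict(list)
    for item, kind, tgt in associations:
        if tgt == target:
            groups[kind].append(item)
    return dict(groups)


docbook_31 = associated_items("Docbook 3.1")
```

The result is the same whether the items arrived in one package or two,
and the FAQ added later lands under related data without touching pkg-1.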

| A request to retrieve the registered item DED2 would get items C,D, and E,
| and possibly DTD if the <data-element> describing DTD was included in DED2.
| However, in no case would it be possible to request a single registered
| item and get everything!

In the example that confused you, right (unless the RA created a
package of them); in my Docbook examples, no, you could request
the entire distribution.

| A request to retrieve the item DTD and all of the items it "uses" would
| retrieve 6 items, i.e. the DTD and the 5 parts. But this requires a
| recursive search on the part of the registry/repository down the "uses"
| association tree for DTD - I see this as a feature of a "good"
| registry/repository.

Certainly the "uses" associations need to be arranged so as to make
this possible.
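
For illustration, the recursive retrieval Len describes might be sketched
like this in Python. The dictionary of "uses" associations and the function
name are hypothetical; a real registry would read the associations from
the item metadata.

```python
from typing import Dict, List, Set

# Hypothetical "uses" associations for Len's example: DTD uses parts A-E.
uses: Dict[str, List[str]] = {
    "DTD": ["A", "B", "C", "D", "E"],
    "A": [], "B": [], "C": [], "D": [], "E": [],
}


def retrieve_with_uses(item: str, uses: Dict[str, List[str]]) -> List[str]:
    """Return `item` plus everything reachable through "uses", each item once."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue              # guards against cycles and shared parts
        seen.add(current)
        order.append(current)
        # reversed() so that parts are visited in their declared order
        stack.extend(reversed(uses.get(current, [])))
    return order


result = retrieve_with_uses("DTD", uses)
```

Here result holds the 6 items Len counts, the DTD and its 5 parts. The
seen set matters once modules start using each other.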

| In this example Terry is making the point that DED1 and DED2 really have no
| relevance to the DTD and its 5 parts, but the registry/repository doesn't
| know that.  The repository has no choice but to register DED1 and DED2 and
| keep any metadata that was included with them.  Una would say - let's not
| register DED1 and DED2 and let's not make it possible to associate any
| metadata with them.  That's OK, except then we'd have lost the capability
| to register packages of elements and to reference the package with a
| single ID.

Yep.  And that's why we probably need the notion of a literary work,
which both RA and SO can use to draw a boundary around "Docbook DTD",
and why we probably need to consider Docbook 3.1 as a d.e.d.-set.

| Next - let's consider Nagw's interpretation of Terry's example.  I think Nagw
| was assuming that Terry wanted to register a package with 5 parts, not a
| DTD with 5 separately specified and registered sub-elements. Here's what he
| says:
| 
| >I think we are on the same page here,
| >There will be a unique identifier for the DTD, and a unique identifier for
| >every component from A-E.
| >We will use the DTD unique identifier to group all related submissions, i.e.
| >(A,B) with (C-E)
| >When you retrieve the DTD, it will include A-E.
| >You can also retrieve each component separately.
| 
| Even under the assumption that we're interested only in the 5 parts, not a
| document that uses the 5 parts, and that Terry meant to say DED instead of
| DTD in his example, things don't happen as Nagw expects.
| 
| A request to retrieve item DED1 will return only items A and B.
| 
| A request to retrieve item DED2 will return only items C, D, and E.

Items DED1 and DED2 (A,B) and (C-E) are not registered items, so
you'd never request their retrieval.
...

| CONCLUSION
| 
| What Nagw really wants here, I'm inferring, is the ability to register a
| package of items and then add other items to the package later.  This will

I certainly want that.

| be very difficult to accommodate unless we specify a DTD for a submission
| to a registry that allows one to amend a previous submission.  We do not

No, submission is related to workflow, not to registered items.  We
need instead to be able to specify associations, and associations
to larger entities than a given registered item (and those associations
aren't yet in data-element-association-list.ent).  Those larger
entities keep getting confused with "submission packages", but they're
different.

regards, Terry