OASIS Mailing List Archives

 



acxo message



Subject: RE: The Simple Case


Hi,

I have spent quite a few hours reading and re-reading ISO 11179 and also the
regrep spec.

Firstly -- I feel we are trying to pigeon-hole what we are trying to
accomplish against a specification that had a different purpose in mind.
From reading the EBXML mailing list -- I feel that there are people there of
the same opinion.

There are definitely good common concepts in ISO 11179 that apply to
registration and metadata, and more generally to basic modelling of
information, but there is no way we can say we are 100% conformant to ISO
11179. I don't think we would want to be, or nobody would use the XML.org
Registry and Repository.

The basic reason why is:

ISO 11179 is about registering and storing data-elements (and that's it).  A
data element is, as they state themselves, an indivisible unit of data.
Examples they give themselves would be: country of origin code, employee
number, an employee lastname, product description, etc...

They then specify the metadata required for each data-element, i.e. the
type (e.g. integer), allowable values (e.g. 1-100), min and max storage
(e.g. a string might be 1 char to 250 chars), etc.

The metadata set they define makes sense when you are describing
permissible data descriptions and values. This specification MIGHT make
sense if we were interested in having everyone register every element and
attribute value discretely, but even then I think we might have a hard time
modelling XML element/attribute definitions against ISO 11179.
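To make concrete what kind of thing that metadata describes, here is a minimal sketch of a data-element record with a type, an allowable value range and storage bounds. The class and field names are hypothetical and are not taken from ISO 11179 or the regrep spec; the two example elements simply reuse the ranges mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical metadata record for one indivisible unit of data."""
    name: str
    datatype: str                       # "integer" or "string" in this sketch
    min_value: Optional[int] = None     # allowable value range (integers)
    max_value: Optional[int] = None
    min_length: Optional[int] = None    # storage bounds (strings)
    max_length: Optional[int] = None

    def accepts(self, value: object) -> bool:
        """Return True if value fits the declared type and bounds."""
        if self.datatype == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
            return True
        if self.datatype == "string":
            if not isinstance(value, str):
                return False
            if self.min_length is not None and len(value) < self.min_length:
                return False
            if self.max_length is not None and len(value) > self.max_length:
                return False
            return True
        return False  # unknown datatypes are rejected in this sketch


# Illustrative elements only: an integer limited to 1-100 and a 1-250 char string.
score = DataElement("score", "integer", min_value=1, max_value=100)
last_name = DataElement("employee lastname", "string", min_length=1, max_length=250)
```

A record like this fits a single value such as an employee number. It has no obvious slot for an uploaded schema file or a link to one, which is the mismatch being described.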

There are also a lot of examples today of divergence if you compare the
regrep DTDs to ISO 11179:
        ISO 11179 has no notion of registering a document such as a schema
associated with a data-element; the data-element definition is it.
        Certain attributes defined in regrep are optional because they
don't make sense for registry schemas, e.g. min and max values etc...
        I could go on much further here but I hope the point has been made.
I feel we are almost doing an injustice to the ISO 11179 spec by trying to be
conformant, as we are not applying it to its purpose. I think we should
say we are using concepts from ISO 11179 where appropriate.


So for XML.org/regrep:
I do not think data-element is the correct term, as data-elements as defined
by ISO 11179 are not what we are registering (the same then follows for
data-element concept, etc.).

For the purposes of this message I will call this concept a "registered-item".

Here I am really looking specifically at XML.org

1.  So what is a registered-item or what can be registered on XML.org? (this
can be expanded over time or different registries will want to register
different things)

Below I have cut and copied these sections from previous emails:


From related-entity-list.ent:

        related-data-group
        | %documentation-genre-list;
        | %other-xsgml-list;
        | %style-sheet-list;
        | schema-home-page
        | distribution-home-page
        | registration-information
        | cover-letter
        | other


        documentation-genre-list is:

        documentation-set
        | documentation-set-information
        | reference-manual
        | user-guide
        | white-paper
        | faq
        | example
        | example-set
        | example-set-information
        | changelog
        | readme
        | email-discussion-list-information
        | tool-information

literary-genre-list is:

        book
        | article
        | recipe
        | %documentation-genre-list.ent;

This is backwards from what I would have expected.  From the
inclusion in related-entity-list.ent of %documentation-genre-list
I would have expected documentation-genre-list.ent to import
%literary-genre-list instead of the other way around.

other-xsgml-list is

        sgml-open-catalogue
        | sgml-declaration
        | public-text

xsgml-entity-list.ent is

        xml-dtd
        | sgml-dtd
        | xml-schema
        | xdr-schema
        | sox-schema
        | rdf-schema
        | sgml-element
        | xml-element
        | sgml-attribute
        | xml-attribute
        | sgml-enumerated-attribute-set
        | xml-enumerated-attribute-set
        | sgml-enumerated-attribute-value
        | xml-enumerated-attribute-value
        | sgml-parameter-entity
        | xml-parameter-entity
        | character-entity-set

Shouldn't relax-schema be in there someplace?

style-sheet-list.ent is:

        style-sheet-information
        | xsl-style-sheet
        | xsl-style-sheet-information
        | dsssl-style-sheet
        | dsssl-style-sheet-information
       


This is really a fundamental question for XML.org ---  

On the Web page we say Schemas and Style-sheets and then list many other
ad-hoc things e.g. Namespaces, Vocabularies, etc.. 

Biztalk today is concentrated on XDR Schemas as we know.  

So let's go through the xsgml-entity-list:

1. Schemas:

        xml-dtd
        | sgml-dtd
        | xml-schema
        | xdr-schema
        | sox-schema
        | rdf-schema
        | relax-schema

This list makes sense to me but let's look at some of the issues:

1. Same schema registered in multiple "formats"

If I want to register my Shoe schema and I have made it available in all the
above "formats"  - what do I do?    
 
All metadata, classification info, related documentation etc.. is equivalent
except:

*	"format" 
*	URN for schema

A user looking for information would want to browse/search under something
related to Shoes. Having discovered an organization that they trust, they
want info about its submittal and, bingo, see that it is available in the
above formats. Because their system is using MS, they choose the XDR-Schema;
they could then download the schema or include the URN for system access.

We either manage and register them as separate registered-items, or allow a
registered-item to consist of one or more items, i.e. registered-item becomes
registered-items.
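A minimal sketch of the second option, where one registered-item carries the shared metadata once and holds one entry per "format", each with its own URN. All class names, field names and URNs here are hypothetical and only illustrate the shape of the data; nothing below comes from the regrep DTDs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Representation:
    """One concrete form of the same schema (the only parts that differ)."""
    format: str   # e.g. "xml-dtd", "xdr-schema", "sox-schema"
    urn: str      # hypothetical URN for this particular form


@dataclass
class RegisteredItem:
    """Shared metadata stated once, plus one or more representations."""
    name: str
    submitting_organization: str
    classification: List[str]
    representations: List[Representation] = field(default_factory=list)

    def urn_for(self, fmt: str) -> Optional[str]:
        """Return the URN of the requested format, or None if not offered."""
        for rep in self.representations:
            if rep.format == fmt:
                return rep.urn
        return None


# Illustrative data only.
shoe = RegisteredItem(
    name="Shoe schema",
    submitting_organization="Example Org",
    classification=["footwear"],
    representations=[
        Representation("xml-dtd", "urn:example:shoe:dtd"),
        Representation("xdr-schema", "urn:example:shoe:xdr"),
    ],
)
```

With this shape the user in the Shoe example finds one entry when browsing, then calls something like shoe.urn_for("xdr-schema") to pick the form their system needs. The first option would instead create one full record per format and repeat the shared metadata in each.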

2. Schema made-up of multiple parts

    If a schema is made up of multiple modules --- it will be up to the SO
to register each of them individually.


        | sgml-element
        | xml-element
        | sgml-attribute
        | xml-attribute
        | sgml-enumerated-attribute-set
        | xml-enumerated-attribute-set
        | sgml-enumerated-attribute-value
        | xml-enumerated-attribute-value
        | sgml-parameter-entity
        | xml-parameter-entity
        | character-entity-set



What does it mean to register an xml-attribute or an sgml-attribute?    At
this point in time all we accept is either a file to upload or a link
to some URI.

style-sheet-list.ent is:

        style-sheet-information
        | xsl-style-sheet
        | xsl-style-sheet-information
        | dsssl-style-sheet
        | dsssl-style-sheet-information
       


Again the same applies as for schemas.   

Overall question -- can a style-sheet or schema be submitted in a form such
as a PDF etc...?

Other Stuff:

I don't believe any of the other information (documentation, FAQs, etc.)
are valid standalone registered items. I think that they are related-docs to
a given registered-item. Sure, they have metadata (name, URN or link) but
they are in support of a registered-item.

To be continued.

 

Thanks,

Una




-----Original Message-----
From: Jon Bosak [mailto:Jon.Bosak@eng.sun.com]
Sent: Thursday, March 16, 2000 10:44 PM
To: xmlorg@lists.oasis-open.org
Subject: Re: The Simple Case


There's a question for Una and Nagwa toward the end of this.

[Terry Allen, back in his posting of last Saturday:]

| Actually, I don't think anything in the current spec prevents
| the simple case from being simple (if we omit the material about
| submissions, which I have suggested in a separate message).  There
| is something that needs to be added, and on one point we have a
| divergence of views.
|
| The thing that needs to be added is a notion corresponding not
| quite to "a principal registered item" (it's pretty hard to figure
| out what that would be for, e.g., the Docbook DTD; suggestions
| welcome)

I would have guessed driver.dtd to be the principal registered
item.  Am I missing something?

| and not quite to "submission": it's the key by which the thing to
| which related data are related is identified for purpose of
| display in the list of related data.

Is this what data-element-concept is for?  Sounds like it from
your documentation:

   data-element-concept

      Part 1, definition 3.3.15: "concept that can be represented
      in the form of a data element, described independently of
      any particular representation." It contains an optional
      binding of object class to property (optional, as these
      concepts have not yet been clarified as being relevant to
      data element dictionaries), a definition-text element (in
      ISO/IEC 11179 the description of a data element concept is
      primary; the names of that data element are secondary, and
      the concept may exist independently of any name), and any
      number of classification elements, permitting the data
      element concept to be rooted in any number of classification
      schemes.

      Part 1, definition 3.3.45: "set of objects. A set of ideas,
      abstractions, or things in the real world that can be
      identified with explicit boundaries and meaning and whose
      properties and behavior follow the same rules." Its value is
      a string (provisionally).

| For the purpose of the dbrelated.txt example I made this the
| entire Docbook distribution, but as noted above, that's a
| simplification, as the set of related data may not be the same as
| the distribution.

I don't see dbrelated.txt on the public page, and I don't seem to
have a working password at the moment, so I don't actually have an
example of this in front of me at the moment; my apologies.

| It's more like a node in a taxonomy (or other classification) of
| DTDs.  We can make this something abstract (that is, not any of
| the registered items).  It would correspond to the notion of a
| literary work: the Bible is a literary work with many versions and
| physical instantiations, none of which is primary in the way a
| book's first edition can be.

Yes.

| We might instantiate this notion as the name of an item in
| a classification scheme, so that "Docbook DTD" would be such
| an item.

Due to my unfamiliarity with the spec, I'm not quite sure what you
mean by this.

| Note that this classification scheme would not be
| the same as the subject matter classification scheme we know
| we need.  That node could then be pointed to, if appropriate,
| in the metadata for registered items relating to Docbook.

[...]

| I understand Una to want it to be possible to have registered
| items that don't have registry entries or metadata, except that
| which is inherited from the submission in which it arrived.  This
| simplification won't do in an implementation of 11179, and I think
| it's not right anyway.  In the 11179 model everything has
| metadata: its SO, its representation, its name, and its name
| context, for starters.  Representation, name, and name context
| cannot be inherited from another registered item, nor from the
| submission.  Una mentioned language during the ACXO meeting; that
| too cannot be inherited.

This seems to be the nut of the problem.

Question for Una and Nagwa: Can we build something into the UI
that makes a submission appear to the user as a structured object
with a common set of metadata and then, behind the curtain, make a
bunch of registered objects whose metadata is populated with
information from the metadata on the original submission?

As the user, I think that I can handle the idea that metadata on a
"subsidiary item" in the initial submission become independently
changeable metadata once all the component pieces have been
checked in.  Let me put it this way: once I check out an item for
maintenance, I've already made the leap to thinking of it as a
first-class object that has become magically changeable
independent of the initial configuration of the submitted package,
so I'm unlikely to be shocked when I discover that this now
independent object has acquired independent metadata.
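A minimal sketch of the behaviour asked about here: the submission carries one common set of metadata, and checking it in produces separate registered objects, each seeded with its own copy that can then change independently. The class and field names are hypothetical and the metadata keys are illustrative. Whether fields such as name or representation may be seeded this way at all is exactly the point under discussion, so the sketch copies only shared fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisteredObject:
    """One first-class registry entry with its own metadata."""
    uri: str
    metadata: Dict[str, str]


@dataclass
class Submission:
    """What the user sees: one package with a common set of metadata."""
    metadata: Dict[str, str]
    item_uris: List[str]

    def check_in(self) -> List[RegisteredObject]:
        """Create one registered object per item, each with a private copy."""
        return [RegisteredObject(uri, dict(self.metadata)) for uri in self.item_uris]


# Illustrative data only.
submission = Submission(
    metadata={"submitting-organization": "Example Org", "language": "en"},
    item_uris=["urn:example:pkg:main", "urn:example:pkg:module-a"],
)
objects = submission.check_in()

# After check-in, editing one object's metadata does not touch the others.
objects[0].metadata["language"] = "fr"
```

Because each object gets its own dictionary, the later change to the first object leaves the second object and the original submission untouched, which matches the "independently changeable" expectation described above.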

| The 11179 model allows everything registered in the registry to
| be dealt with on the same basis and permits assembly of large-scale
| constructs through the concept of related data.  We don't want
| to give that up, and as a group dedicated to Structured Information
| Standards, we sure want to implement ISO/IEC 11179 correctly.  It's
| going to be the foundation of a lot of other work.

I agree.  Once in place, the same mechanism can be used for any
kind of conceptual relationship.  This is good.  And it doesn't
look on the face of it to be all that terribly hard to implement
given the facilities of a good document database system.  Is it?

Jon
[Terry Allen:]

| | From an initial reading of Terry's message, I get the impression
| | that the submitter in this case has to submit the composite schema
| | needed for automatic validation as a registered item in its own
| | right, even if the separate modules that make it up are also
| | registered items.  Is this correct?
|
|
| Not necessarily (although Bill Smith suggested it a year ago in
| Granada, and we might want to do things this way - it just doesn't
| scale very well).

We might want to reconsider this if it has performance implications.

| In the case of Docbook, a parser that started with docbook.dtd
| would encounter references to the other modules and download them
| as they're encountered (I'll bet nsgmls does this correctly,
| although I haven't tested).  As DTDs should be cached for a much
| longer period than random files encountered on the Web, the first
| use of Docbook should load the application's cache with all the
| required modules, and subsequently all the parts of the DTD would
| be present locally (which is how we use Docbook today).
|
| If you recall using Panorama to parse a TEI document, you'll
| know that successive GETs like this can take a long time,
| depending on the Web weather.

Yes.  In a transaction-oriented b2b environment this should work
pretty well, but it might slow down b2c significantly until we get
really standardized with the forms.  So people may wish to fully
expand the schema and point to it as a single object.  It seems to
me that there might be workflow advantages to tracking a single
file, too.

| If we were to develop an appropriate packaging mechanism, the
| application might download all the modules together, unpack,
| and then start work.  But we don't have that now.
|
| | If this is not correct, I'm going to guess in advance that there
| | is in theory a way to reconstruct the composite schema from a
| | "specification of relationships among related data."  If this is
|
| Yep, that's another approach, although I'm not sure what it
| gains you.
|
| | the case, can there be more than one "specification of
| | relationships among related data," and can there be a standard way
| | of pointing to which "specification of relationships among related
| | data" allows an application in the minimum case correctly to
| | assemble the composite schema?
|
| Yes, you can specify as many relationships as you like.  For some
| reason the current DTD calls these relationships "associations".

Well, but...

What I mean is can you specify how to assemble the schema in the
right order.  Not in a standard way, it seems (though I can make a
private convention that the order of related-data-references
specifies the order in which modules are to be assembled).
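A minimal sketch of that private convention, assuming the list order of the related-data-references is the assembly order. The function name, the URIs and the module texts are hypothetical; the spec does not define this behaviour.

```python
from typing import Dict, List


def assemble_schema(ordered_refs: List[str], modules: Dict[str, str]) -> str:
    """Concatenate module texts in the order their references are listed.

    ordered_refs: URIs taken from related-data-reference elements, in
                  document order (the private convention assumed here).
    modules:      already-downloaded module text keyed by URI.
    """
    missing = [uri for uri in ordered_refs if uri not in modules]
    if missing:
        raise KeyError(f"unresolved module references: {missing}")
    return "\n".join(modules[uri] for uri in ordered_refs)


# Illustrative modules only: the entity must be declared before it is used.
modules = {
    "urn:example:mod:entities": "<!ENTITY % inline 'em | code'>",
    "urn:example:mod:elements": "<!ELEMENT para (#PCDATA | %inline;)*>",
}
composite = assemble_schema(
    ["urn:example:mod:entities", "urn:example:mod:elements"], modules
)
```

Order matters here because a parameter entity has to be declared before it is referenced, so reversing the two references would yield a composite that does not parse. That is why an agreed way to state the order would be needed for interchange.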

... Sorting this out in a place where I can find it, don't mind
me ...

... Actually, there are comments below for Terry.  I will continue
a response to his original posting on The Simple Case in a
separate message.

Looking at data-element.dtd, we have

   <!ENTITY % data-element-content
           "data-element-concept,
           data-element-association-set?,
           representation,
           name-context+,
           (related-data-reference* | related-data-group-reference)"
   >

   [...]

   <!ELEMENT related-data-reference (related-data-reference-label?,
           (uri-reference))>
   <!ATTLIST related-data-reference
           relationship-of-related (%related-entity-list;) #REQUIRED
   >

   <!ELEMENT related-data-reference-label (#PCDATA)>

   <!ELEMENT related-data-group (data-element-reference,
           related-data-reference+)>

   <!ELEMENT related-data-group-reference (uri-reference)>
   <!ATTLIST related-data-group-reference
           relationship-of-related CDATA #FIXED "related-data-group"
   >
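For orientation, here is a minimal sketch that builds one related-data-reference instance matching the declarations above, using Python's standard xml.etree.ElementTree. The content model of uri-reference is not shown in this fragment, so treating it as plain text is an assumption, and the helper name and URI are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_related_data_reference(
    relationship: str, uri: str, label: Optional[str] = None
) -> ET.Element:
    """Build a related-data-reference element.

    relationship: a value from related-entity-list, e.g. "user-guide".
    uri:          target of the reference (assumed to be text content of
                  uri-reference, which the quoted DTD does not confirm).
    label:        optional related-data-reference-label text.
    """
    ref = ET.Element(
        "related-data-reference", {"relationship-of-related": relationship}
    )
    # The label is optional and, per the content model, comes first.
    if label is not None:
        ET.SubElement(ref, "related-data-reference-label").text = label
    ET.SubElement(ref, "uri-reference").text = uri
    return ref


# Illustrative values only.
ref = make_related_data_reference(
    "user-guide", "urn:example:docbook:user-guide", label="User guide"
)
xml_text = ET.tostring(ref, encoding="unicode")
```

This builds the element but does not validate it against the DTD, which ElementTree cannot do. A validating parser would still be needed to check the relationship value against related-entity-list.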

From related-entity-list.ent:

        related-data-group
        | %documentation-genre-list;
        | %other-xsgml-list;
        | %style-sheet-list;
        | schema-home-page
        | distribution-home-page
        | registration-information
        | cover-letter
        | other

documentation-genre-list is:

        documentation-set
        | documentation-set-information
        | reference-manual
        | user-guide
        | white-paper
        | faq
        | example
        | example-set
        | example-set-information
        | changelog
        | readme
        | email-discussion-list-information
        | tool-information

literary-genre-list is:

        book
        | article
        | recipe
        | %documentation-genre-list.ent;

This is backwards from what I would have expected.  From the
inclusion in related-entity-list.ent of %documentation-genre-list
I would have expected documentation-genre-list.ent to import
%literary-genre-list instead of the other way around.

other-xsgml-list is

        sgml-open-catalogue
        | sgml-declaration
        | public-text

xsgml-entity-list.ent is

        xml-dtd
        | sgml-dtd
        | xml-schema
        | xdr-schema
        | sox-schema
        | rdf-schema
        | sgml-element
        | xml-element
        | sgml-attribute
        | xml-attribute
        | sgml-enumerated-attribute-set
        | xml-enumerated-attribute-set
        | sgml-enumerated-attribute-value
        | xml-enumerated-attribute-value
        | sgml-parameter-entity
        | xml-parameter-entity
        | character-entity-set

Shouldn't relax-schema be in there someplace?

style-sheet-list.ent is:

        style-sheet-information
        | xsl-style-sheet
        | xsl-style-sheet-information
        | dsssl-style-sheet
        | dsssl-style-sheet-information

Which I think exhausts the categories of things that can be
related via a related-data-reference to a registered data item;
right?
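One way to check that claim is to expand the parameter-entity references mechanically. The sketch below copies the lists exactly as quoted above and flattens related-entity-list. The function name is hypothetical. Since the quoted related-entity-list refers to other-xsgml-list and not to xsgml-entity-list, the latter is left out here; whether it is pulled in somewhere else is not visible in these fragments.

```python
from typing import Dict, List, Tuple

# Lists as quoted above; "%name;" marks a parameter-entity reference.
ENTITY_LISTS: Dict[str, List[str]] = {
    "related-entity-list": [
        "related-data-group", "%documentation-genre-list;",
        "%other-xsgml-list;", "%style-sheet-list;", "schema-home-page",
        "distribution-home-page", "registration-information",
        "cover-letter", "other",
    ],
    "documentation-genre-list": [
        "documentation-set", "documentation-set-information",
        "reference-manual", "user-guide", "white-paper", "faq", "example",
        "example-set", "example-set-information", "changelog", "readme",
        "email-discussion-list-information", "tool-information",
    ],
    "other-xsgml-list": [
        "sgml-open-catalogue", "sgml-declaration", "public-text",
    ],
    "style-sheet-list": [
        "style-sheet-information", "xsl-style-sheet",
        "xsl-style-sheet-information", "dsssl-style-sheet",
        "dsssl-style-sheet-information",
    ],
}


def expand(name: str, lists: Dict[str, List[str]],
           seen: Tuple[str, ...] = ()) -> List[str]:
    """Flatten one list, replacing %name; references by their contents."""
    if name in seen:
        raise ValueError(f"circular entity reference: {name}")
    values: List[str] = []
    for token in lists[name]:
        if token.startswith("%") and token.endswith(";"):
            values.extend(expand(token[1:-1], lists, seen + (name,)))
        else:
            values.append(token)
    return values


allowed = expand("related-entity-list", ENTITY_LISTS)
```

Under these assumptions the flattened list has 27 values and contains none of the schema types such as xml-dtd, so on this reading the xsgml-entity-list items are not valid relationship values unless another entity file brings them in.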

Jon



