Subject: RE: The Simple Case
Hi,

I have spent quite a few hours reading and re-reading ISO 11179 and also the regrep spec.

Firstly, I feel we are trying to pigeon-hole what we want to accomplish into a specification that had a different purpose in mind. From reading the ebXML mailing list, I feel that there are people there of the same opinion. There are definitely good common concepts in ISO 11179 that apply to registration and metadata, and that apply generally to basic modelling of information, but there is no way we can say we are 100% conformant to ISO 11179. I don't think we would want to be, or nobody would use the XML.org Registry and Repository.

The basic reason is this: ISO 11179 is about registering and storing data-elements (and that's it). A data-element is, as they state themselves, an indivisible unit of data. Examples they give themselves would be: country of origin code, employee number, an employee last name, product description, etc. They then specify the metadata required for each data-element, i.e. the type (e.g. integer), the allowable values (e.g. 1-100), the min and max storage (e.g. a string might be 1 char to 250 chars), etc.

The metadata set they define makes sense when you are describing permissible data descriptions and values. This specification MIGHT make sense if we were interested in having everyone register every element and attribute value discretely, but even then I think we might have a hard time modelling XML element/attribute definitions against ISO 11179.

There are also a lot of examples today of divergence if you compare the regrep DTDs to ISO 11179. ISO 11179 has no notion of registering a document, such as a schema, associated with a data-element; the data-element definition is it. Certain attributes defined in regrep are optional because they don't make sense for registry schemas, e.g. min and max values.
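
To make the mismatch concrete, here is a minimal sketch in Python of the kind of per-data-element metadata described above (type, allowable values, storage bounds). The class and field names are hypothetical and are not taken from ISO 11179 or from the regrep DTDs; the point is only that this shape suits something like an employee number and offers no natural slot for a whole schema document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """Hypothetical record describing one indivisible unit of data."""
    name: str
    datatype: type                      # e.g. int or str
    min_value: Optional[int] = None     # allowable-value range (numeric data)
    max_value: Optional[int] = None
    min_length: Optional[int] = None    # storage bounds (string data)
    max_length: Optional[int] = None

    def permits(self, value) -> bool:
        """Return True if value satisfies the declared type and bounds."""
        if not isinstance(value, self.datatype):
            return False
        if isinstance(value, int):
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
        if isinstance(value, str):
            if self.min_length is not None and len(value) < self.min_length:
                return False
            if self.max_length is not None and len(value) > self.max_length:
                return False
        return True


# Illustrative instances only; the names and bounds are assumptions.
rating = DataElement("rating", int, min_value=1, max_value=100)
last_name = DataElement("employee-lastname", str, min_length=1, max_length=250)
```

A DTD or style sheet has no meaningful "min and max value", which is why those attributes end up optional when the same record is stretched to cover registered documents.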
I could go on much further here, but I hope the point has been made. I feel we are almost doing an injustice to the ISO 11179 spec by trying to be conformant, as we are not applying it to its purpose. I think we should say we are using concepts from ISO 11179 where appropriate.

So for XML.org/regrep I do not think data-element is the correct term, as data-elements as defined by ISO 11179 are not what we are registering (the same then follows for data-element concept, etc.). For the purposes of this message I will call this concept a "registered-item". Here I am really looking specifically at XML.org.

1. So what is a registered-item, or what can be registered on XML.org? (This can be expanded over time, or different registries will want to register different things.)

Below I have cut and copied these sections from previous emails:

From related-entity-list.ent:

  related-data-group | %documentation-genre-list; | %other-xsgml-list; | %style-sheet-list; | schema-home-page | distribution-home-page | registration-information | cover-letter | other

documentation-genre-list is:

  documentation-set | documentation-set-information | reference-manual | user-guide | white-paper | faq | example | example-set | example-set-information | changelog | readme | email-discussion-list-information | tool-information

literary-genre-list is:

  book | article | recipe | %documentation-genre-list.ent;

This is backwards from what I would have expected. From the inclusion in related-entity-list.ent of %documentation-genre-list I would have expected documentation-genre-list.ent to import %literary-genre-list instead of the other way around.
other-xsgml-list is:

  sgml-open-catalogue | sgml-declaration | public-text

xsgml-entity-list.ent is:

  xml-dtd | sgml-dtd | xml-schema | xdr-schema | sox-schema | rdf-schema | sgml-element | xml-element | sgml-attribute | xml-attribute | sgml-enumerated-attribute-set | xml-enumerated-attribute-set | sgml-enumerated-attribute-value | xml-enumerated-attribute-value | sgml-parameter-entity | xml-parameter-entity | character-entity-set

Shouldn't relax-schema be in there someplace?

style-sheet-list.ent is:

  style-sheet-information | xsl-style-sheet | xsl-style-sheet-information | dsssl-style-sheet | dsssl-style-sheet-information

This is really a fundamental question for XML.org. On the Web page we say Schemas and Style-sheets and then list many other ad hoc things, e.g. Namespaces, Vocabularies, etc. BizTalk today is concentrated on XDR schemas, as we know.

So let's go through the xsgml-entity list:

1. Schemas:

  xml-dtd | sgml-dtd | xml-schema | xdr-schema | sox-schema | rdf-schema | relax-schema

This list makes sense to me, but let's look at some of the issues:

1. Same schema registered in multiple "formats"

If I want to register my Shoe schema and I have made it available in all the above "formats", what do I do? All metadata, classification info, related documentation, etc. is equivalent except:

* "format"
* URN for schema

A user looking for information would want to browse or search under something related to Shoes, discover an organization that they trust, and want info about its submittal. Bingo: they see it is available in all of the above and, because their system is using MS, they choose the XDR schema. They could then download the schema or include the URN for system access.

We either manage and register them as separate registered-items, or allow a registered-item to consist of one or more items, i.e. registered-item becomes registered-items.

2. Schema made up of multiple parts

If a schema is made up of multiple modules, it will be up to the SO to register each of them individually.

  | sgml-element | xml-element | sgml-attribute | xml-attribute | sgml-enumerated-attribute-set | xml-enumerated-attribute-set | sgml-enumerated-attribute-value | xml-enumerated-attribute-value | sgml-parameter-entity | xml-parameter-entity | character-entity-set

What does it mean to register an xml-attribute or an sgml-attribute? At this point in time all we accept is either a file to upload or a link to some URI.

style-sheet-list.ent is:

  style-sheet-information | xsl-style-sheet | xsl-style-sheet-information | dsssl-style-sheet | dsssl-style-sheet-information

Again, the same applies as for schemas.

Overall question: can a style-sheet or schema be submitted in a form such as PDF, etc.?

Other stuff: I don't believe any of the other information (documentation, FAQs, etc.) is a valid standalone registered item. I think these are related docs for a given registered-item. Sure, they have metadata (name, URN or link), but they are in support of a registered-item.

To be continued.

Thanks,
Una

-----Original Message-----
From: Jon Bosak [mailto:Jon.Bosak@eng.sun.com]
Sent: Thursday, March 16, 2000 10:44 PM
To: xmlorg@lists.oasis-open.org
Subject: Re: The Simple Case

There's a question for Una and Nagwa toward the end of this.

[Terry Allen, back in his posting of last Saturday:]

| Actually, I don't think anything in the current spec prevents
| the simple case from being simple (if we omit the material about
| submissions, which I have suggested in a separate message). There
| is something that needs to be added, and on one point we have a
| divergence of views.
|
| The thing that needs to be added is a notion corresponding not
| quite to "a principal registered item" (it's pretty hard to figure
| out what that would be for, e.g., the Docbook DTD; suggestions
| welcome)

I would have guessed driver.dtd to be the principal registered item. Am I missing something?

| and not quite to "submission": it's the key by which the thing to
| which related data are related is identified for purpose of
| display in the list of related data.

Is this what data-element-concept is for? Sounds like it from your documentation:

   data-element-concept

   Part 1, definition 3.3.15: "concept that can be represented in the form of a data element, described independently of any particular representation." It contains an optional binding of object class to property (optional, as these concepts have not yet been clarified as being relevant to data element dictionaries), a definition-text element (in ISO/IEC 11179 the description of a data element concept is primary; the names of that data element are secondary, and the concept may exist independently of any name), and any number of classification elements, permitting the data element concept to be rooted in any number of classification schemes.

   Part 1, definition 3.3.45: "set of objects. A set of ideas, abstractions, or things in the real world that can be identified with explicit boundaries and meaning and whose properties and behavior follow the same rules." Its value is a string (provisionally).

| For the purpose of the dbrelated.txt example I made this the
| entire Docbook distribution, but as noted above, that's a
| simplification, as the set of related data may not be the same as
| the distribution.

I don't see dbrelated.txt on the public page, and I don't seem to have a working password at the moment, so I don't actually have an example of this in front of me at the moment; my apologies.

| It's more like a node in a taxonomy (or other classification) of
| DTDs.
| We can make this something abstract (that is, not any of
| the registered items). It would correspond to the notion of a
| literary work: the Bible is a literary work with many versions and
| physical instantiations, none of which is primary in the way a
| book's first edition can be.

Yes.

| We might instantiate this notion as the name of an item in
| a classification scheme, so that "Docbook DTD" would be such
| an item.

Due to my unfamiliarity with the spec, I'm not quite sure what you mean by this.

| Note that this classification scheme would not be
| the same as the subject matter classification scheme we know
| we need. That node could then be pointed to, if appropriate,
| in the metadata for registered items relating to Docbook.

[...]

| I understand Una to want it to be possible to have registered
| items that don't have registry entries or metadata, except that
| which is inherited from the submission in which it arrived. This
| simplification won't do in an implementation of 11179, and I think
| it's not right anyway. In the 11179 model everything has
| metadata: its SO, its representation, its name, and its name
| context, for starters. Representation, name, and name context
| cannot be inherited from another registered item, nor from the
| submission. Una mentioned language during the ACXO meeting; that
| too cannot be inherited.

This seems to be the nut of the problem.

Question for Una and Nagwa: Can we build something into the UI that makes a submission appear to the user as a structured object with a common set of metadata and then, behind the curtain, make a bunch of registered objects whose metadata is populated with information from the metadata on the original submission?

As the user, I think that I can handle the idea that metadata on a "subsidiary item" in the initial submission become independently changeable metadata once all the component pieces have been checked in.
Let me put it this way: once I check out an item for maintenance, I've already made the leap to thinking of it as a first-class object that has become magically changeable independent of the initial configuration of the submitted package, so I'm unlikely to be shocked when I discover that this now independent object has acquired independent metadata.

| The 11179 model allows everything registered in the registry to
| be dealt with on the same basis and permits assembly of large-scale
| constructs through the concept of related data. We don't want
| to give that up, and as a group dedicated to Structured Information
| Standards, we sure want to implement ISO/IEC 11179 correctly. It's
| going to be the foundation of a lot of other work.

I agree. Once in place, the same mechanism can be used for any kind of conceptual relationship. This is good. And it doesn't look on the face of it to be all that terribly hard to implement given the facilities of a good document database system. Is it?

Jon

[Terry Allen:]

| | From an initial reading of Terry's message, I get the impression
| | that the submitter in this case has to submit the composite schema
| | needed for automatic validation as a registered item in its own
| | right, even if the separate modules that make it up are also
| | registered items. Is this correct?
| |
|
| Not necessarily (although Bill Smith suggested it a year ago in
| Granada, and we might want to do things this way - it just doesn't
| scale very well).

We might want to reconsider this if it has performance implications.

| In the case of Docbook, a parser that started with docbook.dtd
| would encounter references to the other modules and download them
| as they're encountered (I'll bet nsgmls does this correctly,
| although I haven't tested).
| As DTDs should be cached for a much
| longer period than random files encountered on the Web, the first
| use of Docbook should load the application's cache with all the
| required modules, and subsequently all the parts of the DTD would
| be present locally (which is how we use Docbook today).
|
| If you recall using Panorama to parse a TEI document, you'll
| know that successive GETs like this can take a long time,
| depending on the Web weather.

Yes. In a transaction-oriented b2b environment this should work pretty well, but it might slow down b2c significantly until we get really standardized with the forms. So people may wish to fully expand the schema and point to it as a single object. It seems to me that there might be workflow advantages to tracking a single file, too.

| If we were to develop an appropriate packaging mechanism, the
| application might download all the modules together, unpack,
| and then start work. But we don't have that now.
|
| | If this is not correct, I'm going to guess in advance that there
| | is in theory a way to reconstruct the composite schema from a
| | "specification of relationships among related data." If this is
|
| Yep, that's another approach, although I'm not sure what it
| gains you.
|
| | the case, can there be more than one "specification of
| | relationships among related data," and can there be a standard way
| | of pointing to which "specification of relationships among related
| | data" allows an application in the minimum case correctly to
| | assemble the composite schema?
|
| Yes, you can specify as many relationships as you like. For some
| reason the current DTD calls these relationships "associations".

Well, but... What I mean is can you specify how to assemble the schema in the right order. Not in a standard way, it seems (though I can make a private convention that the order of related-data-references specifies the order in which modules are to be assembled).

...
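
The private convention mentioned above, where the order of related-data-references gives the assembly order, might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the function name, the URNs, the module contents, and the dictionary that stands in for retrieval are not part of the regrep DTDs or of any existing tool.

```python
from typing import Callable, List


def assemble_schema(reference_uris: List[str],
                    fetch: Callable[[str], str]) -> str:
    """Concatenate schema modules in the order their references appear.

    reference_uris is the list of URIs taken from the
    related-data-reference elements, in document order.
    """
    return "\n".join(fetch(uri) for uri in reference_uris)


# Stand-in for network retrieval; these URNs and contents are made up.
modules = {
    "urn:example:entities": "<!ENTITY % common 'para | note'>",
    "urn:example:elements": "<!ELEMENT doc (%common;)*>",
}

# Document order matters: the parameter entity must be declared
# before the module that uses it.
refs = ["urn:example:entities", "urn:example:elements"]
composite = assemble_schema(refs, modules.__getitem__)
```

Because nothing in the DTD states that order is significant, a consumer that did not know the convention could legitimately reorder the references, which is the weakness of keeping it private.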
Sorting this out in a place where I can find it, don't mind me ...

...

Actually, there are comments below for Terry. I will continue a response to his original posting on The Simple Case in a separate message.

Looking at data-element.dtd, we have

  <!ENTITY % data-element-content
    "data-element-concept, data-element-association-set?, representation, name-context+, (related-data-reference* | related-data-group-reference)" >

  [...]

  <!ELEMENT related-data-reference (related-data-reference-label?, (uri-reference))>
  <!ATTLIST related-data-reference
    relationship-of-related (%related-entity-list;) #REQUIRED >
  <!ELEMENT related-data-reference-label (#PCDATA)>
  <!ELEMENT related-data-group (data-element-reference, related-data-reference+)>
  <!ELEMENT related-data-group-reference (uri-reference)>
  <!ATTLIST related-data-group-reference
    relationship-of-related CDATA #FIXED "related-data-group" >

From related-entity-list.ent:

  related-data-group | %documentation-genre-list; | %other-xsgml-list; | %style-sheet-list; | schema-home-page | distribution-home-page | registration-information | cover-letter | other

documentation-genre-list is:

  documentation-set | documentation-set-information | reference-manual | user-guide | white-paper | faq | example | example-set | example-set-information | changelog | readme | email-discussion-list-information | tool-information

literary-genre-list is:

  book | article | recipe | %documentation-genre-list.ent;

This is backwards from what I would have expected. From the inclusion in related-entity-list.ent of %documentation-genre-list I would have expected documentation-genre-list.ent to import %literary-genre-list instead of the other way around.
other-xsgml-list is:

  sgml-open-catalogue | sgml-declaration | public-text

xsgml-entity-list.ent is:

  xml-dtd | sgml-dtd | xml-schema | xdr-schema | sox-schema | rdf-schema | sgml-element | xml-element | sgml-attribute | xml-attribute | sgml-enumerated-attribute-set | xml-enumerated-attribute-set | sgml-enumerated-attribute-value | xml-enumerated-attribute-value | sgml-parameter-entity | xml-parameter-entity | character-entity-set

Shouldn't relax-schema be in there someplace?

style-sheet-list.ent is:

  style-sheet-information | xsl-style-sheet | xsl-style-sheet-information | dsssl-style-sheet | dsssl-style-sheet-information

Which I think exhausts the categories of things that can be related via a related-data-reference to a registered data item; right?

Jon
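
One way to check that question mechanically is to expand the parameter-entity lists and see which relationship-of-related values result. The Python sketch below works only from the lists as quoted in this message, so the actual .ent files may differ; the dictionary and the expand function are illustrative helpers, not part of regrep.

```python
import re

# Entity lists transcribed from the message above (an assumption: the
# real .ent files may contain more than what was quoted).
ENTITY_LISTS = {
    "related-entity-list": (
        "related-data-group | %documentation-genre-list; | %other-xsgml-list; "
        "| %style-sheet-list; | schema-home-page | distribution-home-page "
        "| registration-information | cover-letter | other"
    ),
    "documentation-genre-list": (
        "documentation-set | documentation-set-information | reference-manual "
        "| user-guide | white-paper | faq | example | example-set "
        "| example-set-information | changelog | readme "
        "| email-discussion-list-information | tool-information"
    ),
    "other-xsgml-list": "sgml-open-catalogue | sgml-declaration | public-text",
    "style-sheet-list": (
        "style-sheet-information | xsl-style-sheet | xsl-style-sheet-information "
        "| dsssl-style-sheet | dsssl-style-sheet-information"
    ),
}


def expand(name, lists, seen=()):
    """Return the flat list of values for an entity list, following %refs;."""
    if name in seen:
        raise ValueError("circular entity reference: " + name)
    values = []
    for token in lists[name].split("|"):
        token = token.strip()
        ref = re.fullmatch(r"%([\w.-]+);", token)
        if ref:
            values.extend(expand(ref.group(1), lists, seen + (name,)))
        elif token:
            values.append(token)
    return values


allowed = expand("related-entity-list", ENTITY_LISTS)
```

Going by this transcription alone, the values in xsgml-entity-list.ent (xml-dtd, sgml-dtd, and so on) are never reached, because related-entity-list refers to %other-xsgml-list; and not to the xsgml entity list itself.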