Subject: OASIS Staff comment on ASIS
OASIS Staff comment on ASIS

Artifact Standard Identification Scheme for Metadata 1.0
Approved TAB Document 30 January 2006
ArtifactStandardIdentificationSchemeForMetadata-1.0.1-req-approved

Staff appreciates the time that the TAB has put into this document, but as we've previously shared with the TAB, we reached the same conclusion in December that many other commenters have expressed: this document is not complete or finished enough to be enacted as policy. If any mandatory rules are enacted, they should be in the form of much shorter and clearer guidance.

With gratitude for the substantial amount of prior work on this artifact and file naming issue, we offer the following comments as input to any revision and redesign process.

==========================================================================

[57] and passim in page footers

ASIS: "Copyright (c) OASIS Open 2005. All Rights Reserved."

If this document is published in 2006, the copyright dates should be current ("2006") in line 57 and in the footers.

[129-130] TC-defined names approved by the OASIS TC Administrator

ASIS: "TC-defined unambiguous and descriptive names are also permitted, if approved by the OASIS TC Administrator"

We question whether pre-approval is logistically feasible. According to this ASIS draft, TCs may use either the structured (componentized) names or the freeform (TC-defined) name pattern (alphanumeric + hyphen). Faced with this choice, we think TCs will often elect to use the TC-defined naming scheme -- because it offers what they will desire as a filename.

However, we want TCs to be able: (a) to freely assign filenames [said to be derived from artifact names or artifact identifiers] and (b) to upload those files to the Web server without any OASIS TC Administrator approval process. Insinuating the TC Administrator into a "name approval" process will not scale.
In exceptional cases, TC Admin may have to delete (and replace with a disambiguation page) a file that is self-loaded to a highly inappropriate or misleading URI, but in most cases we expect TC editors to be able to self serve. A staff prior-approval loop is unattractive, as it is too likely to be non-scalable and a source of delay.

[230] Definition and function of Artifact Identifier string

ASIS: "Artifact Identifier: A string used to uniquely identify a particular artifact." [230]
"TC-defined unambiguous and descriptive names" [129]
"unambiguous names" [141]

What's the difference between "uniquely identify" and "identify a [unique] particular artifact"?

We think we understand the ASIS goal of providing a string to "uniquely identify" a particular artifact, but it does not seem that the Artifact Identifier string in all cases achieves this. Whereas URIs identify distinct resources, the Artifact Identifier, sometimes used at a higher level of abstraction than files which physically instantiate artifacts at the machine level, introduces a fuzzy notion about the relationship between the identifier string and the predictable representation.

Example from the given pattern:

  product-productVersion-artifactType-stage-revision-language.form

Derived filenames (see line 480: "The filename MUST be the ArtifactIdentifer followed by the optional literal period and form") are, among others:

  saml-v7.0-spec-wd-02-de.pdf
  saml-v7.0-spec-wd-02-de.odt
  saml-v7.0-spec-wd-02-de.html
  saml-v7.0-spec-wd-02-de.zip /* includes schemas and wsdls */
  saml-v7.0-spec-wd-02-de.tar.gz /* includes UML diagrams also */

ArtifactIdentifer: saml-v7.0-spec-wd-02-de

Sample use case: someone reports a typo "salm" for "saml" in "saml-v7.0-spec-wd-02-de." Where do we look? It turns out that the typo, introduced manually, is only in the ".odt" artifact. That artifact of interest arguably is not uniquely identified by the string "saml-v7.0-spec-wd-02-de".
The Artifact Identifier string "saml-v7.0-spec-wd-02-de" thus seems to fall short of providing unique identification for an artifact. We suggest that the definitions should be revised and/or that further justification be given to the notion of Artifact Identifiers as distinct from filenames -- which are used directly to compose URIs in the general case.

[275-276] Date format
[378-379] format YYYYMMDD

ASIS: "Date: The date of the artifact, in the format YYYYMMDD."

The ASIS document itself displays "30 January 2006" as the publication date. We are puzzled as to the motivation for the YYYYMMDD date format, since the 'Date' metadata element does not occur in the Artifact Identifier (OASISDefinedName format) product-productVersion-artifactType-stage-revision[-language].form -- nor in the URN document-id, nor in the schema name, nor in the OASIS Standard.

So: Date is said to constitute required metadata, but the document provides no indication of a context within which that datum would be encoded. Line 378 "Each artifact MUST have an associated string value for the Date of the artifact." does not indicate where or by what means "YYYYMMDD" is to be "associated" with an artifact.

Unless convincing use cases can be cited to justify this (uncommon) format, we think TCs should be able to use date formats of choice, or one of the standard formats, as context demands, per ISO.

[283-284] PDF and HTML forms

ASIS: "... when submitting a Public Review package, the specification(s) must be provided in both Adobe Acrobat (pdf) and HTML forms as required by [OASIS TCP]."

While the central goal of ASIS apparently is not to levy new requirements against the current TC Process document (but rather, to comply), we take this occasion to express agreement with other reviewers who have stated a desire to make XHTML the (sole) normative format for OASIS specifications. Templates are currently provided for XHTML "transitional."
We feel that some of the most important goals for automation, spec QA, and searching will not be attainable (feasibly) unless TC specifications are published in XHTML, spec-XML, or equivalent format.

[351] OASIS Document Templates

ASIS: "The OASIS Document Templates for text specifications SHALL be updated to include the metadata..."

We question whether this directive belongs in the document: OASIS has templates for some of the proposed artifact types, but not for others. However, staff will bring all templates into alignment, and keep them aligned, with all policies and guidelines OASIS issues, including any part of this document that may become policy.

[399] Artifact Identifiers and unique spellings

ASIS: "TCs SHALL NOT create two or more Artifact Identifiers that differ only with respect to case."

We suspect that this rule needs to be re-written to provide scope: e.g., "... within a given directory." The origin of this rule was apparently (?) a concern that while the OASIS servers all handle mixed case faithfully [Unicode], rare situations might arise in which data could be transferred to some system that used non-case-respecting software, possibly resulting in overwritten files or user confusion. Virtually all modern filesystems store information in case-sensitive [Unicode] representations, but not all applications adhere.

More generally, however, this rule raises the question as to whether ASIS is attempting to ensure that no TC can create two or more (character-wise) different artifacts having the same filename -- e.g., 'CATALOG' files for successive stages of a specification, each bearing the filename 'catalog' and living in a version-labeled directory. While the URIs would obviously be different, the filenames would be identical. We think identical filenames at different URIs are acceptable as well as expected.

ASIS seems to want all Artifact Identifiers to serve as unique identifiers, and to require derivation of filenames from Artifact Identifiers.
In practice, we cannot believe that TCs will want to change the spelling of every filename with every new release. Please see the comment at line 480.

[409] and [746] underscore

ASIS: "... underscore (Low Line)..."

We understand that the TAB's draft documents moved back and forth on the use of underscore as an allowable or inadvisable name character. Respecting the legitimate differences of opinion (taste) and perception of the tradeoffs, we do not foresee that the adoption of a restricted character inventory for names without underscore will greatly change the equation: users are now accustomed to including a range of characters in filenames that are commonly deprecated in various application contexts: space, comma, ampersand, parenthesis, tilde, pound-sign, dollar-sign, square-bracket, plus-sign, etc. These characters, and all control characters, are disallowed in the ASIS draft as filename characters because they are known to create cascading problems in data fidelity, at least under some common conditions. Similarly, because underscore (Low Line) is an ambiguous character, indistinguishable from other non-displayed BLANK characters in certain visual contexts, we do not think it should be allowed in URIs for artifacts (hence, not in filenames).

[424-425] OASISdefinedName an option, not a requirement

ASIS: "TCs MAY use a TCdefinedNames (which need not follow the rules for OASISdefinedNames) subject to approval by the TC Administrator."

We are concerned that the current ASIS draft as written does not clearly reveal to the reader that use of the OASISdefinedName is an option, not a requirement. As a simple example, lines 466-67 should say "... if the TC elects to use the OASISdefinedName, it MUST contain..."
More generally, the document needs to be much clearer about the degree of "requirement": The captions at lines 35-38 and 115-118 suggest that the entire document is suggested as a recommendation; however, the language and tone throughout the draft is that of mandate, not recommendation.

[427-428] Constructing Specific Artifact Identifiers

ASIS: "The following format SHALL be used for OASISdefinedNames. This format includes selected metadata in a consistent format; variations for specific purposes are described..."

We appreciate the tremendous amount of work that went into identifying the requisite metadata to be captured for each TC artifact [338-339]. It is unclear, however, what additional benefits are to be gained from creating and citing the concatenated string in addition to each separate component which is said to be an "associated" datum; we recommend consideration of dropping this requirement.

[421] Alternation between OASISdefinedName and TCdefinedName

ASIS: "An ArtifactIdentifier MUST be either an OASISdefinedName or a TCdefinedName."

The draft ASIS document [e.g., line 232] identifies a goal of using structured names (concatenated sub strings) in order to provide a basis for parsing such artifact identifiers. We understand that goal, but feel that the value of such parsing is compromised by allowing TCs to *sometimes* use componentized forms [OASISdefinedName] and sometimes, to use TCdefinedName instead. For some artifact types it will be difficult for a machine to determine whether a given ArtifactIdentifier is by intent a TCdefinedName or a possibly malformed OASISdefinedName. Consideration should be given to design of a unified approach; if this proves intractable or undesirable, mechanisms should be specified to permit the encoding of hints about the type of ArtifactIdentifier being used, as an aid to parsing and other machine processes.
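
To illustrate the parsing concern raised under [421], here is a minimal Python sketch of a naive classifier. It is not taken from the ASIS draft: the function name classify, the artifact-type and stage vocabularies, and the "vN.N" version pattern are all assumptions made for the example, and the Stage-omission exception for schema and wsdl is ignored. The only rules borrowed from the text above are the component order product-productVersion-artifactType-stage-revision[-language] and the "alphanumeric + hyphen" pattern for TC-defined names.

```python
import re

# Assumed vocabularies, for illustration only; the ASIS draft defines its own.
ARTIFACT_TYPES = {"spec", "schema", "wsdl", "catalog"}
STAGES = {"wd", "cd", "os"}

# Freeform TC-defined names: alphanumerics separated by single hyphens.
TC_NAME = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")


def classify(identifier: str) -> str:
    """Guess which naming scheme an identifier follows (hypothetical helper)."""
    parts = identifier.split("-")
    # product-productVersion-artifactType-stage-revision[-language]
    if len(parts) in (5, 6):
        _product, version, artifact_type, stage, revision = parts[:5]
        if (
            re.fullmatch(r"v\d+\.\d+", version)  # assumed version shape
            and artifact_type in ARTIFACT_TYPES
            and stage in STAGES
            and revision.isdigit()
        ):
            return "OASISdefinedName"
    if TC_NAME.match(identifier):
        # Intent cannot be recovered here: this may equally be an
        # OASISdefinedName with a component missing.
        return "TCdefinedName"
    return "invalid"


for name in ("saml-v7.0-spec-wd-02-de", "wsrp-spec-wd-02", "saml_core"):
    print(name, "->", classify(name))
```

Under these assumptions, "wsrp-spec-wd-02" is accepted as a TCdefinedName even though it looks like an OASISdefinedName with the productVersion left out; nothing in the string lets a machine tell the two intentions apart, which is why an explicit hint about the identifier type would help.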

[334-335] tcShortName hyphen removal

ASIS: "TC Short Name: The short name assigned by the TC Administrator to the Technical Committee, with any hyphens eliminated."

We do not think the elimination of hyphen in tcShortName is motivated or required in the current design, with the possible exception of its use in connection with URNs, per RFC 3121. In other use cases, and especially in connection with URIs rooted at http://docs.oasis-open.org/, we think the tcShortName should include the hyphen. Since [line 433] "The tcShortName is not included" in the format for OASISdefinedName ("as it can be determined uniquely from the product"), the possible benefits for ease of parsing are small compared to the difficulties caused by forking the canonical spelling of assigned TC short names.

[348-349] Additional metadata

ASIS: "The Technical Committee MAY define additional metadata for its artifacts, provided those metadata names and values are approved by the TC Administrator."

The context for (formal) usage of the "additional metadata" needs to be clarified such that we know what usages are prohibited, or possibly prohibited, if they fail to meet the approval of the TC Administrator. Surely the document cannot prohibit the definition of new artifact metadata by TCs (per se).

[318] descriptive name

ASIS: "TC-defined unambiguous and descriptive names" [129]
"descriptive name defined by the TC for the artifact" [318]
"The descriptive name of the specification" [321]

The discussion about "TC defined Name" in 317-333 is not clear. Line 129 says "TC-defined unambiguous and descriptive names are also permitted," suggesting that TC-defined names and descriptive names are two different things. However, lines 317-318 seem to imply that a 'TC defined Name' *is* 'A descriptive name defined by the TC for the artifact.'

In the example:

* what is a "container", exactly? A directory?
* WSRP 1.0 and SAML 2.0 -- are these "descriptive names"?
* what about 'saml-2.0-AuthnContext-schema-os' - is that part of any URI?

[437-440] Omission of "stage" component

ASIS: "A value for Stage and the following hyphen separator MUST be included except in the following cases: - when ArtifactType is schema (or) when ArtifactType is wsdl, in which case a value for Stage MAY be omitted."

We do not understand the justification for special treatment of "schema" and "wsdl" artifact types; other types (e.g., catalog) might be even better candidates, were the goal to alleviate the burden of encoding a "stage" component. If the design for structured names is retained and mandated, exceptions like this should be resisted.

[446] Use of 'form' component

ASIS: "A value for Form SHALL be included for files and final URI components that resolve to a specific artifact, and SHOULD NOT otherwise be present."

We do not see the benefit or necessity of these rules: there are well-established use cases for "final URI components" which end in "slash" or other character strings not matching literal "." + "form". Certainly RDDL documents and other namespace documents are one class of exception, but we envision others as well.

[480] Derivation of filenames from ArtifactIdentifers

ASIS defines a close relationship between an artifact identifier (string) and the filename associated with the artifact: the "filename MUST be the ArtifactIdentifer followed by the optional literal period and form". We do not think this is necessary or necessarily desirable. We prefer a scheme in which the URI path portion 'above' the filename reflects key metadata elements -- which allows TCs greater liberty in assigning filenames. Thus, TCs should be free to use structured (componentized) names as filenames (based upon OASISDefinedName or TCDefinedName), but they should not be required to do so: filenames should not be required to "be" the ArtifactIdentifer followed by...

  480 The filename MUST be the ArtifactIdentifer followed by the optional literal period and form
  520 The filename MUST be the ArtifactIdentifer followed by the optional literal period and form
  529 The filename MUST be the ArtifactIdentifer followed by the optional literal period and form
  537 The full ArtifactIdentifer followed by the optional literal period and form MUST be the filename.

[482] Document titles

ASIS: "The filename MUST bear a reasonable and descriptive relationship to the document title."

Section 6.3 "Other Artifact Filenames" seems to concern artifact types other than prose specifications and other prose documents. For example (we assume) catalog, schema, wsdl. But such documents frequently do not have "titles" as such. We do not think users will be able to apply the rule in line 482 in such cases.

Example: does the filename "b-2.xsd" bear a "reasonable and descriptive relationship to the document title"? See: http://docs.oasis-open.org/wsn/b-2.xsd

[491-493] Default Web Pages

ASIS: "6.4.1 Default Web Pages for Product URIs: The relevant required metadata for an artifact MUST be maintained at the default index page for the http scheme URI for each product and productVersion to facilitate search and retrieval. For each such index page, an XHTML-compliant meta element MUST be included..."

The prescriptions in 6.4-6.5 should be simplified to indicate that metadata must be associated with each artifact in a manner appropriate to the artifact type, in accordance with the OASIS-provided template(s) for each type, located at http://docs.oasis-open.org/templates/ . Any revised (ASIS) specification should include the link for the OASIS template page in this and any other place it's mentioned.

[570-571] Schema sub-types

ASIS: "It is RECOMMENDED that only the following sub-types be used and only when the type is schema: dtd, rng, and xsd."

We do not think a blanket recommendation should be given deprecating sub-types other than the three named.
We fully expect that new "types" will become common as schema languages mature (e.g., DSDL languages).

Further, this passage raises the broader question of the usefulness of "schema", since in its typical illustrated use cases, no distinctions are made between schema types, or indication as to whether a DTD is based upon SGML or XML, etc. Since the file extension itself may be of negligible value in conveying information about a schema type [.xsd, .rng, .dtd, .rnc, .<other> files in a .ZIP archive], it seems that this design needs further work.

This touches upon the matter of ArtifactIdentifiers as useful for unique identification of an artifact. A filename matching an ArtifactIdentifier for a schema might be, in structured format:

  product-productVersion-stage-revision.form

examples:

  xacml-v3.0-wd-03.xsd
  xacml-v3.0-wd-03.dtd
  xacml-v3.0-wd-03.rng
  xacml-v3.0-wd-03.rnc

When the "." + "form" is dropped, to meet the ArtifactIdentifier format pattern, we are left with one identifier (xacml-v3.0-wd-03) that matches four different artifacts instantiated (quite differently) in four different files. Hmmm...

[594-596] Namespace URIs

ASIS: "OASIS namespace declarations pursuant to [XML NS 1.1] or [XML NS 1.0] MAY be defined as URIs using the http scheme as an alternative to the URN form defined in Section 6.1."

That should be "7.1".

In view of the complexities involved in the use of URNs (no common resolution mechanisms), we think namespaces should be defined as URIs and generally should be DNS/HTTP resolvable. Why not? What value is a 404?

In any case, when an HTTP scheme URI namespace has been declared by a TC, it should be reserved for use [by TC Admin] as a location for a namespace document or the equivalent; no other kind of resource should be accessible by dereferencing that URI. Dereferencing the URI should fetch a RDDL document or similar descriptive resource informing the reader about the relevant resources.

[631-633] Base Domain For URIs

ASIS: "URIs created for all OASIS artifacts created by or pertaining to technical committees SHOULD be rooted at the docs (third-level) domain on the oasis-open.org Internet domain, thus at the base docs.oasis-open.org."

Change to:

  URIs created for all OASIS artifacts created by or pertaining to technical committees SHOULD be rooted at http://docs.oasis-open.org

No need to refer to "docs" as the third-level domain component.

[634-637] Technical Committee Tree and related trees

ASIS: "Technical Committee Tree: The short name of the OASIS technical committee, as established by the TC Administrator, typically upon initial formation, MUST be the next node in the URI after the base: http://docs.oasis-open.org/[tcShortName]"

We agree with this scheme as the root for all OASIS specifications and other approved TC work. Optionally, TCs will be allowed to deposit not-yet-approved or otherwise "not-subject-to-approval process" documents under the TC's root in various designated subdirectories.

For specification-related documents, we propose that the path should be as follows:

  docs.oasis-open.org/[tc-shortname]/[productName]/[productVersion]
  /[document-stage]/{document-version}/[document-filename]

This scheme implies that revisions would need to be made in ASIS at lines 646 ( docs.oasis-open.org/[product] ), 653 ( docs.oasis-open.org/[product]/[profileID] ), and 669 ( docs.oasis-open.org/[product]/[productVersion] ).

Most critically, insert [tcShortName] in all cases after docs.oasis-open.org/ in lines 646, 653, 669.

[638-639 and passim] Using the .php file extension

ASIS: "An index page MUST be maintained at the default location (typically docs.oasis-open.org/[tcShortName]/index.php)"

We are not sure what "typically" means (the path? the filename?
both?), but our preference is NOT to use an index filename with ".php" for what will evidently be an (X)HTML document; we prefer to keep URIs free of strings that reflect transient technologies.

[656-663] 8.3.3 Non Specification Track Documents

ASIS: "docs.oasis-open.org/[tcShortName]/other"

For non-specification-related documents, other directories may be defined as appropriate and are not within the scope of the ASIS document.

[664] TC Admin

ASIS: "Each Product is assigned an identifying name by the TC Process Administrator"

Please correct to "TC Administrator".

[675-685] Section 8.5 Latest Version Subtree

Providing support for the notion of generic URIs as URLs for getting the "latest/current" version is one of our highest priorities: it has been requested by numerous TCs for different application scenarios, and is recognized as a common industry practice for standards development. While we can commit to providing this support, we are not aware of implementation experience sufficient to demonstrate the integrity and usability of the exact design proposed in section 8.5. User requests have shown that there are multiple kinds of "latest" (latest editors' draft, latest approved version, latest QC'd version, etc). We think it is unwise to commit to this novel idea of a "latest" URI at '/[product]/latest', and recommend omitting this section from any revised ASIS draft, pending a final decision based upon input from TC Chairs and others.

==================

Colophon: the document above [presumably] was not checked thoroughly for cogency, internal consistency [pasted as a merge from numerous sources] or reasonableness. Please discount and ignore any such classes of editing errors, as solely attributable to 'rcc', with regrets.

==================================================================