Subject: Re: [dita] Namespace resolution
Erik Hennum wrote:

> 2. Vocabulary processing
>
> Content is validated and processed against the vocabulary as a whole rather
> than against the individual specialization modules integrated by the
> vocabulary.

I think that in fact content can be validated both against its direct governing vocabulary and against the individual modules from which the vocabulary is composed. If the vocabulary is a "DITA application" then the vocabulary can at most add constraints to the specialization modules, which means that content that is invalid with respect to the overall vocabulary might still be valid with respect to a particular module. For example, a generic DITA processor will only care about validation against the specialization packages because it knows only the DITA-defined rules and has no knowledge of any others.

I don't think it's necessary to make absolute statements about validation--validation is a type of processing that serves specific business tasks and requirements. The most we can say with certainty is what can and cannot be validated using a particular technology (e.g., DTDs, schemas, Schematron, one-off validation applications, human inspection, and so on). It is important to identify the types of validation that are possible and when such validation can or should be done.

> For instance, when resolving a conref, processes are obligated to check
> whether the vocabulary for the referencing document includes the
> specialization modules for the elements in the referenced content.
> Otherwise, the conref could include invalid elements.

I understand the motivation for this statement but it's not that simple, at least in the general case of doing transclusion (the current DITA definition of conref may be more constraining, possibly too constraining). In the general case, whether a given transcluded element is "valid" or not is entirely a function of the thing that is processing the result.
For some applications any combination will be valid (meaning it can be meaningfully processed). For other applications only exact element type matches are valid. And for others, transclusion of any compatible class of the referencing element is valid (e.g., any class within the same specialization hierarchy).

> If every element can be processed in isolation, the specialization modules
> can provide complete processing. If the processing requires contextual
> sensitivity, however, the vocabulary has to be able to affect the
> processing. After all, the vocabulary controls the context.

I'm not sure I fully understand the use of "processing" in this case. That is, by "vocabulary" do you mean "application" (as I've been using it)? I'm trying to keep a clear distinction between the "vocabulary", which is the definition of the set of types, and the "application", which is the combination of a vocabulary and a set of business rules that define how elements in the vocabulary must or can be processed.

It's a subtle distinction but I think it's important to make in XML because XML content (and, by extension, XML document constraint specifications such as DTDs and schemas) is entirely declarative and provides no *processing* specifications. Processing specifications are entirely in the domain of prose and software component implementations (e.g., style sheets, Java objects, etc.).

Or more simply: vocabularies define content constraints, applications define processing for the content. There may be multiple applications associated with a single vocabulary.

> For instance, in one domain, I've specialized section as backgroundSection
> so my topics can include background content. In another domain, I've
> specialized title as safetyInstructionTitle so I can include safety
> instructions as either a topic or section. I now create a vocabulary that
> integrates the two domains, so I can have background sections that provide
> safety instructions.
> In the same way that a term within a dlentry has
> processing expectations, a backgroundSection that contains a
> safetyInstructionTitle could have processing expectations (perhaps of
> isolation, implemented as a sidebar for some outputs). Only the vocabulary
> can specify the processing expectations for the combination of the two
> elements. After all, the background and safety modules might be supplied
> by designers who are completely unaware of one another's specializations.

To apply my terminology: I would say "Only the application can specify the processing expectations for the combination...."

> Note that this processing expectation is part of the semantics of the
> vocabulary. Different applications may realize those processing
> expectations in different ways.

And here, just to continue to be pedantic, I would use the term "processors" instead of "applications". That is, I'm trying to use the term "application" in the sense originally used in the SGML standard (the set of rules associated with a document type), not in the sense of "a set of software components that perform a task". I realize it's hair splitting to a degree but there is so much potential confusion and so much abstraction that without very precise terminology misunderstanding is a certainty.

> 3. Element polymorphism
> We don't want to limit processing of DITA content, however, to
> DITA-sensitive applications -- especially where existing vocabularies are
> being retrofitted as DITA vocabularies. For DITA-insensitive applications,
> the declared element type is everything and the class attribute is nothing.

Remember I'm not saying that, for example, the element types in the DITA-supplied reference schemas should be arbitrary--far from it. I'm just pointing out that in DITA-based applications there need be no general constraint on element type names. At a minimum we can say that element type names may or may not be namespace qualified.
Or we can say that fully-conforming DITA processors must use DITA class attribute values to apply DITA processing semantics to elements, meaning that element type names are unconstrained. I don't think as standards writers we need to mandate the interchangeability of document instances--it is sufficient to define a mechanism by which instances can be maximally interchangeable, which would be by having all element type names be the same as DITA-defined class names.

> In addition, the declared element type is displayed to human readers of the
> content to guide their understanding of the semantics of the content.
>
> Because the actual element name is important for these purposes, the DITA
> architecture mandates support for generalization and respecialization
> operations to change the declared element type.

I'm not sure I understand this comment. Element type names are important but they are not important *to the DITA standard*. They are important to designers and implementors of DITA-based applications. Remember that the DITA-defined specialization packages define element *classes* not element types--element types only exist in DITA-based vocabularies. So while the DITA standard will define a set of classes whose names must be carefully thought out, the element type names used in a given DITA-based application are still arbitrary.

>>
>>5. DITA applications in which element type names are qualified with
>>their corresponding package namespaces. This is possible for the same
>>reason (4) is possible: element type names are arbitrary.
>
>
> Would the root element for the DITA content have to declare both the
> namespace for the vocabulary and the namespace for the element's
> specialization module?

Yes, assuming the specialization module is not a "magic" DITA-defined core module.

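
As an aside on the point above that a conforming processor keys on class attribute values rather than element type names, here is a minimal sketch in Python. The handler table, the `process` function, and the element name `anyOtherName` are hypothetical and exist only for illustration; the class value syntax is the one used in the examples in this thread. It is one possible reading, not a definition of how DITA processors must work.

```python
import xml.etree.ElementTree as ET

# Hypothetical handler table keyed by DITA class tokens, not element names.
HANDLERS = {
    "topic/ph": lambda elem: f"[phrase:{elem.text}]",
}


def process(elem):
    """Dispatch on the class attribute, trying the most specific token first."""
    tokens = elem.get("class", "").split()
    # Drop the leading "-" or "+" marker; the remaining module/name tokens
    # are ordered from most general to most specific.
    for token in reversed(tokens[1:]):
        handler = HANDLERS.get(token)
        if handler is not None:
            return handler(elem)
    return None  # no class token this processor knows about


specialized = ET.fromstring(
    '<specializedPh class="- topic/ph module1/specializedPh">x</specializedPh>'
)
renamed = ET.fromstring('<anyOtherName class="- topic/ph">x</anyOtherName>')

print(process(specialized))
print(process(renamed))
```

Both elements reach the same handler even though their element type names differ, because the lookup never consults the name.
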
> For instance, how would a specialized topic declare both the namespace for
> its specialization module and the namespace for the vocabulary that's
> combining it with other topic types and domains? As in the illegal:
>
> <specializedTopic
>     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
>     xmlns="http://some.org/dita/module/specializedTopic"
>     class="- topic/ph
>     http://some.org/dita/module/specializedTopic#specializedTopic ">

The specialization modules must be associated with a prefix, so there can never be a conflict with the document's default namespace. Thus your example should be:

> <specializedPh
>     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
>     xmlns:module1="http://some.org/dita/module/specializedTopic"
>     class="- topic/ph
>     module1/specializedPh">

Remember that for the purpose of determining whether a given namespace is "in scope" for an element you only need to examine the declarations and you don't care what the prefixes are. That is, if my application is going to examine the above element to see if the "specializedTopic" namespace is in scope I would simply examine all the namespace declaration attributes to see if any of them contain the expected URI.

>>2. The namespace prefixes for the core DITA packages are "magic" and
>>must be used as-is in class attribute values in DITA 1.0. This
>>avoids any requirement for DITA 1.0 processors to have to be prepared to
>>dereference core package names to namespace URIs.
>>
>>3. The DITA 1.0 spec can *discuss* the other ways in which namespaces
>>_can_ be used in conforming DITA applications without actually
>>requiring it or doing it in the oasis-provided DTDs and schemas.
>
>
> It's an inspired compromise for 1.0 to treat specialization module
> qualifiers as magically bound to namespaces that aren't actually declared
> on the element. I'd like to see it applied to both core DITA and non-core
> specialization modules so we don't have a two-tier typing scheme.

I assume by "two-tier typing scheme" you mean a typing scheme applied to some modules but not others? I agree, consistency seems to be paramount here. I think there is very little risk in having module prefixes bound to namespaces since it doesn't affect existing processors in any way and it doesn't affect element type naming or schema construction, apart from requiring the namespace declarations, which is standard XML syntax and doesn't change the processing any tool would do (that is, namespace-aware processors will handle the declarations as they would anyway and namespace-unaware processors will continue to ignore them).

I don't see how making module prefixes globally unique names can be controversial.

>>>In principle, I agree strongly. In practice, my concern is that, to
>>>implement this approach, we have to solve problems like swapping namespaces
>>>in and out of the class attribute during generalization and
>>>respecialization.
>>
>>I'm not sure I understand this comment: the value of the class attribute
>>is (conceptually) just a list of namespace prefixes that map to the URIs
>>for packages. The class attribute value need never change.
>
>
> Sorry, I was obscure. The class attribute doesn't change, but the
> namespace on the element would have to change during generalization and
> respecialization.
>
> For instance, here's the element before generalization
>
> <specializedPh
>     xmlns="http://some.org/dita/module/specializedDomain"
>     class="- topic/ph
>     http://some.org/dita/module/specializedDomain#specializedPh ">
>
> and after generalization
>
> <ph
>     class="- topic/ph
>     http://some.org/dita/module/specializedDomain#specializedPh ">
>
> If the namespace isn't changed, the element will be in either no namespace
> or the wrong namespace and thus won't be valid.

By generalization I assume you mean "generating a new instance whose element types are superclasses of the original input element."
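
To make that reading of "generalization" concrete, here is a minimal sketch in Python. It assumes class values of the form "- module/name module/name ..." ordered from most general to most specific, and it assumes that generalizing means renaming the element to the class name in the target module and trimming the class value to end at that token. The function name `generalize` and its parameters are hypothetical, and namespace declarations are left out of the sample input for brevity.

```python
import xml.etree.ElementTree as ET


def generalize(elem, target_module):
    """Rename elem to its ancestor class in target_module and trim its class value."""
    tokens = elem.get("class", "").split()
    if not tokens:
        raise ValueError("element has no class attribute")
    # tokens[0] is the "-" or "+" marker; the rest are module/name pairs.
    marker, pairs = tokens[0], tokens[1:]
    for index, pair in enumerate(pairs):
        module, name = pair.split("/", 1)
        if module == target_module:
            elem.tag = name  # the element type name becomes the class name
            elem.set("class", " ".join([marker] + pairs[: index + 1]))
            return elem
    raise ValueError(f"no class token for module {target_module!r}")


source = ET.fromstring(
    '<specializedPh class="- topic/ph module1/specializedPh">text</specializedPh>'
)
result = ET.tostring(generalize(source, "topic"), encoding="unicode")
print(result)
```

Nothing in this sketch touches namespace declarations: only the element type name and the class value change.
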
I think there is confusion about where the namespace is applied, as discussed above. The namespace for the specialization package is *never* the namespace of the element type. Therefore, during generation of a generalized instance you would be rewriting the class attribute and, presumably, changing the element type name. It would not be necessary to remove declarations of namespaces not used in the generalized instance but you could if you wanted to. That is, there's no problem with declaring namespaces that are never used.

So I think your example could be:

<specializedPh
    xmlns:module1="http://some.org/dita/module/specializedDomain"
    class="- topic/ph module1/specializedPh">

and after generalization

<ph
    xmlns:module1="http://some.org/dita/module/specializedDomain"
    class="- topic/ph">

Note that the class= attribute has been rewritten but the namespace declaration has not been removed. But it could be:

<ph class="- topic/ph">

Both "ph" instances are equivalent and would be processed the same way.

I should note also that this notion of the generation of literal generalized instances is not something that the DITA specification needs to define--this type of processing is simply one of many types of processing that might be applied to DITA documents and the ability to do it is inherent in the nature of the specialization mechanism.

>>
>>As long as this is always the case then the element type name is simply
>>irrelevant for the purpose of DITA-based processing. That is, from a
>>DITA perspective, the element type name is, by definition, a synonym for
>>the element's class name.
>
>
> Agreed, for DITA-sensitive applications, the element name is irrelevant.
>
> DITA content also should be processable, however, by DITA-insensitive
> applications. For those applications and as well as for human consumption,
> the DITA architecture needs to support changing the element name --
> effectively, casting to a different declared type.

I'm still not understanding this comment: the DITA specification can only define processing in terms of class values. The element type name value is simply outside the scope of the DITA architectural mechanism. We can state that there is a class of simple DITA processors that expect element type names to be the same as DITA-defined class names and that in order to satisfy such processors one is encouraged to make element type names the same as leaf class names, but I see no reason to require that, so I see no reason for the architecture mechanism to say anything about element type names at all.

>
>>...
> * By the namespace on the root element if the namespace matches that of a
> known DITA vocabulary

A document need not be rooted at a DITA element. A document is a DITA document if *any element* is derived from a DITA-defined type. A document is a "DITA-only document" if its root is a DITA-defined type and all elements are likewise derived from DITA-defined types.

> Regardless of whether the class attribute is namespaced, wouldn't these
> tests have to be performed anyway and in the same way?
>
> That is, couldn't a content management system such as XIRUSS-T use the
> following approach?
>
> 1. Is a namespace declared on the root element? If so, match the known
> namespaced vocabularies including the known DITA vocabularies.

Yes, but not limited to the root element--declared anywhere within the document.

> 2. Is a DTD declared for the document? If so, match the known
> vocabularies with public identifiers including the DITA vocabularies.

XIRUSS-T, like MS Word, doesn't do anything with DTDs. So no, this wouldn't work. In any case, the external DTD subset is not a 100% reliable way to determine the true document type of a document.

> 3. Is a Schema declared for the document? If so, match the known
> vocabulary declarations including the DITA vocabulary declarations.

If the schema is bound via noNamespaceSchemaLocation then it is no better than an external DTD subset. If the schema is bound via schemaLocation then there is also a namespace declared and I don't need to look at the schema.

> 4. Prompt the user for known vocabularies including the DITA vocabularies.

This is reliable to the degree the user can answer the question accurately but is not general in the sense that it does not support the use case of generic processors acting on documents without further input.

> If a namespace on the class attribute doesn't reduce the number of tests
> needed to match content with a handler, would it make sense to defer
> namespacing the class attribute until the full namespace solution is
> specified? That way, we keep our options open in case something else in
> the solution makes it unnecessary to namespace the class attribute?

I don't think so. Part of the point of namespacing the class attribute is to ensure that the DITA class attribute can always be distinguished from other class attributes. For example, I have existing document types that have a class attribute--if I wanted to retrofit those to use DITA as their underlying architecture I would have to change one attribute name or the other. Therefore, requiring the class attribute to be qualified ensures that at minimal cost.

Remember too that attributes, by definition, are not in any namespace unless they are qualified. That is, putting an element in a DITA-defined namespace *does not* put the attributes of that element in a DITA-defined namespace. Many specifications ignore this but nevertheless it is the case. Thus if the class attribute is not qualified it cannot be reliably recognized as being the DITA class attribute.

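
The point that unqualified attributes are in no namespace can be checked with any namespace-aware parser. Here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports names in "{uri}local" form. The namespace URI http://example.org/dita and the "dita" prefix are placeholders invented for this illustration, not names defined by DITA.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this demonstration.
DITA_NS = "http://example.org/dita"

doc = ET.fromstring(
    '<ph xmlns="http://example.org/dita" '
    'xmlns:dita="http://example.org/dita" '
    'class="- topic/ph" dita:class="- topic/ph"/>'
)

# The default namespace declaration applies to the element name...
element_in_namespace = doc.tag == f"{{{DITA_NS}}}ph"
# ...but the unprefixed attribute stays in no namespace at all...
unqualified_class_present = "class" in doc.attrib
# ...and only the prefixed attribute carries the namespace URI.
qualified_class_present = f"{{{DITA_NS}}}class" in doc.attrib

print(element_in_namespace, unqualified_class_present, qualified_class_present)
```

The two attributes are distinct names to the parser, which is why a processor can tell a qualified DITA class attribute apart from some other vocabulary's plain class attribute on the same element.
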
Qualifying the class attribute, and only the class attribute, also ensures that documents are bound to a DITA-defined namespace without constraining or further complicating any other processing or requiring the declaration of namespaces used only within class attribute values (which will usually be limited to "magic" DITA-defined prefixes).

That is, I don't see any great risk to qualifying the class attribute and much benefit from doing it. Doing this would not in any way affect how we might use namespaces in the future for either class attribute values or element type names.

> For instance, if in 2.0, the namespace for the base DITA topic module ends
> up declared in the class attribute value, would declaring the namespace on
> the class attribute itself become redundant?
>
> <ph class="- http://dita.oasis-open.org/modules/topic#ph ">

I never intended that module namespaces be declared in the class attribute--there are a number of syntactic reasons why this would be a bad idea and in any case it's not necessary.

>>This again means
>>that element type names or details such as whether or not applications
>>use namespace qualification need not be a direct concern to the DITA
>>specification itself.
>
>
> If (as suggested above) vocabularies are a core construct for the DITA
> architecture, the namespaces used to identify vocabularies are a concern of
> the DITA architecture.
>
> Also, in the future, there's a strong argument for DITA to incorporate
> namespaces into the typing system to identify specialization modules so we
> can have unambiguous element types.
>
> Those reasons suggest that the DITA specification shouldn't leave
> namespaces entirely to the discretion of the application.

I'm only talking about the namespace qualification of element type names, not the namespaces used for modules or DITA-defined attributes.
A namespace-qualified element type name is no different from an unqualified one as far as the DITA specification is concerned: it's an arbitrary name. That means it's up to a given DITA-using vocabulary to define whether and how namespaces are used for the element type names in that vocabulary.

>>But the DITA standard is *not* primarily an authoring support system. It
>>is a generic standard that defines core types and processing semantics
>>that in turn provides a solid basis from which task-specific authoring
>>support systems can be built. That's a key difference and requires a
>>sometimes subtle shift in emphasis of requirements and features.
>
>
> Maybe yes and no?
>
> 1. As an architecture, DITA is a typing system for specialization of
> elements, integration of design modules, and so on.
>
> 2. As a specific type hierarchy, DITA seeds the architecture with a base
> specialization module, derives core specialization modules, and assembles
> core vocabularies for the problem space of human-readable content.
>
> The core declaration modules and DTDs are an attempt to conform to the DITA
> architecture within the limits of DTD syntax. For instance, the class and
> domains attributes exist exclusively to support processing. Similarly, the
> entity design patterns exist exclusively to support integration of modules
> as vocabularies.
>
> As a specific type hierarchy, DITA has to be more concerned with
> authorability and readability than, say, SOAP because DITA content in the
> core problem space is, fundamentally, a communication from author to
> reader.
>
> Are concerns with readability and authorability restricted to the
> declaration level? Couldn't those concerns be legitimate issues for
> abstract types?

They are concerns for the abstract types but they are not *primary* concerns. That is, the abstract type design should prefer precision and consistency within the architecture to authorability.
By the same token, concrete document types can prefer authorability over precision by taking advantage of the specialization mechanism to map from the abstraction to the concrete.

So I'm not saying that the DITA-defined abstract types should ignore authoring concerns but they should not be driven by them. That is one of the big advantages of an architecture mechanism--it provides for clear separation of concerns and avoids having implementation details impinge on the core design while providing the freedom for implementors to do what they need to do to meet pragmatic needs.

Cheers,

E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com