
Subject: Re: [dita] Stage one proposal: drop domains (the attribute, not the concept, and only mostly)

From: Eliot Kimber <ekimber@contrext.com>
To: DITA TC <dita@lists.oasis-open.org>
Date: Tue, 07 May 2019 16:00:43 -0500

I agree with Robert's analysis that the @domains attribute is not nearly as useful in practice as we once thought and therefore support a proposal to at least relax the requirements for it.

I understand the argument for just keeping @domains and removing the requirement to list all domain modules in it.

However, I would prefer to split the attribute into two: one that is necessary for attribute domains and one that is only useful to systems that actually use the @domains value to do something interesting. At the moment that would be processors that enforce strong conref constraints and processors that have knowledge of DITA modules and need to make decisions based on that, for example a CMS that can recognize any DITA document *and know what to do with it* based on the declared domains.

So for the attribute use case I'd prefer a new attribute like @attspecializations or something, that makes its use clear and limits its use to just that task. This attribute would be required in order to use attribute specializations.
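
To illustrate why processors need the attribute-domain declaration, here is a minimal Python sketch of attribute generalization driven by the a(...) tokens. It assumes the DITA 1.x token form "a(props deliveryTarget)" and the generalized value form props="deliveryTarget(...)"; the function names are hypothetical, and it assumes the base attribute is not already set on the element.

```python
import re


def attribute_ancestry(domains_value):
    """Map each specialized attribute to its ancestor chain.

    For the token a(props deliveryTarget) this yields
    {"deliveryTarget": ["props"]}.
    """
    ancestry = {}
    for body in re.findall(r"(?<!\S)a\(([^()]*)\)", domains_value):
        names = body.split()
        if len(names) >= 2:
            ancestry[names[-1]] = names[:-1]
    return ancestry


def generalize(attrs, ancestry):
    """Fold specialized attributes back into their base attribute.

    Assumption: the base attribute (for example props) is not already
    present on the element; a real processor has to handle that case.
    """
    result = {}
    folded = {}
    for name, value in attrs.items():
        if name in ancestry:
            base = ancestry[name][0]
            folded.setdefault(base, []).append("%s(%s)" % (name, value))
        else:
            result[name] = value
    for base, parts in folded.items():
        result[base] = " ".join(parts)
    return result
```

Without the a(...) token a processor that sees deliveryTarget="print" has no way to know that it is a specialization of @props and should take part in filtering.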

The other use case is what I call the "DITA document type" use case. This is a use case that is very interesting intellectually and has some important *potential* implications for how DITA documents *can* be processed but is not actually that interesting in practice (at least not today) for the reasons Robert outlines: no DITA processor today will fail because @domains is not accurate (except for the attribute specialization case, of course). And as far as I know, no processor provides functionality specifically because @domains values are completely specified.

But I think the DITA document type case is important enough that we should still provide the *option* of doing it in DITA documents even if it's not required.

Thus, we shouldn't get rid of @domains entirely. We should either retain @domains (with relaxed requirements) or define a new attribute that replaces @domains for the specific purpose of defining a document's DITA document type.

I think the @domains feature in DITA, coupled with the rules for how DITA modules are managed, is an important distinguishing feature of DITA, even if it's a feature that few, if any, DITA users take advantage of today. [And the reason it's not that important today is that the original vision of a world in which DITA maps and topics were freely and widely interchanged has simply not come to pass--maybe it never will, hard to say. But the feature is still essential in an environment in which the loosely-coupled interchange of DITA content is a requirement.]

Here is my explanation of DITA document types and why I think they're important architecturally, if not in practice (today):

DITA is, as far as I know, unique among XML standards in that it provides a solution for the problem of knowing what the "true document type" of a given document is. It is the @domains attribute that does it.

In SGML and XML generally there is *no defined way* to know what the "true" document type of a given document is. In SGML you have to have a DOCTYPE declaration, which at least provides the definition of the set of element types and attributes the document *could* use, but it still isn't a reliable indicator of the document's *true* document type. For example, you could have a DOCTYPE declaration with a specific public ID but then locally use whatever declarations you want (this is the DocBook and JATS case, where people would make all kinds of changes to the base DocBook or JATS DTDs without changing the public ID on the DOCTYPE declaration).

XML adds a couple of wrinkles:

- Grammarless documents: XML doesn't require any grammar at all, in which case a document's true type is either unknowable in the general case or it can only be weakly inferred through the use of specific namespaces (but remember that namespaces say nothing about the element and attribute names within the namespace) or element type names or maybe through MIME types provided by systems serving the resources. Only in the case of namespaces associated with standards or other prose document type definitions can you reasonably infer what the intended document type of the document is.

- Namespaces: This gives you the *potential* for associating a namespace name with some formal specification of what the rules are for elements and attributes in that namespace, but this is only *potential*--there's no formal requirement in XML that any given namespace be anything more than a way to provide globally-unique names to elements and attributes. The direction of pointing is from specifications to namespaces, not from namespaces to specifications (because there's no separate formal, standardized definition of what a namespace *is*).

So essentially, before DITA, there was no standard-defined mechanism anywhere in XML by which a processor could reliably determine the true document type of an arbitrary document.

DITA solves this problem by doing two things:

1. Organizing all element types and attributes into invariant, globally-named modules.
2. Allowing documents to list, on topic and map elements, the set of modules that document may use.

DITA modules are globally-named and invariant, meaning that for a given version in time of a given module, *all copies of it* are intended to be identical. That is, DITA says "you never change any module directly, only indirectly through configuration or constraint".

This means that once you know about the rules for a given module version-in-time, all you need is the *module name*; you don't need the document to point to literal grammar files (because either the module's rules are already embedded in your processor or because you can just fetch your own copy of the module's grammar, if you need it). In any case, the grammar rules are just part of the total set of rules for any module, so even if you have grammar files your processor still needs to implement other rules (for example, ensuring that a given section has at most one title even though that can't be controlled by a DTD).

This means that DITA documents can be completely self-describing without regard to any grammar documents they might or might not refer to.

All maps and topics must have the @dita:DITAArchVersion attribute. This binds the document to an OASIS-defined namespace that is formally associated with the DITA specification. It also specifies the version in time of the DITA specification the document asserts conformance to.

This serves as a primary and reliable signal that the document (or more precisely, the topic or map element) intends to be a DITA document and can therefore legitimately participate in DITA processing and interchange.

If the map or topic element also has a @class attribute with a value that matches the pattern for DITA @class values then it almost certainly intends to be a DITA document and there's enough information to know what kind of DITA document it is (map or topic).
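
As a rough sketch of those two checks in Python (standard library only): the namespace URI is the OASIS DITA architecture namespace; the function name is hypothetical, and the sketch assumes @class is literally present in the instance, since ElementTree does not apply attribute defaults from a DTD.

```python
import xml.etree.ElementTree as ET

# OASIS-defined namespace for the DITAArchVersion attribute.
DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"


def classify_dita_root(xml_text):
    """Return (arch_version, kind) for a DITA root element, else None.

    kind is "topic", "map", or "unknown" when @class is missing or
    does not look like a DITA @class value.
    """
    root = ET.fromstring(xml_text)
    version = root.get("{%s}DITAArchVersion" % DITA_ARCH_NS)
    if version is None:
        return None  # no signal that this intends to be DITA

    # A DITA @class value looks like "- topic/topic concept/concept ".
    tokens = root.get("class", "").split()
    looks_like_class = (
        len(tokens) >= 2
        and tokens[0] in ("-", "+")
        and all("/" in t for t in tokens[1:])
    )
    if not looks_like_class:
        return (version, "unknown")

    # The first module/type pair names the base type.
    kind = {"topic/topic": "topic", "map/map": "map"}.get(tokens[1], "unknown")
    return (version, kind)
```

This only tells you that the element is a DITA topic or map and which version of the architecture it claims.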

However, at this point we still don't know the complete DITA document type, namely the set of structural types, domains, and constraints that apply to the document.

Providing that information is what the @domains attribute does: by listing *all* the structural, domain, and constraint modules @domains *fully specifies* the DITA document type of the map or topic element and therefore enables, in particular, validation against it, but also any other processing that might be specific to a particular module (even if elements or attributes from that module are not present in the document itself).
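
For concreteness, here is a small Python sketch that turns a @domains value into a list of module declarations. It assumes the DITA 1.x token syntax: parenthesized ancestry lists, with an "a" prefix for attribute domains and an "s" prefix for strong constraints. Classifying tokens by the "-d" and "-c" suffixes is only the usual naming convention, not a rule, and parse_domains is a hypothetical name.

```python
import re

# One token: optional "a" or "s" prefix, then a parenthesized name list.
TOKEN = re.compile(r"([as]?)\(([^()]*)\)")


def parse_domains(value):
    """Split a @domains value into (kind, names) pairs.

    The kind is guessed from the prefix and from naming conventions
    ("-d" for domains, "-c" for constraints); treat it as a heuristic.
    """
    modules = []
    for prefix, body in TOKEN.findall(value):
        names = body.split()
        if not names:
            continue
        if prefix == "a":
            kind = "attribute"
        elif prefix == "s" or names[-1].endswith("-c"):
            kind = "constraint"
        elif names[-1].endswith("-d"):
            kind = "domain"
        else:
            kind = "structural"
        modules.append((kind, tuple(names)))
    return modules
```

Given "(topic concept) (topic hi-d) a(props deliveryTarget)", this yields one structural, one domain, and one attribute declaration; taken together, the full list is the DITA document type.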

For processing where all you're doing is consuming the elements and attributes in the document and processing them to some output, the document type probably doesn't matter--you just take what you get and apply normal @class-based processing.

But for processing where the use or non-use of a specific module *is* important then having the @domains value be complete is essential. One such process is checking strong conref constraints. Another would be DITA-aware CMS functionality where you expect to get random documents and need to do the appropriate, module-specific thing with them on ingestion. That is, it is possible to have a self-configuring DITA-aware CMS that can take *any* DITA document that fully specifies its DITA document type and immediately ingest and manage that document as appropriate.
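
A toy Python sketch of the CMS ingestion case, to show why completeness matters: the handler table and the plan_ingestion function are entirely hypothetical and do not describe any existing CMS. The point is only that the decision is made from the declared modules, not from whichever elements happen to occur in the document.

```python
import re

# Hypothetical registry: module declaration -> what the CMS does for it.
HANDLERS = {
    "(topic concept)": "index as concept topic",
    "(topic hi-d)": "no special handling",
    "a(props deliveryTarget)": "register deliveryTarget as a filtering attribute",
}


def plan_ingestion(domains_value):
    """Return (actions, unknown_modules) for a fully specified @domains.

    Any declared module without a registered handler is reported, so
    the CMS can refuse the document or fall back to generic handling.
    """
    tokens = [
        re.sub(r"\s+", " ", token)
        for token in re.findall(r"[as]?\([^()]*\)", domains_value)
    ]
    actions = [HANDLERS[t] for t in tokens if t in HANDLERS]
    unknown = [t for t in tokens if t not in HANDLERS]
    return actions, unknown
```

If @domains is incomplete, the unknown list is meaningless, because a module the document depends on may simply not have been declared.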

At the moment, neither of these last two cases seems to be very common, certainly not with the current set of common DITA processing tools and management systems.

But that doesn't mean they won't *ever* be important.

So I think that at a minimum we need to keep @domains as an option, if not define a new replacement attribute that is purely for the purpose of defining the full DITA document type and that, if present, must list *all* modules that make up the map or topic element's DITA document type.

Cheers,

--
Eliot Kimber
http://contrext.com

On 5/7/19, 2:13 PM, "Robert D Anderson" <dita@lists.oasis-open.org on behalf of robander@us.ibm.com> wrote:

That's right, I'm finally going there with a stage one proposal.

We should drop the requirement for domain tokens in the @domains attribute, apart from attribute domains (which must still be specified in order to support filtering / generalization based on those attributes).

We could rename @domains at that point to something like @specializedatts but I doubt it would be worth it (and preserving the attribute name would let people continue to declare the tokens, if they wish).

Some of you may remember I wrote up a little -- er, rather long -- rant about the tokens a year or two ago; if you didn't read it then and need help getting to sleep, you can read about what prompted this proposal here:
http://metadita.org/toolkit/nonononodomains.html

Robert D. Anderson
DITA-OT <http://dita-ot.org/> lead and Co-editor DITA 1.3 specification
Marketing Services Center
E-mail: robander@us.ibm.com

11501 Burnet Rd, Austin, TX 78758-3400, USA

Follow-Ups:
- Re: [dita] Stage one proposal: drop domains (the attribute, not the concept, and only mostly)
  - From: Chris Nitchie <chris.nitchie@oberontech.com>

References:
- Stage one proposal: drop domains (the attribute, not the concept, and only mostly)
  - From: "Robert D Anderson" <robander@us.ibm.com>