
Subject: Re: [dita] Conref Vs. Transclusion and Using Non-DITA Data As DITA

From: Chris Nitchie <chris.nitchie@oberontech.com>
To: Eliot Kimber <ekimber@contrext.com>, "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>
Date: Wed, 7 Mar 2018 16:32:44 +0000

Thanks for the thoughtful feedback, Eliot. I spent the better part of yesterday afternoon mulling this stuff over, and it very much helped me clarify my thoughts.

What I'm concerned with is the transclusion of content that is structured, but not DITA, into a DITA context. It may be loosely structured - Markdown, HTML, Word (shudder) - or rigidly structured but non-XML - CSV, JSON, YAML - or some foreign XML grammar that can be coerced into DITA via a transform - WSDL, TLD, DocBook - but there's structure to be found.

I have an agenda here. I firmly believe that if DITA can't find a way to marry documentation authored in other formats with its native structured content, it's going to struggle to stay relevant. Lightweight DITA, with support for format="mdita" and format="hdita", is a giant leap forward on this front. The extensible @parse attribute is an attempt to create one avenue for it in full DITA.

At one point in his message below, Eliot suggests that I'm concerned with his case #4 (<image>/<object>), but I think media references are fundamentally different from structured-content transclusion references.

- Media references are designed such that their ultimate presentation is managed by a downstream rendering application - generally a web browser or a PDF engine - whereas I'm envisioning resolution as part of DITA preprocessing, such that the original inclusion directive is replaced by the referenced content before any stylesheet is involved.
- Media references are designed primarily (though I suppose not exclusively) for the inclusion of resources that are graphical in nature, or presented with some sort of interactive graphical interface; hence the @width, @height, @scale, etc. attributes. I'm concerned with content that, after resolution, is textual in nature, even if the text is tagged one way or another.

As I understand Eliot's argument, he's asserting that structured non-DITA content *should* be treated as case 1 (DITA in a DITA context) with some sort of external transformation. In contrast, I think structured non-DITA content in a DITA context is a separate use case (case 5, if you will).

1. DITA in a DITA context (@conref/@conkeyref)
2. Text in a preformatted DITA context (<include parse="text">)
3. Non-DITA XML in a foreign context (<include parse="xml">)
4. Media references (<image>/<object>)
5. Structured non-DITA content in a DITA context (<include parse="¯\_(ツ)_/¯">)

(All of these, with the possible exception of #1, could potentially benefit from allowing nested <param> elements; <include> already does, because its content model is ANY; it should probably be added to <image>. So that part of Eliot's message I agree with.)

It's true that it is possible to implement #5 as #1 using some sort of external processing, either implicit in the processing behind the URL at which the data is referenced - href="path/to/api?query=foobar&format=dita" - or by specifying the transformation via parameters/metadata on the reference. But I worry about the complexity of requiring a complicated key definition, plus a dependency on arbitrary external systems, plus the use of conkeyref (which, rightly or wrongly, scares a lot of people) just to implement "insert file.csv as a table here." I think that complexity is going to limit adoption, and I think adoption needs to be encouraged. I think <include href="file.csv" parse="csv-as-simpletable"/> just makes sense in a way <simpletable conkeyref="keyToCsvFileWithProcessingMetadata/table"/> doesn't, especially to new users. My argument is for usability and readability, arguably at the expense of architectural cleanliness. So for the time being, I'm standing by extensible @parse. It doesn't solve the problem of needing developers to implement a given @parse processor, but I don't think there's a way to avoid that without strictly limiting the formats that are allowed, and again, I think that's a bad idea.
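To make "someone still has to implement a given @parse processor" concrete, here is a minimal sketch of how a preprocessor might dispatch on an extensible @parse value. The registry, the decorator, and the function names are all hypothetical (nothing here is part of DITA or the DITA Open Toolkit), and fetching the resource named by @href is left out; the processor simply receives the CSV text and returns the <simpletable> that would replace the <include>.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registry mapping @parse values to processor functions.
# Each processor receives the referenced resource as text and returns the
# DITA element that replaces the <include> during preprocessing.
ParseHandler = Callable[[str], ET.Element]
PARSE_HANDLERS: Dict[str, ParseHandler] = {}


def parse_handler(name: str):
    """Register the decorated function as the processor for @parse=name."""
    def register(func: ParseHandler) -> ParseHandler:
        PARSE_HANDLERS[name] = func
        return func
    return register


@parse_handler("csv-as-simpletable")
def csv_as_simpletable(data: str) -> ET.Element:
    """Convert CSV text to a DITA <simpletable>; the first row becomes <sthead>."""
    rows = list(csv.reader(io.StringIO(data)))
    table = ET.Element("simpletable")
    for index, row in enumerate(rows):
        row_elem = ET.SubElement(table, "sthead" if index == 0 else "strow")
        for cell in row:
            ET.SubElement(row_elem, "stentry").text = cell
    return table


def resolve_include(parse: str, data: str) -> ET.Element:
    """Dispatch to the processor registered for @parse, or fail loudly."""
    try:
        handler = PARSE_HANDLERS[parse]
    except KeyError:
        raise ValueError(f"no processor registered for parse={parse!r}") from None
    return handler(data)


if __name__ == "__main__":
    table = resolve_include("csv-as-simpletable", "Name,Value\nfoo,1\nbar,2\n")
    print(ET.tostring(table, encoding="unicode"))
```

Supporting a new format in this sketch means registering one more function; an unknown @parse value raises an error rather than being silently passed through.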

That said, I do think there's value in being able to reference structured non-DITA data from a map, as well as embedding it inside topics. However, that would be a separate proposal not directly related to <include>. I'll send out a new Stage 1 proposal about this in a few minutes.

Chris

On 3/6/18, 5:21 PM, "dita@lists.oasis-open.org on behalf of Eliot Kimber" <dita@lists.oasis-open.org on behalf of ekimber@contrext.com> wrote:

In thinking through my position on the inclusion proposal I think it comes down to the need to make the following distinctions for things used by reference:

1. Things used by reference (conref or topicref) that are normal DITA markup, either because they were originally authored that way or because they are served as DITA markup by the URI resolver that resolves the URI for the reference. This is what I think of as "using conref for dynamic data," but it is really just "when a conref or topicref with a @format of 'dita' or 'ditamap' specifies a URI, what comes back *must* be normal DITA markup." How that DITA markup came to be created is immaterial. The point is that the DITA processor asks for a URI to be resolved in a context that requires DITA markup to be the result, and it gets DITA markup. That's all it needs to know.

2. Things used by reference that are in a context where literal text is expected, e.g., within <lines> or a specialization thereof. This is the coderef case and @parse of "text" in Chris' proposal.

3. Things used by reference that are within a <foreign> context where non-DITA XML markup is expected. This is the mathml and svgref case and @parse of "xml" in Chris' proposal.

4. Things used by reference that are in a normal DITA context but are not themselves text, DITA markup, or foreign markup. This is <image> and <object> in DITA 1.x.

It is case (4) that I think Chris was concerned about and about which I have concerns in terms of what we can or can't standardize.

Chris has tried to generalize inclusion to handle cases (2) and (3), which I agree is appropriate to do.

That then leaves case (4). I think this is an important case and we need to think it through a bit more deeply.

The ability to do case (1) is inherent in the DITA design because all references in DITA are via URI and the DITA standard does not (and cannot or should not) limit the nature of the URIs you use. That means you're free to use any URI you like as long as there's a resolver for it, which is an implementation detail. There's no need for the DITA standard to try to codify any particular way of binding to dynamically-constructed data because there are simply too many ways you might usefully do it.

Because case (1) is inherent in the DITA design, I would be concerned about anything in the <include> design that appeared to provide another way to do conrefs or topicrefs to dynamically constructed normal DITA markup.

Like case (1), cases (2) and (3) can be defined entirely in terms of the effective result: it is identical to having included the referenced content directly in the referencing topic's source. The difference between this and conref is *when* a DITA processor is obligated to resolve the reference: unlike conref, there can't be any key- or ID-related processing complexity, so the resolution can be done at any point in the processing flow and the result must always be the same. Resolution could be done as part of some preprocessing step or during final-form processing; it shouldn't matter.
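As an illustration of "identical to having included the referenced content directly," the following sketch replaces an <include parse="text"> inside a <lines> element with the referenced text, using Python's ElementTree. The function name is hypothetical and reading the @href target is omitted. Because ElementTree keeps character data in .text and .tail, the included text has to be spliced onto whatever precedes the <include>; once that is done, the tree is indistinguishable from one in which the text was typed in place, and any markup-significant characters are escaped on serialization like any other text.

```python
import xml.etree.ElementTree as ET


def inline_text_include(parent: ET.Element, include: ET.Element, text: str) -> None:
    """Replace the child <include> of parent with literal text.

    ElementTree stores character data in .text (before the first child) and in
    .tail (after each child), so the included text plus the <include>'s own
    tail are appended to whatever character data precedes the <include>.
    """
    children = list(parent)
    index = children.index(include)
    merged = text + (include.tail or "")
    if index == 0:
        parent.text = (parent.text or "") + merged
    else:
        previous = children[index - 1]
        previous.tail = (previous.tail or "") + merged
    parent.remove(include)


if __name__ == "__main__":
    lines = ET.fromstring('<lines>start\n<include href="x.txt" parse="text"/>\nend</lines>')
    inline_text_include(lines, lines[0], "a < b && c")
    # The special characters are escaped only when the tree is serialized.
    print(ET.tostring(lines, encoding="unicode"))
```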

However, both Chris and Robert correctly pointed out that there could be a need for additional *parameters* to guide the resolution processing. Robert said that OT has already extended coderef in several useful ways.

By the same token, a URI-based dynamic conref or topicref mechanism might also need parameters. While parameters can always be embedded directly in URIs one way or another, it might be (and probably usually is) more convenient to be able to specify parameters separate from the URI itself.
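One way a processor could reconcile the two styles is to accept parameters specified separately from the URI and fold them into the URI's query string just before handing it to a resolver. The sketch below assumes exactly that strategy, which is only one option (many resolvers would consume parameters some other way), and the function name is hypothetical. It uses only the standard urllib.parse functions and leaves the fragment, such as a topic/element ID, untouched.

```python
from typing import Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_resolution_parameters(href: str, params: Mapping[str, str]) -> str:
    """Fold separately specified parameters into the query string of href.

    Explicit parameters override query parameters of the same name already
    embedded in the URI; repeated query keys collapse to their last value.
    The fragment is preserved as-is.
    """
    scheme, netloc, path, query, fragment = urlsplit(href)
    merged = dict(parse_qsl(query, keep_blank_values=True))
    merged.update(params)
    return urlunsplit((scheme, netloc, path, urlencode(merged), fragment))


if __name__ == "__main__":
    print(apply_resolution_parameters("path/to/api?query=foobar", {"format": "dita"}))
```

With these inputs the result is the same kind of URI Chris mentions, href="path/to/api?query=foobar&format=dita", but the author specifies the parameter separately instead of hand-building the query string.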

Finally, considering case (4), non-XML, non-DITA, non-text content used in more or less any context, there are two current instances: <image> and <object>. Of these, <image> provides no way to specify parameters outside the URI, while <object> does, through its nested <param> elements.

This analysis leads me to ask two questions in the context of the <include> proposal:

1. Should <include> also be generalizing case (4)?
2. Should we be adding a general "parameters that apply to URI resolution" mechanism to DITA 2.x (separate from but used by the <include> mechanism)?

If the answer to question (1) is "yes" I think it might mesh well with my to-be-delivered proposal for reworking <image>. In TC discussion we already established the value of having an element that does just the reference to the image resource without having any nested content (<alt>). Such an element would be structurally identical to and semantically close to Chris' <include> element if <include> allowed a third value for @parse that meant "not text and not foreign XML" ("object", perhaps?).

I think the answer to question (2) is "yes", especially if the answer to question (1) is "yes".

A general parameters-for-URI-resolution mechanism would be valuable in any context where the URI to be resolved is not a simple direct reference to static content. This would include both custom URIs used in case (1) and all the references used for cases (2), (3), and (4).

It would also make it clear that specifying parameters to URI resolution is not limited to the <include> cases but can also apply to conref and topicref, using a single parameter specification mechanism. That would help keep the purpose of <include> vs. conref vs. topicref clear and avoid having <include> become, or try to become, an alternative way to do dynamic conref.

That would provide a general mechanism for *specifying* parameters without the DITA standard having to say anything about what those parameters might be.

I'm thinking of something like a specialization of <data> that means "these are parameters to the URI specified by the containing element". It wouldn't need to be much more specific than that; the main point is to simply signal unambiguously "URI resolution parameters go here". The details would still be processor-specific, modulo any obviously general parameters we might identify (for example, parameters related to authentication or MIME types). To enable the easy case we might also add a new specialization base attribute, e.g. "@parameters", that would allow attribute-only parameter specification as well. Or maybe a new base attribute is all we would need.

I think a general parameter mechanism is consistent with what Chris is trying to do by allowing additional @parse values, but without being limited to the <include> use case, and it would be general enough to satisfy all possible use cases for parameter specification (which attributes alone cannot do).

Cheers,

--
Eliot Kimber
http://contrext.com

