Re: [dita] Conref Vs. Transclusion and Using Non-DITA Data As DITA

I don’t have time at the moment to read carefully through Chris’ note, but I will observe that in any scenario where non-DITA source data is being treated as though it were DITA, some processor-specific processing will have to be applied, and thus there will be processor-specific stuff in the DITA source, whether it’s a magic URI or something else.

That is, there’s nothing we can do *in the source* to change that part of the problem and therefore nothing we can do to make processing of that kind of data more interchangeable—it will always require processor-specific stuff, however that gets configured.

One advantage of my magic URI approach is that it can be replaced with a non-magic version simply by statically generating the required DITA markup and changing the URI (which presumably you’ve isolated in a key definition). Nothing in the rest of the source document is affected. The disadvantage is that the fact that magic is required is signaled only by the magic URI, which might be too obscure.
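A minimal Python sketch of the magic URI idea, under stated assumptions: the `jsonlist:` scheme, the in-memory `FILES` store, and the `resolve_uri`/`resolve_key` helpers are all hypothetical, standing in for whatever resolver a real processor would configure. The DITA side only asks for a key to be resolved and gets DITA markup back, so moving from the magic version to a statically generated file changes nothing but the URI in the key definition.

```python
import json
from xml.sax.saxutils import escape

# Hypothetical storage standing in for files a real resolver would read.
FILES = {"features.json": '["Fast", "Small & simple"]'}

def json_list_to_ul(text):
    """Render a JSON array as DITA <ul> markup, escaping each item."""
    items = json.loads(text)
    return "<ul>" + "".join(f"<li>{escape(str(item))}</li>" for item in items) + "</ul>"

def resolve_uri(uri):
    """Return DITA markup for a URI.

    'jsonlist:' is a hypothetical processor-specific ("magic") scheme that
    generates DITA on the fly; any other URI is served as stored DITA.
    """
    scheme, sep, path = uri.partition(":")
    if sep and scheme == "jsonlist":
        return json_list_to_ul(FILES[path])
    return FILES[uri]

def resolve_key(key_definitions, key):
    """Resolve a key through the key definitions, then resolve its URI."""
    return resolve_uri(key_definitions[key])

# Magic version: the key definition points at the magic URI.
magic_keys = {"feature-list": "jsonlist:features.json"}

# Non-magic version: generate the markup once, store it as an ordinary
# file, and change only the URI in the key definition.
FILES["features.dita"] = json_list_to_ul(FILES["features.json"])
static_keys = {"feature-list": "features.dita"}
```

Both key maps yield the same markup, which is the property that makes the magic URI replaceable without touching the referencing topics.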

One advantage of Chris’ approach of using <include> with some signaling value of @parse is that it’s clear to an observer *of the source* that “magic goes here”. The disadvantage is that if you want to replace the magic with a reference to normal source you have to completely change the source from using <include> to using normal markup with conref.

I don’t see Chris’ approach as being more or less interchangeable than the magic URI approach; it’s only more obvious that configuration of the tools is required.

Which of these two sets of advantages and disadvantages is the more compelling I’m not sure.

I think we can also agree that the need for a general parameter mechanism is a constant for any approach—there has to be a way to configure the magic invocation in the source, regardless of what form that invocation takes.

I like the URI-based approach because it is general over all URI references, meaning it applies equally well to conref, topicref, and a new include element. It doesn’t require any architectural change to DITA.

But I suspect that most people would prefer the more obvious magic signaling in Chris’ proposal, even though it potentially requires more work to undo.

I’m also not keen on having too much direct reference to specific implementation components (e.g., transforms) in document source; that is definitely counter to XML practice generally and DITA practice in particular. I understand and appreciate the desire for a more-or-less standard way to configure magic, but we have to be careful that it isn’t standardized inappropriately.

Cheers,

Eliot Kimber

http://contrext.com

From: Robert D Anderson <robander@us.ibm.com>
Date: Wednesday, March 7, 2018 at 12:29 PM
To: Chris Nitchie <chris.nitchie@oberontech.com>
Cc: "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>, Eliot Kimber <ekimber@contrext.com>
Subject: Re: [dita] Conref Vs. Transclusion and Using Non-DITA Data As DITA

During the discussion yesterday I tried to explain my thoughts on the include proposal, but they evidently didn't come out clearly, because Chris had to ask me afterwards which side I was on.

The problem was that I wasn't really on a "side"; I was just trying to explain how I think about the different cases of referencing non-DITA content. Basically, I see a distinction between references to DITA content and references to non-DITA content, regardless of how you end up handling that content when resolving such references.

I like what I read below. For the most part, it reads (to me) like a much clearer version of what I was trying to say on the call.

I'm also *really* entertained by the idea of using parse="¯\_(ツ)_/¯" in a spec example, even while knowing it won't actually happen.

Regards,

Robert D. Anderson
DITA-OT lead and Co-editor, DITA 1.3 specification
Digital Services Group

E-mail: robander@us.ibm.com
11501 Burnet Rd, Austin, TX 78758-3400, USA


From: Chris Nitchie <chris.nitchie@oberontech.com>
To: Eliot Kimber <ekimber@contrext.com>, "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>
Date: 03/07/2018 10:33 AM
Subject: Re: [dita] Conref Vs. Transclusion and Using Non-DITA Data As DITA
Sent by: <dita@lists.oasis-open.org>

Thanks for the thoughtful feedback, Eliot. I spent the better part of yesterday afternoon mulling this stuff over, and it very much helped me clarify my thoughts.

What I'm concerned with is the transclusion of content that is structured, but not DITA, into a DITA context. It may be loosely structured - markdown, HTML, Word (shudder) - or rigidly structured but non-XML - CSV, JSON, YAML - or some foreign XML grammar that can be coerced into a DITA format via transform - WSDL, TLD, DocBook - but there's structure to be found.

I have an agenda here. I firmly believe that if DITA can't find a way to marry documentation authored in other formats with its native structured content, it's going to struggle to stay relevant. Lightweight DITA, with support for format="mdita" and format="hdita", is a giant leap forward on this front. The extensible @parse attribute is an attempt to create one avenue for it in full DITA.

At one point in the below, Eliot suggests that I'm concerned with his case #4 (<image>/<object>), but I think media references are fundamentally different from structured content transclusion references.

- Media references are designed such that their ultimate presentation is managed by a downstream rendering application - generally a web browser or a PDF engine - whereas I'm envisioning resolution as part of DITA preprocessing, such that the original inclusion directive is replaced by the referenced content before any stylesheet is involved.

- Media references are designed primarily (though I suppose not exclusively) for the inclusion of resources that are graphical in nature, or presented with some sort of interactive graphical interface; hence the @width, @height, @scale, etc. attributes. I'm concerned with content that, after resolution, is textual in nature, even if the text is tagged one way or another.

As I understand Eliot's argument, he's asserting that structured non-DITA content *should* be treated as case 1 (DITA in a DITA context) with some sort of external transformation. In contrast, I think structured non-DITA content in a DITA context is a separate use case (Case 5, if you will).

1. DITA in a DITA context (@conref/@conkeyref)

2. Text in a preformatted DITA context (<include parse="text">)

3. Non-DITA XML in a foreign context (<include parse="xml">)

4. Media references (<image>/<object>)

5. Structured non-DITA content in a DITA context (<include parse="¯\_(ツ)_/¯">)

(All of these, with the possible exception of #1, could potentially benefit from allowing nested <param> elements; <include> already does, because its content model is ANY; it should probably be added to <image>. So that part of Eliot's message I agree with.)

It's true that it is possible to implement #5 as #1 using some sort of external processing, either implicit in the processing behind the URL at which the data is referenced - href="" - or by specifying the transformation via parameters/metadata on the reference. But I worry about the complexity of requiring a complicated key definition, plus dependency on arbitrary external systems, plus the use of conkeyref (which, rightly or wrongly, scares a lot of people) to implement "insert file.csv as a table here." I think that complexity is going to limit adoption, and I think adoption needs to be encouraged. I think <include href="" parse="csv-as-simpletable"/> just makes sense in a way <simpletable conkeyref="keyToCsvFileWithProcessingMetadata/table"/> doesn't, especially to new users. My argument is usability and readability, arguably at the expense of architectural cleanliness. So for the time being, I'm standing by extensible @parse. It doesn't solve the problem of needing developers to implement a given @parse processor, but I don't think there's a way to avoid that without strictly limiting the formats that are allowed, and again, I think that's a bad idea.
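To illustrate what an extensible @parse might mean for implementers, here is a rough Python sketch of a preprocessing step that replaces each <include> with the output of a registered processor. The `PARSERS` registry, the `parse_processor` decorator, and the `load` callable are hypothetical; only the `csv-as-simpletable` value comes from the discussion. The markup side stays as simple as `<include href="" parse="csv-as-simpletable"/>`; the work moves into code that someone still has to write and register.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical registry mapping @parse values to processor functions.
# Each processor takes the referenced text and returns a DITA element.
PARSERS = {}

def parse_processor(name):
    """Register the decorated function as the processor for parse=name."""
    def register(func):
        PARSERS[name] = func
        return func
    return register

@parse_processor("csv-as-simpletable")
def csv_as_simpletable(text):
    """Turn CSV text into <simpletable>, using the first row as <sthead>."""
    table = ET.Element("simpletable")
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        row_el = ET.SubElement(table, "sthead" if i == 0 else "strow")
        for cell in row:
            ET.SubElement(row_el, "stentry").text = cell
    return table

def resolve_includes(root, load):
    """Replace every <include> under root with its processed content.

    `load` maps an @href value to the referenced text. An unregistered
    @parse value is an error: no processor has been configured for it.
    """
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "include":
                continue
            parse = child.get("parse")
            if parse not in PARSERS:
                raise ValueError(f"no processor registered for parse={parse!r}")
            replacement = PARSERS[parse](load(child.get("href")))
            replacement.tail = child.tail
            parent.remove(child)
            parent.insert(index, replacement)
```

Failing loudly on an unregistered @parse value is one possible policy, not something the proposal specifies; "text" and "xml" processors could be registered the same way.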

That said, I do think there's value in being able to reference structured non-DITA data from a map, as well as embedding it inside topics. However, that would be a separate proposal not directly related to <include>. I'll send out a new Stage 1 proposal about this in a few minutes.

Chris

On 3/6/18, 5:21 PM, "dita@lists.oasis-open.org on behalf of Eliot Kimber" <dita@lists.oasis-open.org on behalf of ekimber@contrext.com> wrote:

In thinking through my position on the inclusion proposal I think it comes down to the need to make the following distinctions for things used by reference:

1. Things used by reference (conref or topicref) that are normal DITA markup, either because they were originally authored that way or because they are served as DITA markup by the URI resolver that resolves the URI for the reference. This is what I think of as "using conref for dynamic data" but is really just "when a conref or topicref with format of "dita" or "ditamap" specifies a URI, what comes back *must* be normal DITA markup." How that DITA markup came to be created is immaterial. The point is that the DITA processor asks for a URI to be resolved in a context that requires DITA markup to be the result and it gets DITA markup. That's all it needs to know.

2. Things used by reference that are in a context where literal text is expected, e.g., within <lines> or a specialization thereof. This is the coderef case and @parse of "text" in Chris' proposal.

3. Things used by reference that are within a <foreign> context where non-DITA XML markup is expected. This is the mathml and svgref case and @parse of "xml" in Chris' proposal.

4. Things used by reference that are in a normal DITA context but are not themselves text, DITA markup, or foreign markup. This is <image> and <object> in DITA 1.x.

It is case (4) that I think Chris was concerned about and about which I have concerns in terms of what we can or can't standardize.

Chris has tried to generalize inclusion to handle cases (2) and (3), which I agree is appropriate.

That then leaves case (4). I think this is an important case and we need to think it through a bit more deeply.

The ability to do case (1) is inherent in the DITA design because all references in DITA are via URI and the DITA standard does not (and cannot or should not) limit the nature of the URIs you use. That means you're free to use any URI you like as long as there's a resolver for it, which is an implementation detail. There's no need for the DITA standard to try to codify any particular way of binding to dynamically-constructed data because there are simply too many ways you might usefully do it.

Because case (1) is inherent in the DITA design I would be concerned about anything in the <include> design that appeared to provide another way to do conref or topicrefs to dynamically-constructed normal DITA markup.

Like case (1), cases (2) and (3) can be defined entirely in terms of the effective result: it is identical to having included the referenced content directly in the referencing topic's source. The difference between this and conref is *when* a DITA processor is obligated to do the referencing: unlike conref, there can't be any key- or ID-related processing complexity, so the resolution can be done at any time in the processing flow--the result must always be the same. Resolution could be done as part of some preprocessing step or it could be done during final-form processing, it shouldn't matter.
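A sketch of what "identical to having included the referenced content directly" could mean for parse="text" and parse="xml", using Python's ElementTree. The `inline_includes` function and the `load` callable (mapping @href to a string) are assumptions for illustration. Because the replacement depends only on the referenced content, running it early in preprocessing or late in final-form processing yields the same tree.

```python
import xml.etree.ElementTree as ET

def _append_text(parent, index, text):
    """Add character data at the position where child `index` used to be."""
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text

def inline_includes(root, load):
    """Replace <include parse="text"|"xml"> with the referenced content.

    parse="text" becomes literal character data (re-escaped on
    serialization); parse="xml" becomes parsed elements. `load` is a
    hypothetical callable mapping @href to the referenced string.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            mode = child.get("parse")
            if child.tag != "include" or mode not in ("text", "xml"):
                continue
            index = list(parent).index(child)
            tail = child.tail or ""
            parent.remove(child)
            if mode == "text":
                _append_text(parent, index, load(child.get("href")) + tail)
            else:
                element = ET.fromstring(load(child.get("href")))
                element.tail = tail
                parent.insert(index, element)
```

Note that markup-like characters in included text stay literal: `a < b` ends up serialized as `a &lt; b`, exactly as if it had been typed into the codeblock.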

However, both Chris and Robert correctly pointed out that there could be a need for additional *parameters* to guide the resolution processing. Robert said that OT has already extended coderef in several useful ways.

By the same token, a URI-based dynamic conref or topicref mechanism might also need parameters. While parameters can always be embedded directly in URIs one way or another, it might be (and probably usually is) more convenient to be able to specify parameters separate from the URI itself.
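One way to see the trade-off between parameters embedded in the URI and parameters specified separately: the hypothetical helpers below move parameters between the two forms using Python's standard `urllib.parse` module. The parameter names (`delimiter`, `header`) are invented examples, not proposed DITA values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def split_params(uri):
    """Return (uri_without_query, params) for a URI with a query string."""
    parts = urlsplit(uri)
    base = urlunsplit(parts._replace(query=""))
    return base, dict(parse_qsl(parts.query))

def embed_params(uri, params):
    """Fold separately specified parameters into the URI's query string.

    Separately specified values override any already in the URI.
    """
    base, existing = split_params(uri)
    merged = {**existing, **params}
    return urlunsplit(urlsplit(base)._replace(query=urlencode(merged)))
```

For example, `embed_params("data/parts.csv", {"delimiter": ";", "header": "yes"})` gives `data/parts.csv?delimiter=%3B&header=yes`. A query string only carries flat, percent-encoded name/value strings, which is one reason separate parameter markup could be more convenient.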

Finally, considering case (4), non-XML, non-DITA, non-text content used in more or less any context, there are two current instances: <image> and <object>. Of these, one provides no way to specify parameters outside the URI (<image>) and one includes parameters (<object>).

This analysis leads me to ask two questions in the context of the <include> proposal:

1. Should <include> also be generalizing case (4)?

2. Should we be adding a general "parameters that apply to URI resolution" mechanism to DITA 2.x (separate from but used by the <include> mechanism)?

If the answer to question (1) is "yes" I think it might mesh well with my to-be-delivered proposal for reworking <image>. In TC discussion we already established the value of having an element that does just the reference to the image resource without having any nested content (<alt>). Such an element would be structurally identical to and semantically close to Chris' <include> element if <include> allowed a third value for @parse that meant "not text and not foreign XML" ("object", perhaps?).

I think the answer to question (2) is "yes", especially if the answer to question (1) is "yes".

A general parameters-for-uri-resolution mechanism would be valuable in any context where the URI to be resolved is not a simple direct reference to static content. This would include both the custom URIs used in case (1) and all the references used for cases (2), (3), and (4).

It would also make it clear that specifying parameters to URI resolution is not limited to just the <include> cases but can also apply to conref and topicref as well, using a single parameter specification mechanism. That would help keep the purpose of <include> vs conref vs topicref clear and avoid having <include> become, or try to become, an alternative way to do dynamic conref.

That would provide a general mechanism for *specifying* parameters without the DITA standard having to say anything about what those parameters might be.

I'm thinking of something like a specialization of <data> that means "these are parameters to the URI specified by the containing element". It wouldn't need to be much more specific than that--the main point is to simply signal unambiguously "URI resolution parameters go here". The details would still be processor specific, modulo any obviously general parameters we might identify (for example, parameters related to authentication or MIME types). To enable the easy case we might also add a new specialization base attribute, e.g. "@parameters", that would allow attribute-only parameter specification as well. Or maybe a new base attribute is all we would need.
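A rough sketch of how a processor might gather parameters under such a mechanism. The `<uri-params>` element (standing in for the suggested specialization of <data>) and the `@parameters` attribute syntax (semicolon-separated `name=value` pairs) are both hypothetical. The function only collects names and values and leaves their meaning to the processor.

```python
import xml.etree.ElementTree as ET

PARAMS_TAG = "uri-params"  # hypothetical specialization of <data>

def resolution_params(ref):
    """Collect URI-resolution parameters for a referencing element.

    Parameters may come from a hypothetical @parameters attribute
    ("name=value" pairs separated by semicolons) or from <data> children
    of a hypothetical <uri-params> element; nested values win on conflict.
    """
    params = {}
    for pair in (ref.get("parameters") or "").split(";"):
        name, sep, value = pair.partition("=")
        if sep and name.strip():
            params[name.strip()] = value.strip()
    for block in ref.findall(PARAMS_TAG):
        for data in block.findall("data"):
            # Use @value if present, otherwise the element's text content.
            params[data.get("name")] = data.get("value", data.text or "")
    return params
```

Because the function accepts any referencing element, the same code would serve <include>, conref, or topicref alike.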

I think a general parameter mechanism is consistent with what Chris is trying to do by allowing additional parse values, but without it being limited to just the <include> use case, and it would be general enough to satisfy all possible use cases for parameter specification (which attributes alone cannot do).

Cheers,

E.

--

Eliot Kimber

http://contrext.com

---------------------------------------------------------------------

To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at:

https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
