

Subject: DITA 1.3 Processing Order Analysis


This is my analysis of how DITA 1.3 processing has to work in order to get the correct answer, at least in the DITA OT. But this analysis, in the abstract, would apply to any processor. This was originally posted to the DITA OT Wiki so it reflects some OT implementation details.

One of the key features of the processing described is that direct-href conrefs are resolved early while key-based conrefs are resolved later; it is simply not possible to do all conref resolution in one phase, so it has to be performed at different points in the process. In particular, you have to resolve direct conrefs within maps before you do anything else, so that you have a complete and correct resolved map from which you can determine the key spaces and the effective set of referenced resources. As described below, you can also capture knowledge of what *will be* filtered out so that you can avoid unnecessary processing without actually removing filtered-out things too early (e.g., before performing the final conref resolution steps).

Original post:

In the course of trying to implement copy-to customization I'm realizing that what I'm trying to do is similar to (and dependent on) implementation for DITA 1.3 branch filtering and scoped keys. Thus I've created this page to capture the general requirements and implementation implications for DITA 1.3 processing.

The current OT preprocessing, as implemented, cannot support the DITA 1.3 requirements because it does topic copying and filtering too early in the process, before either the full set of effective topics or the actual filtering conditions for a given topic use instance is known. Supporting them will require significant changes to the current code, at least to the debug-and-filter process (and possibly generate-lists, although lists can be updated as needed). [Editorial note: this refers to the part of the OT processing that determines the set of effective resources, including making copies of topics as required by authored @copy-to or by the existence of branch filters or key scopes. This is the processing area where the current filter-first or conref-first difference can occur.]

As far as I can work out, implementing branch filtering and scoped keys has to be done as follows, because of the information required and available at any given step.

The process as described does filtering after conref resolution. This ensures that applicable conrefs to inapplicable elements can be detected and reported, rather than causing the conref to fail because the target has already been filtered out (the main problem with the current filter-first approach).

However, it is wasteful and expensive to process elements that will subsequently be filtered out.

Thus, in order to implement this processing most efficiently there needs to be an "isEffective()" function that takes an element and its effective @props values (that is, potentially inherited from an ancestor) and determines if that element is effective based on the current active condition set. This shouldn't be hard to implement in XSLT or Java. It simply requires maintaining knowledge of the current effective conditions (reflecting any branch filtering, where applicable) and doing applicability evaluation on demand.
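
As a rough sketch (the class name and data model here are mine, not anything in the OT; conditions are modeled as a map from filtering attribute name to the values the active DITAVAL and branch filters exclude), such a check might look like this in Java:

    import java.util.Map;
    import java.util.Set;

    // Illustrative only, not DITA OT API. An element is taken to be filtered
    // out when, for some filtering attribute it specifies, every one of its
    // values is excluded by the active conditions (the standard DITA rule).
    public class ApplicabilityEvaluator {

        // Filtering attribute name (e.g. "audience", "platform") -> excluded values.
        private final Map<String, Set<String>> excludedValues;

        public ApplicabilityEvaluator(Map<String, Set<String>> excludedValues) {
            this.excludedValues = excludedValues;
        }

        // effectiveProps: the element's effective filtering values, including
        // any inherited from ancestor elements (attribute name -> values).
        public boolean isEffective(Map<String, Set<String>> effectiveProps) {
            for (Map.Entry<String, Set<String>> prop : effectiveProps.entrySet()) {
                Set<String> excluded =
                        excludedValues.getOrDefault(prop.getKey(), Set.of());
                if (!prop.getValue().isEmpty()
                        && excluded.containsAll(prop.getValue())) {
                    return false; // every value for this attribute is excluded
                }
            }
            return true; // effective under the current active condition set
        }
    }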

One way to do this would be a "filter" process that simply flags each element as effective or ineffective but retains the element itself. An isEffective() check would then be trivial and subsequent filtering processing would be very simple. It would even allow element applicability to be reported: because filtered-out elements are not actually removed, they can be rendered as some sort of report.
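
For example (again only a sketch; the marker attribute name is hypothetical, not something the OT defines), the flagging pass could be a simple tree walk:

    import java.util.function.Predicate;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // Marks ineffective elements instead of deleting them, so later phases
    // (conref resolution, applicability reporting) can still see them.
    public class FlaggingFilter {

        private static final String FLAG_ATTR = "filtered"; // hypothetical marker

        private final Predicate<Element> isEffective;

        public FlaggingFilter(Predicate<Element> isEffective) {
            this.isEffective = isEffective;
        }

        public void flag(Element element) {
            if (!isEffective.test(element)) {
                element.setAttribute(FLAG_ATTR, "true");
                return; // descendants are implicitly filtered out with their ancestor
            }
            NodeList children = element.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child instanceof Element) {
                    flag((Element) child);
                }
            }
        }
    }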

This approach avoids the current problem with early filtering while still preventing unnecessary processing.

The process would be:

1. Resolve the map using only direct map references and resolve any map-to-map direct-reference conrefs.

2. Instantiate all filtered branches implied by ditavalrefs that are not themselves filtered out of the map (the isEffective() method can be used to determine whether a given filtered branch is itself effective and, if not, to ignore it, avoiding branch generation for branches that would be filtered out later). We can't do filtering at this point because there might be key-based conref targets that would otherwise get filtered out before final conref resolution.

3. Expand all key names to reflect key scopes (see the sketch after this list). Capture the original keyref value and the scope steps, so that messages can reflect the key scope hierarchy and not just the expanded key name (not every dot-separated token in an expanded key name represents a key scope name).

4. Construct the key spaces. A processor can choose either to include only effective keys in the key space (using the isEffective() function to determine whether a given key definition is filtered in when determining key definition precedence) or to include all key definitions along with their applicability conditions and whether or not they are effective.

       For the OT, I would expect the pre-filtered key space to be the normal case, but other processes might want the full key space with conditions, so it needs to be an option. Within a component that provides a general API for managing key spaces, it must be possible to know the applicability of any key definition, or of a key space as a whole (where the key scope itself was conditional).

5. Resolve key-based conrefs from maps to maps. (Map-to-topic conrefs can't necessarily be resolved yet because copy-to processing hasn't been applied to the topics, so we don't yet know the target element details, in particular any use-context-specific filtering that would make a conref target filtered out.) This step should be filtering-aware, so that conrefs to elements that would be filtered out are reported and not resolved (meaning that no elements are unnecessarily processed).
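
To make steps 3 and 4 a bit more concrete, here is a rough sketch of scope-qualified key name expansion and of a key space that retains applicability per definition. The names and data structures are illustrative only (not the OT key space API); the naming follows the DITA 1.3 rule that keys defined in a nested scope are also addressable from ancestor scopes with the intervening scope names prefixed, and the single flat map is a simplification of what is really one key space per scope.

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative only, not the DITA OT key space API.
    public class KeySpaceSketch {

        // One key definition plus the information steps 3 and 4 want to retain.
        public static class KeyDef {
            final String scopeQualifiedName; // e.g. "product-a.installing"
            final String originalKeyName;    // e.g. "installing", for messages
            final List<String> scopePath;    // e.g. ["product-a"], for messages
            final boolean effective;         // result of isEffective() on the definition

            KeyDef(String scopeQualifiedName, String originalKeyName,
                   List<String> scopePath, boolean effective) {
                this.scopeQualifiedName = scopeQualifiedName;
                this.originalKeyName = originalKeyName;
                this.scopePath = scopePath;
                this.effective = effective;
            }
        }

        // First effective definition wins; ineffective ones are kept for reporting.
        private final Map<String, List<KeyDef>> definitions = new LinkedHashMap<>();

        // Register a key under every name by which this and enclosing scopes can address it.
        public void addDefinition(List<String> scopePath, String keyName, boolean effective) {
            for (String name : scopeQualifiedNames(scopePath, keyName)) {
                definitions.computeIfAbsent(name, n -> new ArrayList<>())
                           .add(new KeyDef(name, keyName, scopePath, effective));
            }
        }

        // The effective binding for a (possibly scope-qualified) key name, or null.
        public KeyDef effectiveDefinition(String name) {
            return definitions.getOrDefault(name, List.of()).stream()
                    .filter(d -> d.effective)
                    .findFirst()
                    .orElse(null);
        }

        // For a key "installing" defined in scope path ["product-a", "linux"] this
        // returns "product-a.linux.installing", "linux.installing", "installing":
        // the names by which the key is addressable from the root scope down to
        // its own scope.
        static List<String> scopeQualifiedNames(List<String> scopePath, String keyName) {
            List<String> names = new ArrayList<>();
            for (int start = 0; start <= scopePath.size(); start++) {
                String prefix = String.join(".", scopePath.subList(start, scopePath.size()));
                names.add(prefix.isEmpty() ? keyName : prefix + "." + keyName);
            }
            return names;
        }
    }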

At this point we have a fully-resolved, unfiltered map. With this map, the processing steps are then:

1. Filter the map.

       At this point, we have the final map structure and content, with filtering applied.

2. Do any metadata propagation within the map, or any other similar data denormalization (e.g., updating @href values to reflect resolved keyrefs).

3. Add @copy-to attributes to the second and subsequent topicrefs to any topic that is referenced within branches with different filtering conditions (a sketch of this copy-to assignment follows this list).

       At this point the map is fully resolved and filtered and the key spaces have been constructed.

4. Allow additional preprocessing of the map (e.g., copy-to adjustment).

       At this point the set of effective resources (topics and non-DITA resources) is known (because the map now reflects all desired copy-tos).

5. Determine the set of resources required by the map and what file(s) they must be copied to.

6. Make copies of topics reflecting the @copy-to values. Non-destructive filtering as described above can be applied here (there is no point in copying resources that you know will ultimately be filtered out).

7. Resolve conrefs in topics. Again, isEffective() can be used to avoid processing elements that will be filtered out and to report conrefs to inapplicable targets.

8. Apply filtering to topics.
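
As an illustration of the copy-to bookkeeping behind steps 3, 5, and 6, here is a rough sketch. The naming convention for generated copies is invented for the example; a real implementation would honor authored @copy-to values and whatever naming rules the processor applies to branch-filter copies.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: assigns a result file to each use of a topic so that
    // uses under different branch conditions get their own copies.
    public class CopyToAssigner {

        // href -> result file of the first use of that href.
        private final Map<String, String> firstUse = new HashMap<>();
        // href + "|" + condition signature -> result file already assigned.
        private final Map<String, String> assigned = new HashMap<>();
        // href -> number of distinct copies generated so far.
        private final Map<String, Integer> copyCount = new HashMap<>();

        public String resultFileFor(String href, String branchConditionSignature) {
            String key = href + "|" + branchConditionSignature;
            String existing = assigned.get(key);
            if (existing != null) {
                return existing; // same topic under the same conditions: reuse
            }
            String result;
            if (!firstUse.containsKey(href)) {
                result = href; // the first use keeps the original file name
                firstUse.put(href, result);
            } else {
                int n = copyCount.merge(href, 1, Integer::sum) + 1;
                int dot = href.lastIndexOf('.');
                // e.g. "install.dita" -> "install-2.dita" (illustrative naming only)
                result = dot < 0 ? href + "-" + n
                                 : href.substring(0, dot) + "-" + n + href.substring(dot);
            }
            assigned.put(key, result);
            return result;
        }
    }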

At this point all topic copies have been created, conrefs have been resolved, and filtering applied. The topics are ready for any additional processing, such as link generation, etc., as well as final deliverable production.

The remaining steps of the current preprocess pipeline should work normally (that is, everything that follows mappull today).

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 




