Subject: Re: [dita] Stage 2 proposal: chunking redesign
I think this is largely correct.

Robert D. Anderson

Hi all,

Simplify how the @chunk attribute is defined to 1) make it easier to use, and 2) make implementation easier and more reliable.
Date and version information N/A
Proposed solution
The overall goal with this solution is to preserve (mostly intact) the two most useful existing cases for chunking.
Important:
In the specification, the current definitions for all six valid values and the one deprecated value will be removed. They will be replaced with the two values "combine" and "split".
This applies to all uses of the @chunk attribute; no elements will get the attribute that did not have it before, and no elements that had the attribute will have it removed.

In the case where hello.dita contains 5 topics (either nested or peers within a <dita> element), the markup in Figure 2 would result in hello.dita being split into 5 individual documents. How the documents are handled at that point is up to the processor (in HTML5 output, where one input file generally produces one output file, this would turn hello.dita into five output files, presumably named after topic IDs within the original document). Note that the chunk="split" value has no impact on the nested reference notchunked.dita; in the resulting hierarchy, the reference to notchunked.dita should end up nested within the final topic split from hello.dita.
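A minimal sketch of the value rules, assuming only what the Important note and item 5 of the proposed solution state: @chunk stays CDATA, "combine" and "split" are the only defined tokens, and the attribute may hold a single token. The function name and its return labels are hypothetical, not part of DITA-OT or any other tool.

```python
from typing import Optional

# The only tokens the stage 2 proposal defines.
DEFINED_TOKENS = ("combine", "split")


def classify_chunk(value: Optional[str]) -> str:
    """Classify an @chunk value under the proposed single-token rule (hypothetical helper)."""
    if value is None or not value.strip():
        return "none"  # no chunking requested
    tokens = value.split()
    if len(tokens) > 1:
        # Only one token is allowed, so DITA 1.x combinations
        # such as "to-content select-branch" are no longer valid.
        return "invalid"
    if tokens[0] in DEFINED_TOKENS:
        return tokens[0]
    # Still valid CDATA: an application-defined token, or a DITA 1.x
    # token that a DITA 2.0 processor would ignore.
    return "other"


if __name__ == "__main__":
    for value in ["combine", "split", "to-content", "to-content select-branch", None]:
        print(repr(value), "->", classify_chunk(value))
```

Keeping the attribute as CDATA means this check lives in processors rather than in the grammar files.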
In the case where hello.dita and world.dita each contain 5 topics (either nested or peers within a <dita> element), the markup in Figure 3 would result in the two original documents being split into 10 individual documents, with the same handling caveats as above.

For Figure 4, assume a very simple map with a single topic, simple.dita; the contents of simple.dita are also shown.

For Figure 7, assume a map where chunk="split" on the root element means that all topics within this map structure are split by default, but a branch within the map sets chunk="combine".

Robert D. Anderson
For the (probably) easier question, about how nested topicref elements are reflected in the hierarchy - I struggled with this in the proposal, and really only picked my version because overall I was trying to simplify and that seemed the easiest to explain / understand. But I felt an equal pull towards Chris's suggestion, so I'm happy to go with that. I think we clearly need to state what the expected result is, but I intentionally used the SHOULD term so that there was an outlet if somebody had a solid reason to end up with a different hierarchy.
With Chris's solution, we also need to address how to handle documents that use the <dita> container; for example, if composite.dita has a root <dita> element with 5 child topics (A, B, C, D, and E), and any/all of those have their own children, where would "difficultChild.dita" end up?
<topicref href="">
<topicref href="">
</topicref>
My assumption would be: as a direct child of the final "root" topic in the composite file -- assuming E is the last child topic of <dita>, then difficultChild would end up as the last child topic of E.
For the issue of keys ... yeah. Part of me wants to say "If you need to independently address specific instances of a nested topic that's going to get chunked, that is incompatible with relationship tables." I don't really think that's allowed though.
I do worry that any rules for automating keys are a big potential point of confusion, and that scares me. Immediately coming to mind:
* What is the precedence of a key that is automatically constructed, but then becomes a duplicate of one already in the map?
* If keys are automated based on IDs of the actual nested topics, then you cannot actually know your "key space" until after you've read the topics, which seems wrong
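One way to make the first concern concrete is a merge step that refuses to let generated keys silently shadow author-defined ones. This is a hypothetical Python sketch, not proposed behavior: ordinary DITA key precedence depends on the order of key definitions in the map (roughly, the first definition wins), but it is unclear where a generated key would fall in that order, so the sketch simply lets author keys win and reports each collision. It also shows the second concern, since the generated names can only be passed in after the topics have been read.

```python
def merge_key_space(author_keys, generated_keys):
    """Merge generated keys into the author key space (hypothetical policy).

    author_keys / generated_keys: dicts mapping key name -> target.
    Author-defined keys always win; colliding generated names are
    reported instead of silently overriding or being silently dropped.
    """
    merged = dict(author_keys)
    conflicts = []
    for name, target in generated_keys.items():
        if name in merged:
            conflicts.append(name)
        else:
            merged[name] = target
    return merged, conflicts


if __name__ == "__main__":
    author = {"compound-c1": "other.dita"}
    # Generated names are only known after reading the split topics.
    generated = {"compound-c1": "compound.dita#c1", "compound-c1.1": "compound.dita#c1.1"}
    print(merge_key_space(author, generated))
```

The names compound-c1 and other.dita are invented for illustration.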
But like Chris, I don't have any good answer for this.
Regards,
DITA-OT lead and Co-editor DITA 1.3 specification,
Digital Services Group
E-mail: robander@us.ibm.com
11501 BURNET RD, AUSTIN, TX 78758-3400, USA
From: Chris Nitchie <chris.nitchie@oberontech.com>
To: Robert D Anderson <robander@us.ibm.com>, DITA Technical Committee <dita@lists.oasis-open.org>
Date: 04/02/2018 08:53 AM
Subject: Re: [dita] Stage 2 proposal: chunking redesign
Sent by: <dita@lists.oasis-open.org>
I think ignoring chunking inside reltables is fine, and that content should be identified by specific topic fragment identifiers, as I think you’re describing. But I think there’s something of a rabbit hole here I’m going to try to peek into.
<beleaguered-sigh>Keys.</beleaguered-sigh>
As I think we’ve discussed on the TC before, identifying entries in a relationship table by @href is somewhat problematic, as a given topic document may exist in multiple locations of the resolved map tree, with different scoped key bindings and parent/child/sibling relationships. As such, probably the best way to define relationship participants in a reltable is via key, thus referencing a specific instance of the desired topic.
In the case of chunk="split", the split-out chunks won’t have keys assigned to them. This is problematic for reltables but also for other garden-variety keyref-based linking. Unfortunately, I don’t really see a great way to accomplish that. My best suggestion would involve computing key names using the root ID of the topics in the referenced document combined with either the @chunk-bearing topicref’s key(s) (if any) or, less desirably, something in the <topicmeta> of that topicref.
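A minimal sketch of that key-naming idea, under stated assumptions: each topic ID in the referenced document (each topic becomes the root of its own chunk) is joined to each of the chunk-bearing topicref's @keys values with a hypothetical "-" separator, and the key points at a topic fragment reference (file.dita#topicid). The <topicmeta> fallback is not attempted, and the function name and compound.dita are invented for illustration.

```python
import xml.etree.ElementTree as ET


def derived_keys(chunk_topicref, document, separator="-"):
    """Generate key names for topics split out of `document` (hypothetical scheme).

    Returns {key name: "href#topic-id"}; empty if the topicref has no @keys,
    because the <topicmeta>-based fallback is not modeled here.
    """
    href = chunk_topicref.get("href")
    base_keys = (chunk_topicref.get("keys") or "").split()
    keys = {}
    for topic in document.iter("topic"):  # includes the root topic itself
        topic_id = topic.get("id")
        for base in base_keys:
            keys[f"{base}{separator}{topic_id}"] = f"{href}#{topic_id}"
    return keys


EXAMPLE_REF = ET.fromstring('<topicref href="compound.dita" keys="compound" chunk="split"/>')
EXAMPLE_DOC = ET.fromstring(
    '<topic id="c1"><title>Topic 1</title>'
    '<topic id="c1.1"><title>Topic 1.1</title></topic>'
    '<topic id="c1.2"><title>Topic 1.2</title></topic></topic>'
)

if __name__ == "__main__":
    print(derived_keys(EXAMPLE_REF, EXAMPLE_DOC))
```

Because topic IDs come from inside the document, the key space cannot be known until the topics are parsed.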
On a separate note, this proposal states the following:
I think this is saying that if a map references a compound topic hierarchy such that the final topic is nested several levels deep, any nested topicrefs should be placed beneath that deeply-nested child. I think I’d rather they be placed as siblings of the first depth level, which is to say, as the last immediate child of the chunking topicref.
Example:
Map:
<topicref href="" chunk=”split”>
<topicref href="">
</topicref>
Compound.dita:
<topic id="c1">
  <title>Topic 1</title>
  <topic id="c1.1">
    <title>Topic 1.1</title>
  </topic>
  <topic id="c1.2">
    <title>Topic 1.2</title>
    <topic id="c1.2.1">
      <title>Topic 1.2.1</title>
    </topic>
  </topic>
</topic>
The resulting topicref hierarchy, according to this proposal, would be thus:
<topicref href="">
<topicref href="">
<topicref href="">
<topicref href="">
<topicref href="">
</topicref>
</topicref>
</topicref>
Whereas I’m arguing for:
<topicref href="">
<topicref href="">
<topicref href="">
<topicref href="">
</topicref>
<topicref href="">
</topicref>
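The two placements can be compared with a small Python sketch. It assumes result files are named after topic IDs (as the proposal's own examples do for clarity) and uses a hypothetical nested.dita for the nested reference. "final-topic" follows the proposal draft; "last-root" follows the suggestion above and, for a <dita> container, attaches to the last root topic, which matches Robert's "last child of E" assumption in the reply. Function names are invented.

```python
import xml.etree.ElementTree as ET

# Chris's Compound.dita example (title-only topics for brevity).
COMPOUND = """<topic id="c1"><title>Topic 1</title>
  <topic id="c1.1"><title>Topic 1.1</title></topic>
  <topic id="c1.2"><title>Topic 1.2</title>
    <topic id="c1.2.1"><title>Topic 1.2.1</title></topic>
  </topic>
</topic>"""


def to_topicref(topic):
    # One topicref per split topic; naming files after topic IDs is illustrative only.
    ref = ET.Element("topicref", href=f"{topic.get('id')}.dita")
    for child in topic.findall("topic"):
        ref.append(to_topicref(child))
    return ref


def split_with_nested(xml_text, nested_refs, placement):
    """Split a document and attach the chunking topicref's child topicrefs.

    placement="final-topic": under the final topic in document order (proposal draft).
    placement="last-root":   as last children of the last root topic (Chris's suggestion).
    """
    root = ET.fromstring(xml_text)
    roots = root.findall("topic") if root.tag == "dita" else [root]
    refs = [to_topicref(t) for t in roots]
    target = refs[-1]
    if placement == "final-topic":
        while len(target):  # descend through last children to the final topic
            target = target[-1]
    elif placement != "last-root":
        raise ValueError(f"unknown placement: {placement!r}")
    target.extend(nested_refs)
    return refs


def outline(ref):
    """Nested (href, children) tuples, handy for comparing hierarchies."""
    return (ref.get("href"), [outline(child) for child in ref])


if __name__ == "__main__":
    for mode in ("final-topic", "last-root"):
        nested = [ET.Element("topicref", href="nested.dita")]
        print(mode, [outline(r) for r in split_with_nested(COMPOUND, nested, mode)])
```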
Chris
From: <dita@lists.oasis-open.org> on behalf of Robert D Anderson <robander@us.ibm.com>
Date: Monday, March 26, 2018 at 10:27 AM
To: DITA Technical Committee <dita@lists.oasis-open.org>
Subject: [dita] Stage 2 proposal: chunking redesign
This one changed a bit more than expected from my original vision at stage 1, thanks to feedback from the initial TC discussion. Based on feedback from Stan's review of my first stage 2 draft, I've also changed the proposed chunk token for combining documents from my original idea ("merge") to one more closely aligned with the idea of combining documents (that is, "combine").
I expect that this one will probably result in a fair bit of discussion and possibly more changes. Looking forward to the discussion...
DITA 2.0 proposed feature #105: Redesign chunking
Include the following information:
Date that this feature proposal was completed: 14 March 2018
Champion of the proposal: Robert D Anderson
Links to any previous versions of the proposal: Stage 1 proposal 28 Feb 2018: https://lists.oasis-open.org/archives/dita/201802/msg00106.html
Links to minutes where this proposal was discussed at stage 1 and moved to stage 2: https://www.oasis-open.org/committees/download.php/62726/minutes20180313.txt, with Eliot and Stan as reviewers
Links to e-mail discussion that resulted in new versions of the proposal:
Link to the GitHub issue: xxx

Original requirement or use case
Redesign the chunk attribute for the following reasons / benefits:
Use cases
New terminology
1. The chunking function, as with features like @conref, is a DITA-defined operation related to processing DITA documents. As such, the specification can only declare the before and after state of all DITA documents that implement the feature, in the context of processing the documents for some other purpose. For example, a DITA document many.dita might be chunked into many topic documents during rendering, but (again like @conref) the before/after state still deals with the DITA content. Any examples that make use of published HTML file names are purely for illustration / ease of understanding.
2. Because the chunking operation is defined in terms of processing, the values below are not meant as tool operations on the source, such as "refactor my source to reflect these new chunks". The result of evaluating @chunk is no longer a source file, and does not need to exist as an actual file (it may be an object in memory somewhere).
3. This entire function is intended for situations where splitting or combining content is relevant, and where authors need control over how that happens. In nearly all cases, chunking will be irrelevant for monolithic publishing formats like PDF or EPUB. By contrast, published HTML is often multi-file and so typically makes use of chunking. However, neither of these is always the case: local style may dictate that PDFs are split at some level, or that HTML is always generated as a single file. As such, we need to be careful that the specification allows @chunk to be ignored when needed. This also means that the specification itself cannot know in advance when this is the case or for what formats this is the case; the best we can do is give examples of common cases.
These are the two operations people already think of or look for when they ask about chunking: the ability to publish many documents as if they were one, and the ability to publish one document as if it was many. To that end, the proposed solution is:
1. Remove all of the current @chunk token values (one value, to-navigation, is already deprecated).
2. Define one new value, combine, to handle the most common scenario: combining multiple DITA documents from a map into one while preserving the overall hierarchy of the map.
3. Define one new value, split, to handle the second most common scenario: splitting one DITA document into many.
4. When links exist to a topic that is chunked, applications will need to handle the link so that it resolves to the new combined or split context. If a chunking operation results in multiple instances of a result topic (either chunked separately, or some chunked and some not), applications may determine which result topic to target with the link.
5. This attribute should still be defined as CDATA, which would allow applications to define additional tokens, although I expect those will be rare. One potential advantage to this approach is that DITA 1.x tokens would still remain valid according to the parser (but ignored by 2.0 processors). I propose that we avoid some of the DITA 1.x confusion by stating that the attribute can only contain a single token (note this would mean some potential DITA 1.x values are no longer valid).
6. All remaining behaviors associated with DITA 1.x chunking are no longer supported by this attribute. The original tokens declared several unrelated behaviors using a single attribute. I suggest that if any of those other behaviors are still required, alternate attributes be defined to handle them. I do not intend to define those attributes as part of this proposal. That work should only be done if somebody has a strong need for the attributes.

Benefits

Who will benefit from this feature?
1. Authors wishing to combine or split documents
2. Those trying to implement chunking in a processor
3. Maintainers of the DITA specification and of DITA tools who can now provide a clear explanation of the function

What is the expected benefit?
1. Chunking is easier to use
2. Chunking is easier to implement
3. Improved documentation (in the spec and elsewhere)
4. DITA is simplified by making the feature more intuitive and by removing features that are not used and that make the simple case difficult

How many people probably will make use of this feature?
Many, based on my own experience and based on the number of open defect reports against DITA-OT chunking

How much of a positive impact is expected for the users who will make use of the feature?
Significant improvement over the current feature

Technical requirements
Renaming or refactoring elements and attributes
Renaming or refactoring an attribute: The current attribute values are not defined in the grammar file, so the grammar definition does not change.

Processing impact

Overall usability
The chunking feature today is hard to use and hard to implement. This should address both concerns, resulting in a much more usable experience in all aspects of DITA chunking.

Backwards compatibility

Was this change previously announced in an earlier version of DITA?
No, although I have personally described this in public venues as one feature sure to be redesigned in DITA 2.0.

Changing the meaning of an element or attribute in a way that would disallow existing usage?
Yes; for the most common uses (possibly for the only real-world uses), the migration path is clear.

Migration plan

Documents
The easiest path is likely to use search/replace across DITA maps, and update chunking tokens to use the new value.

Processors
Processors will need to be updated to handle the new tokens. As a first approach they could simply treat the new values the same way as older equivalent values, but I would expect that over time many tools will want to replace older chunking processes with new ones.

Might any existing specialization or constraint modules need to be migrated?
Unlikely, although possible in theory. The only case where this could happen is if a module was designed to explicitly enumerate values for @chunk. In that case, the same modules would need to allow for the new tokens.

Costs

Maintainers of the grammar files
N/A

Editors of the DITA specification

Vendors of tools
Tools that implement chunking can take a quick approach (interpret the new values exactly the same as old ones), which should have minimal cost. Alternatively, they may wish to rewrite the chunking process, which will have a larger cost (hard to specify exactly due to widely different tool scenarios).

DITA community-at-large

Producing migration instructions or tools
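The search/replace migration path for documents could look roughly like this Python sketch. It only rewrites the two equivalences given in the captions of Figures 1 to 3 (to-content becomes combine, by-topic becomes split) and, following the single-token rule in item 5 of the proposed solution, leaves multi-token or unmapped values in place and lists them for manual review. The regular expression is a simplification that does not parse XML (it would also match chunk= inside comments or text), and the function name is hypothetical.

```python
import re

# Equivalences taken from the Figure 1-3 captions; anything else needs review.
TOKEN_MAP = {"to-content": "combine", "by-topic": "split"}
NEW_TOKENS = {"combine", "split"}
# Matches chunk="..." or chunk='...'; \b keeps it from matching e.g. "xchunk=".
CHUNK_ATTR = re.compile(r"""\bchunk\s*=\s*(["'])(.*?)\1""")


def migrate_chunk_values(map_text):
    """Rewrite simple DITA 1.x @chunk values; return (new_text, values_to_review)."""
    needs_review = []

    def replace(match):
        quote, value = match.group(1), match.group(2)
        tokens = value.split()
        if len(tokens) == 1 and tokens[0] in NEW_TOKENS:
            return match.group(0)  # already migrated
        if len(tokens) == 1 and tokens[0] in TOKEN_MAP:
            return f"chunk={quote}{TOKEN_MAP[tokens[0]]}{quote}"
        needs_review.append(value)  # multi-token or unmapped: leave unchanged
        return match.group(0)

    return CHUNK_ATTR.sub(replace, map_text), needs_review


SAMPLE = ('<map chunk="to-content"><topicref href="a.dita" chunk="by-topic"/>'
          '<topicref href="b.dita" chunk="to-content select-branch"/></map>')

if __name__ == "__main__":
    print(migrate_chunk_values(SAMPLE))
```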
Examples
Figure 1. Creating a single monolithic result document from a root map
<map chunk="combine">
<title>Previously this would have been chunk="to-content"</title>
<topicref href=""> <topicref href=""> ...
</map>
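A rough Python sketch of combine for a single topicref branch. It assumes each referenced document has one root <topic>, holds documents in memory under names borrowed from Figure 8, and nests child documents as peers inside the final topic of the parent document, as the Figure 8 explanation describes. Map-level combine with several top-level topicrefs (as in this figure) and <dita> containers are not handled; the topic IDs and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory source documents, keyed by @href.
DOCUMENTS = {
    "bigBranch.dita": '<topic id="big"><title>Big branch</title>'
                      '<topic id="big-end"><title>Final topic</title></topic></topic>',
    "iamhappy.dita": '<topic id="happy"><title>Happy</title></topic>',
    "iamconfused.dita": '<topic id="confused"><title>Confused</title></topic>',
    "happyagain.dita": '<topic id="again"><title>Happy again</title></topic>',
}


def final_topic(topic):
    """Follow the last nested topic down to the final topic in document order."""
    children = topic.findall("topic")
    while children:
        topic = children[-1]
        children = topic.findall("topic")
    return topic


def combine_branch(topicref, documents):
    """Build one topic tree for a topicref branch (sketch).

    Child documents become peers inside the parent's final topic;
    @chunk values on nested topicrefs are not consulted at all.
    """
    root = ET.fromstring(documents[topicref.get("href")])
    target = final_topic(root)
    for child_ref in topicref.findall("topicref"):
        target.append(combine_branch(child_ref, documents))
    return root


BRANCH = ET.fromstring(
    '<topicref href="bigBranch.dita" chunk="combine">'
    '<topicref href="iamhappy.dita"/>'
    '<topicref href="iamconfused.dita" chunk="split"/>'
    '<topicref href="happyagain.dita"/>'
    '</topicref>'
)

if __name__ == "__main__":
    print(ET.tostring(combine_branch(BRANCH, DOCUMENTS), encoding="unicode"))
```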
Figure 2. Creating multiple result documents from a single document
<map>
  <title>Previously this would have used chunk="by-topic"</title>
  <topicref href="" chunk="split">
    <topicref href=""/>
  </topicref>
  <topicref href=""/>
  ...
</map>
Figure 3. Creating multiple result documents from every source DITA document
<map chunk="split">
  <title>Previously this would have used chunk="by-topic"</title>
  <topicref href="">
    <topicref href=""/>
  </topicref>
</map>
Figure 4. Explicit example of split topic with resulting hierarchy
<title>Very simple "split" example</title>
<topicref href=""></map>
simple.dita:
<topic id="a">
<title>Root topic</title>
<body>...</body>
<topic id="b">
<title>Sub-topic</title>
<body>...</body>
<topic id="c">
<title>sub-sub-topic</title>
<body>...</body>
</topic>
</topic>
<topic id="jumpup">
<title>another sub-topic</title>
<body>...</body>
</topic>
</topic>
The document simple.dita contains four topics; the chunking operation split effectively results in the following map, with each document containing only one topic. For this sample the file names are taken from the topic IDs for clarity but this is not required.
<map>
  <title>Very simple "split" example</title>
  <topicref href="">
    <topicref href="">
      <topicref href=""/>
    </topicref>
    <topicref href=""/>
  </topicref>
</map>
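The split operation for simple.dita can be sketched in Python as a recursive walk that emits one topicref per topic and one single-topic result document per topic. As in the text, naming results after topic IDs is only for clarity; the in-memory documents dictionary and the function names are assumptions, not part of the proposal.

```python
import copy
import xml.etree.ElementTree as ET

SIMPLE_DITA = """<topic id="a">
  <title>Root topic</title>
  <body>...</body>
  <topic id="b">
    <title>Sub-topic</title>
    <body>...</body>
    <topic id="c">
      <title>sub-sub-topic</title>
      <body>...</body>
    </topic>
  </topic>
  <topic id="jumpup">
    <title>another sub-topic</title>
    <body>...</body>
  </topic>
</topic>"""


def split_topic(topic, documents):
    """Return a topicref for `topic` and record its one-topic result document."""
    href = f"{topic.get('id')}.dita"  # illustrative naming only
    single = ET.Element(topic.tag, dict(topic.attrib))
    # Keep the topic's own content (title, body, ...) but drop nested topics.
    single.extend([copy.deepcopy(child) for child in topic if child.tag != "topic"])
    documents[href] = single
    ref = ET.Element("topicref", href=href)
    for child in topic.findall("topic"):
        ref.append(split_topic(child, documents))
    return ref


def split_document(xml_text):
    """Split every topic in a document; return (topicrefs, result documents)."""
    root = ET.fromstring(xml_text)
    roots = root.findall("topic") if root.tag == "dita" else [root]
    documents = {}
    refs = [split_topic(topic, documents) for topic in roots]
    return refs, documents


if __name__ == "__main__":
    refs, docs = split_document(SIMPLE_DITA)
    result_map = ET.Element("map")
    result_map.extend(refs)
    print(ET.tostring(result_map, encoding="unicode"))
    print(sorted(docs))
```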
Figure 5. "split" when used on a grouping element
Assume the following map, where chunk="split" is used on grouping elements:
<map>
  <title>Groups are split</title>
  <topicgroup chunk="split">
    <topicref href=""/>
    <topicref href=""/>
  </topicgroup>
  <topichead chunk="split">
    <topicmeta><navtitle>Heading for a branch</navtitle></topicmeta>
    <topicref href=""/>
    <topicref href=""/>
  </topichead>
</map>
Figure 6. "combine" when used on a grouping element
Assume the following map, where chunk="combine" is used on grouping elements:
<map>
  <title>Groups are combined</title>
  <topicgroup chunk="combine">
    <topicref href=""/>
    <topicref href=""/>
  </topicgroup>
  <topichead chunk="combine">
    <topicmeta><navtitle>Heading for a branch</navtitle></topicmeta>
    <topicref href=""/>
    <topicref href=""/>
  </topichead>
</map>
Figure 7. Edge case: "split" becomes "combine"
<map chunk="split">
  <title>Split most, but not one branch</title>
  <topicref href=""/>
  <topicref href="" chunk="combine">...</topicref>
  <topicref href=""></topicref>
</map>
Assume as well that no other @chunk attributes are specified in this map. The following is true:
1. The document splitme.dita and all documents within that branch will be split apart if they contain more than one topic
2. Because of the chunk="combine" setting, the second branch with exception.dita at the root will result in a single result document
3. The document splitmetoo.dita and all documents within that branch will be split apart if they contain more than one topic

Figure 8. Edge case: ignoring "split" values within a combined branch
Assume the following map, where a branch is combined, but a nested <topicref> specifies "split":
<map>
  <title>Ignoring split value</title>
  <topicref href="" chunk="combine">
    <topicref href=""/>
    <topicref href="" chunk="split"/>
    <topicref href=""/>
  </topicref>
  ...
</map>
In this case:
1. The branch beginning with bigBranch.dita results in a single, combined document
2. In the combined document, the contents of iamhappy.dita, iamconfused.dita, and happyagain.dita are all peers within the final topic of bigBranch.dita
3. The chunk="split" value within the branch is ignored

Regards,
DITA-OT lead and Co-editor DITA 1.3 specification,
Digital Services Group
E-mail: robander@us.ibm.com
11501 BURNET RD, AUSTIN, TX 78758-3400, USA
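The override rules in Figures 7 and 8, plus the earlier note that chunk="split" on hello.dita has no impact on the nested notchunked.dita, can be sketched as an inherited "effective mode" walk over the map. Assumptions, also noted in the comments: a value on <map> or on a grouping element cascades to its descendants (the outcome of Figure 5 is not spelled out), elements are matched by name rather than @class, and splitchild.dita and inside.dita are hypothetical file names.

```python
import xml.etree.ElementTree as ET

# Matched by element name only; real processors would check @class.
TOPICREF_LIKE = ("topicref", "topicgroup", "topichead")


def effective_chunk(elem, inherited=None, out=None):
    """Return [(href, mode)] for every element with @href (sketch).

    mode is "combine", "split", or None (no chunking requested).
    """
    out = [] if out is None else out
    own = elem.get("chunk")
    href = elem.get("href")
    if inherited == "combine":
        mode = "combine"  # inside a combined branch, nested "split" is ignored (Figure 8)
    elif own in ("combine", "split"):
        mode = own  # a branch can switch a "split" map to "combine" (Figure 7)
    else:
        mode = inherited
    if href is not None:
        out.append((href, mode))
    if mode == "combine":
        passed = "combine"
    elif own == "split" and href is not None:
        passed = inherited  # "split" on a topicref covers only its own document
    else:
        passed = mode  # <map> or grouping element: value cascades (assumed for groups)
    for child in elem:
        if child.tag in TOPICREF_LIKE:
            effective_chunk(child, passed, out)
    return out


FIGURE_7 = ('<map chunk="split"><topicref href="splitme.dita">'
            '<topicref href="splitchild.dita"/></topicref>'
            '<topicref href="exception.dita" chunk="combine">'
            '<topicref href="inside.dita" chunk="split"/></topicref>'
            '<topicref href="splitmetoo.dita"/></map>')

if __name__ == "__main__":
    for href, mode in effective_chunk(ET.fromstring(FIGURE_7)):
        print(href, mode)
```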