dita message

Subject: Re: [dita] Re: Chunking and Composite Topics
From: Chris Nitchie <chris.nitchie@oberontech.com>
To: Kristen James Eberlein <kris@eberleinconsulting.com>, Noz Urbina <noz.urbina@mekon.com>, "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>
Date: Mon, 2 Dec 2013 20:05:13 +0000
I spent some time over the holiday weekend reviewing chunking. The
problem, as I see it, isn’t with the topic, per se; it’s the fact that the
chunking attribute itself is extremely problematic. The default tokens are
vague and difficult to remember, its functionality is based on assumptions
that don’t apply for all processors, and most importantly, it is so
overloaded as to be almost indescribable.

Fundamentally, it’s responsible for two use cases:

1. Customizing the behavior of references to subsets of compound
topics/ditabases.
2. Combining content referenced by ’this’ topicref and child topicrefs
into a single output chunk.

The spec breaks this down further, into three things:

A. Selecting topics (select-*)
B. Splitting of those topics into chunks (by-*)
C. ‘Rendering’ the map branch (to-*)

I’d argue that (A) and (B) are different aspects of use-case (1), and (C)
is use-case (2), though you wouldn’t know it from the current spec
language. It’s not quite that clean, though, because the to-* tokens, as
far as I can tell, play double duty, controlling both the combining of
child topics, as well as informing the results of the selection performed
by select-*.

For example, when I reference ditabase.dita#TopicC:

* By default (with the OTK), the navigation points to TopicC, but all of
ditabase.dita is rendered as a single chunk.
* chunk="by-topic" or "select-topic to-content" will extract only TopicC
into its own chunk.
* chunk="select-branch to-content" will extract TopicC and its children
into a single chunk.
* chunk="select-branch by-topic" will extract TopicC and its children,
each to its own chunk (though specifying 'to-content' appears to override
'by-topic').
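
As a rough illustration, the cases above could be written as the following topicrefs. This is only a sketch: the four topicrefs are alternatives shown side by side rather than one working map, and the comments restate the OTK behaviour described above, not anything the spec guarantees.

```xml
<map>
  <!-- No chunk attribute: navigation points to TopicC, but (with the OTK)
       all of ditabase.dita is rendered as a single chunk. -->
  <topicref href="ditabase.dita#TopicC"/>

  <!-- Only TopicC, extracted into its own chunk. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-topic to-content"/>

  <!-- TopicC and its children, combined into a single chunk. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-branch to-content"/>

  <!-- TopicC and its children, each in its own chunk. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-branch by-topic"/>
</map>
```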

And so on. I’ve actually started putting together a cheat-sheet based on
trial and error, because there’s no way I can keep all the different
combinations in my head.

Meanwhile, the only values that really matter when dealing with a parent
trying to combine/split its children are the to-* tokens, to-navigation
and to-content. The to-content value combines the topicref and its
children into one chunk, to-navigation... doesn’t. I’m frankly mystified
as to what to-navigation is supposed to do, and I’ve been at this for
hours. The spec isn’t much help. It says something about “navigation
chunks” but never really defines what that means, except in a
parenthetical that I’m having trouble making sense of.
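
For the parent/child case, a minimal sketch of to-content might look like the following. The file names parent.dita, child-1.dita and child-2.dita are placeholders invented for illustration.

```xml
<map>
  <!-- parent.dita, child-1.dita and child-2.dita are hypothetical files.
       to-content on the parent combines the parent topic and both child
       topics into one output chunk. -->
  <topicref href="parent.dita" chunk="to-content">
    <topicref href="child-1.dita"/>
    <topicref href="child-2.dita"/>
  </topicref>
</map>
```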

Re: default behavior. The spec more or less explicitly states that there
is no spec-mandated default behavior, so a processor is free to chunk
using the select-branch to-content algorithm suggested by Noz, and I know
of at least one implementation that does. Sort of. (Arbortext selects the
branch and throws the rest away, but from there, chunking/ToC generation
is controlled by the stylesheet.)

I think a lot of the complexity/confusion here stems from the fact that
the OTK does its best to have one output chunk per input topic/ditabase
file, but the ‘chunk’ attribute allows you to tweak that. The spec is
operating on the assumption that all DITA processing attempts to optimize
along similar lines (there’s a similar issue with @copy-to), but nowhere
does the spec (as far as I know) *mandate* this behavior. I’ve always
found that optimization problematic because metadata from the topicref can
cascade into the topic, and so it’s very difficult to determine equality
between two topicrefs, even when they’re to the same URI. As we introduce
more features like scoped keys and branch filtering, this problem will
continue to get worse. Post-1.3, I think we need to start moving away from
that implicit one-to-one input topic/output chunk assumption in the spec,
and move towards a paradigm where each (non-resource-only local dita)
topicref represents its own output unit.
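
To make the equality problem concrete, here is a small hypothetical map (install.dita and the audience values are invented for illustration). Both topicrefs point to the same URI, yet the metadata that cascades into the topic differs.

```xml
<map>
  <!-- Hypothetical example: both topicrefs point to the same URI, but
       each carries different metadata that can cascade into the topic. -->
  <topicref href="install.dita" audience="administrator">
    <topicmeta>
      <navtitle>Installing (administrators)</navtitle>
    </topicmeta>
  </topicref>
  <topicref href="install.dita" audience="novice">
    <topicmeta>
      <navtitle>Installing (first-time users)</navtitle>
    </topicmeta>
  </topicref>
</map>
```

Under a one-chunk-per-input-file assumption, a processor has to decide whether these are the same output. If each topicref is its own output unit, they are simply two units.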

So for 1.3, I think we need to revisit the language describing
to-navigation. The other spec-specified values are pretty good, taken in
isolation; the challenge comes when trying to think through how different
combinations of values might affect output, and it’s in the combinations
that the real value lies. The existing examples are good, though I’d
suggest adding a simple ‘select a branch from a ditabase’ example as #3;
#1 is simple chunking, #2 is simple bursting, and then #3 jumps into
nested chunking, so a simple branch-selection example might help ease
people in. Other than that, though, I’m not sure how much we can do.

Post-1.3, I think we should consider deprecating the ‘chunk’ attribute
altogether and replacing it with more fine-grained control attributes.
Just off the top of my head (pseudo-DTD):

<!ATTLIST chunk-replacement
  topic-selection (topic|branch|all)       "all"
  topic-split     (yes|no)                 "no"
  topic-merge     (yes|no)                 "no"
  topic-nav       (per-topic|first-topic)  "first-topic"
>

* topic-selection controls what amount of the referenced file is
considered the content unit. CDATA and extensible.
* topic-split indicates whether to break up the selected content unit into
individual output chunks.
* topic-merge specifies whether to combine the referenced content unit (or
pseudo-content-unit for topicheads) with content referenced beneath it
into a single chunk. topic-merge takes precedence over topic-split.
* topic-nav controls whether navigation/ToC entries are generated for
nested topics in the logical content unit resulting from topic-selection
and topic-merge (and possibly to what depth). As an alternative, we could
extend @toc. CDATA and extensible.
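
As a sketch of how today's chunk values might translate, assuming the attribute names from the pseudo-DTD above (no DITA version defines them, and the mapping is one reading of the descriptions, not a worked-out proposal):

```xml
<map>
  <!-- HYPOTHETICAL: topic-selection, topic-split and topic-merge are only
       proposed in this message; no DITA version defines them. -->

  <!-- Roughly chunk="select-topic to-content" -->
  <topicref href="ditabase.dita#TopicC" topic-selection="topic"/>

  <!-- Roughly chunk="select-branch to-content" -->
  <topicref href="ditabase.dita#TopicC" topic-selection="branch" topic-split="no"/>

  <!-- Roughly chunk="select-branch by-topic" -->
  <topicref href="ditabase.dita#TopicC" topic-selection="branch" topic-split="yes"/>

  <!-- Roughly chunk="to-content" on a parent with child topicrefs;
       parent.dita and child.dita are placeholder file names. -->
  <topicref href="parent.dita" topic-merge="yes">
    <topicref href="child.dita"/>
  </topicref>
</map>
```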

I think this enables everything currently possible using the ‘chunk’
attribute, and the specified defaults map to the current default OTK
behavior. It also allows something that I couldn’t get working in the OTK
without multiple topicrefs, namely, including a compound topic as a single
chunk with multiple TOC entries. Splitting the functions of the chunk
attribute each into their own more specific, fine-grained attributes
would, I think, make life easier for just about everybody.
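
The compound-topic case mentioned above (one chunk, multiple TOC entries) might then look something like this. Again, the attributes are hypothetical and exist only in this proposal.

```xml
<map>
  <!-- HYPOTHETICAL attributes from the proposal above. The whole file is
       the content unit, it is not split into separate chunks, and each
       nested topic gets its own navigation entry. -->
  <topicref href="ditabase.dita"
            topic-selection="all"
            topic-split="no"
            topic-nav="per-topic"/>
</map>
```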

Chris

Chris Nitchie
(734) 330-2978
chris.nitchie@oberontech.com
www.oberontech.com

From:  Kristen James Eberlein <kris@eberleinconsulting.com>
Date:  Tuesday, November 5, 2013 at 12:37 PM
To:  Noz Urbina <noz.urbina@mekon.com>, "dita@lists.oasis-open.org"
<dita@lists.oasis-open.org>
Cc:  Mark Poston <mark.poston@mekon.com>, Rob Hanna <rob@infoarchitects.ca>
Subject:  [dita] Re: Chunking and Composite Topics


Hi, Noz. (And Mark and Rob by cc)

We talked about this briefly at today's TC meeting. While we cannot make
any changes to chunking for DITA 1.3 -- the deadline for new proposals is
long past -- I asked for volunteers to review the current content in the
spec and make suggestions for improvement.

And I got volunteers; Stan Doherty (Mathworks) and Chris Nitchie (Oberon
Technologies) are on the hook for that work :)

Best,
Kris

Kristen James Eberlein
Principal consultant, Eberlein Consulting
Co-chair, OASIS DITA Technical Committee
Charter member, OASIS DITA Adoption Committee
www.eberleinconsulting.com <http://www.eberleinconsulting.com>
+1 919 682-2290; kriseberlein (skype)


On 11/1/2013 8:47 AM, Noz Urbina wrote:


Hello All,
 
Kristen asked me to submit my recent work on the Chunking and Composite
topic functions of DITA.  Together with my colleagues Mark Poston and Rob
Hanna, I have been experimenting with using maps to leverage content
that’s been either created in or converted to composite topics.

This email is an almost-copy-and-paste from our report to the client, but
I’d also like to add my own (hastily put together) commentary.
 

 
<rant>
I find the chunking attribute syntax vastly overcomplicated. Instead of
offering a good default that’s simply achieved, it offers something that’s
expensive for vendors to implement and/or difficult to edit by hand. I
have worked with the usual main players (FrameMaker, XMetaL, oXygen,
Arbortext Editor) and none offers any help or special functions around
chunking.  It’s only an advanced-user feature, so it doesn’t really help
move licenses for people getting started, and it requires quite a lot of
UI to make usable.  And the documentation in the spec is just a series of
examples that don’t show full XML sets, only partial ones with a prose
description of what should happen on output.

Training on the functionality is a nightmare, and I have actually had to
look up the spec during a course when asked a question, because the
permutations are so many and the tools do nothing to help.
 
I would suggest that there are two use cases being addressed with the
chunking attribute: one is merging files together, the other is reusing
individual topics from files in which they are stored together.  This may
be overloading the attribute.

The merging functionality makes sense, but the reuse/splitting options are
rather opaque.  I’d suggest that changing some of the default behaviours
could make this much easier.
 
My own take would be:
 
From a map, if you specify a child topic of a multi-topic file, then it’s
safe to assume that that’s the topic you want, and not anything above
(this is how most things in XML work, so it follows logically).  So the
default meanings could be:
 
<topicref href="noz-test.dita"> = “All topics in this file”
<topicref href="noz-test.dita#id1a"> = “All topics from topic id1a down”
<topicref href="noz-test.dita#id1a" chunk="select-topic"> = “topic id1a
only” (although it’s highly debatable whether this should be called using
an attribute called “chunk” at all).
 
In a CCMS that uses IDs, there should be no change, you just split on the
# like usual.
 
I’d suggest that simplifying the parameters passed to @chunk would enable
more users to take advantage of it.  I’m sure many are, but because of the
complexity, lack of tool support, and resulting difficulty for beginners,
I believe many aren’t Googling the spec and learning how to use it.
</rant>
 
<reportextract>
 
Reusing topics from a ditabase topic

If one uses chunking and conditions on the topicrefs, then one can
conditionally filter topics in and out and rearrange their hierarchy, even
though they are stored in ditabase topics.

To reuse a topic from a ditabase topic:

1. Specify the topic id in the map and set the chunking attribute to
"to-content select-topic" to insert a single topic, or "to-content
select-branch" to insert a topic and its descendants.
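
The full example that follows only uses select-topic. For the select-branch case, a topicref against the same noz-test.dita file might look like this sketch, which would pull in topic id1a together with its nested topic id1b:

```xml
<map>
  <!-- Inserts topic id1a together with its nested topic id1b. -->
  <topicref href="noz-test.dita#id1a" chunk="to-content select-branch"/>
</map>
```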

An example is supplied below of a DITAbase-based file being split up and
reordered.
File noz-test.dita
<!DOCTYPE dita PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<dita>
  <topic id="id1">
    <title>Topic 1</title>
    <body>
      <p>Topic 1.</p>
      <p>Topic 1 has a cross reference to <xref href="#id1a">Topic 1a</xref>.</p>
      <p>Topic 1 has a cross reference to <xref href="#id1b">Topic 1b</xref>.</p>
    </body>
    <topic id="id1a">
      <title>Topic 1a</title>
      <body>
        <p>Topic 1a has a cross reference to <xref href="#id1">Topic 1</xref>.</p>
        <p>Topic 1a has a cross reference to <xref href="#id1b">Topic 1b</xref>.</p>
      </body>
      <topic id="id1b">
        <title>Topic 1b</title>
        <body>
          <p>Topic 1b has a cross reference to <xref href="#id1">Topic 1</xref>.</p>
          <p>Topic 1b has a cross reference to <xref href="#id1a">Topic 1a</xref>.</p>
        </body>
      </topic>
    </topic>
  </topic>
</dita>
Map
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<title>DITA Topic Map</title>
<topicref href="noz-test.dita#id1b" chunk="to-content select-topic">
  <topicref href="noz-test.dita#id1" chunk="to-content select-topic"
            audience="customerABC"/>
</topicref>
<topicref href="noz-test.dita#id1a" chunk="to-content select-topic"/>
<reltable>
  <relrow>
   <relcell>
    <topicref href="noz-test.dita#id1b"/>
   </relcell>
   <relcell collection-type="sequence">
    <topicref href="noz-test.dita#id1"/>
    <topicref href="noz-test.dita#id1a"/>
   </relcell>
  </relrow>
</reltable>
</map>
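
The audience attribute on the nested topicref is what allows the conditional filtering mentioned earlier. As a sketch, a DITAVAL file along these lines would filter that topicref out of a build (the exclude action is chosen here only for illustration):

```xml
<val>
  <!-- Excludes anything flagged with the audience value customerABC,
       including the nested topicref to noz-test.dita#id1 in the map above. -->
  <prop att="audience" val="customerABC" action="exclude"/>
</val>
```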
Note:
* There appears to be a bug in the DITA OT that prevents rendering of
  topics with mixed topic types.  All topics must be of the same type or
  else the transformation fails. The bug is most likely in the Java
  extensions in the OT, not the XSLT.  If this is the only problem, it
  should not be particularly difficult to debug. Infineon must decide
  whether to:
  - Fix the bug.
  - Make all topics the same type (most logically this would be all
    <topic>) within ditabase files. If this is done, then as users and
    content are migrated to the new, more modular way of working, the
    topic types can and should be applied on individual topics.
  - Not reuse below the topic level for now.
* The same limitations on xrefs apply with composite as with regular
  topics, and the same risks of broken links.

Limitations of composite topic type

* Simplified task is not included in the ditabase DTD. The ditabase DTD
  requires additional specialization to include simplified task.
* Composite files can only be categorised as a whole in the taxonomy.  As
  they are burst, the topics they contain will have to be categorised
  after they are created.
* All IDs need to be unique across all topics, not just unique within a
  topic.
* Additional stylesheet work may be required to achieve publishing
  features such as mini-tables of contents (or forward organizers).
* Whole assemblies must be versioned with any change to a topic, rather
  than simply versioning a single topic.
* Topic-type OT bug as described above.
 
 
</reportextract>
 
<thanks>
To you all for your attention.
</thanks>
 
B. Noz Urbina
Business Development Manager
blog http://lessworkmoreflow.blogspot.com ¦ twitter @nozurbina
e noz.urbina@mekon.com ¦ UK mob +44 (0)7739 522 002 ¦ ES mob +34 625 467 866
¦ skype nozskype

 




---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail. Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php