docbook-tc message

Subject: Re: DocBook TC action item reminders

From: Scott Hudson <scott.hudson@flatironssolutions.com>
To: DocBook Technical Committee <docbook-tc@lists.oasis-open.org>
Date: Sun, 15 Mar 2009 21:14:45 -0600

Folks,

please find a proposal for modular DocBook attached. Special thanks to 
Jim Earley for writing the bulk of the first draft, and for the 
additional collaboration of Larry and Dick.

Best regards,

--Scott

Scott Hudson
Senior XML Architect

e: scott.hudson@FlatironsSolutions.com
O: 303.542.2146
C: 303.332.1883
F: 303.544.0522

http://www.FlatironsSolutions.com
Vision. Experience. Engineering Excellence.



Bob Stayton wrote:
> Hi,
> This is a repeat of last month's reminder mail since we did not meet in 
> February.
>
> Next meeting is Wednesday March 18.
>
> Bob Stayton
> Sagehill Enterprises
> bobs@sagehill.net
>
>
> ----- Original Message ----- 
> From: "Bob Stayton" <bobs@sagehill.net>
> To: "Jirka Kosek" <jirka@kosek.cz>; "Scott Hudson" 
> <scott.hudson@flatironssolutions.com>; "Larry Rowland" 
> <larry.rowland@hp.com>; "Norm Walsh" <ndw@nwalsh.com>; "Bob Stayton" 
> <bobs@sagehill.net>
> Sent: Thursday, February 12, 2009 12:13 AM
> Subject: DocBook TC action item reminders
>
>
>   
>> If you are getting this mail, then you have an
>> action item in the list below, and this is your friendly
>> reminder service before the next meeting.
>> If you have already completed your action items, then
>> good for you!
>>
>> Next meeting is **Tuesday** 17 February 2009.
>>
>>
>> Actions:
>>
>>  a. Bob to organize TDG reading after names are fixed.
>>
>>  b. Norm to write up a backwards compatibility policy document.
>>
>>  c. Norm to incorporate group parameter change (RFE 1998852) into the 
>> schema
>>     for 5.1.
>>
>>  d. Norm to ask mailing list about 'rep' on methodparam.
>>
>>  e. Norm to update OASIS site for 5.0 spec and schema.
>>
>>  f. Norm to update spec to include public and system identifiers
>>     for the 5.0 DTD version.
>>
>>  g. Jirka to add schema comparison table to DocBook 5.0 Transition Guide.
>>
>>  h. Norm to add floatstyle attribute to sidebar for 5.1.
>>
>>  i. Norm to write up proposed content model for initializer.
>>
>>  j. Norm to add subtitle to sidebar for 5.1.
>>
>>  k. Norm to determine OASIS requirements for charter updates.
>>
>>  l. Norm to solicit for a third DocBook 5 user again.
>>
>>  m. Norm to work with Mary to officially adopt the new charter at OASIS.
>>
>>  n. Norm to work with Mary to make Publishing Subcommittee
>>     schema a Committee Working Draft.
>>
>>  o. Norm to work with Keith and Scott to update the OASIS committee
>>     site to make the Publishing Subcommittee Working Draft
>>     publicly available.
>>
>>  p. Scott to write up a modular DocBook proposal for the TC
>>     to discuss.
>>
>>  q. Scott to append suggestions to RFE 1722935 from the
>>     Publishing Subcommittee regarding additional class values.
>>
>>  r. Larry to write additional documentation for the existing name
>>     elements describing how they are best used in different locales.
>>
>>
>> Bob Stayton
>> Sagehill Enterprises
>> bobs@sagehill.net
>>
>>
>>     
>
>

MODULAR DOCBOOK PROPOSAL 

1. Overview 

DocBook has long been the standard for creating technical publications in SGML and XML.
The standard has a rich, comprehensive element set capable of handling most
structural and semantic markup found in technical documentation, and DocBook content
can be transformed into a wide variety of output formats.

In recent years, industry trends have begun to emphasize more modular authoring
processes.  Many factors are driving content creation in this direction:
    
    - more distributed authoring: authors are responsible for specific content
      areas rather than whole manuals.  Content could be authored by many 
      different authors, even some in different organizations altogether.
      
    - content reuse: this has long been a "holy grail" of information 
      architects:  write content once, reuse it in many different contexts.
      
    - change management:  isolate the content that has changed.  This is a key
      driver for companies that have localization needs.  By modularizing their
      content, they can drive down costs by targeting only the changed content 
      for translation.
      
In addition to the core business drivers, there are additional downstream
opportunities for modularized content:

    - dynamic content assembly:  create "publications" on the fly using an
      external assembly file that identifies the sequence and hierarchy of 
      modular components rather than creating a single canonical instance.
      
In the 2000s, DITA was introduced as a new OASIS XML standard that leveraged
a modular design where content is created and stored as individual "topic" 
files and assembled into publications using "maps".  Since then, interest in 
the standard has grown substantially, particularly because of its modular 
features.

At the same time, DocBook retains a large, active community of users
with a significant investment in tools and processes focused on 
delivering content from DocBook and its many variants.  Nonetheless, there is
growing interest within the community in having some of the same modular features
found in DITA built into DocBook.

2. A Modular DocBook Design

To support a more modular architecture within DocBook, we need to account for
the numerous structural elements currently defined in the grammar.  DocBook's
rich design was principally focused on the creation of printed content such
as books and whitepapers, which is reflected in elements such as:
   
   - set
   - book
   - part
   - chapter
   - appendix
   - reference
   - article

These structures support a diverse set of lower-level hierarchical elements that organize
content in a logical way:

    - sect*
    - section
    - refentry
    - biblioentry
    
From a modular content design perspective, any component-level or container-level
element could reasonably be a logical unit of information.  Because of this, the basic
modular architecture must be more flexible than DITA's, which has only one 
logical unit of information: the topic (and its specializations).  Consequently,
a collection of DocBook components can be much more structurally diverse than
DITA topics in a map.  As a result, we differentiate from DITA's map 
semantics with the introduction of a new element: <assembly>.

3.  The <assembly> Element

The <assembly> element is the root-level element that defines the 
resources, hierarchy, and relationships for a collection of DocBook components.
An <assembly> can be the structural equivalent of any DocBook component, such as
a book, a chapter, or an article.  

An <assembly> should contain an <info> element to store any metadata for that
assembly.  Additionally it must contain at least one <resources> container
that specifies the components that are included in this assembly.  To define 
the hierarchy and sequence of resources to be rendered and displayed in the 
final output, an <assembly> can contain one or more <toc> elements. The 
<assembly> can also contain a <relationships> container that is used to define 
the type and trajectory of relationships between resources.

The following RelaxNG (compact) notation illustrates the model:

db.assembly =
  element assembly {
    db.info?, db.toc*, db.resources+, db.relationships*
  }

An assembly may contain only resources, without a toc or relationships, simply as a way to
collect resources.
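
As an illustration of the model above, a minimal assembly might look like the
following sketch.  The title, ids, and file names are hypothetical, and the
children follow the order given in the RelaxNG pattern (info, toc, resources).
Relationships are omitted because their markup is still under discussion in
section 6.

```xml
<assembly>
    <info>
        <title>Widget User Guide</title>
    </info>

    <!-- sequence and hierarchy of the rendered output -->
    <toc>
        <tocentry linkend="intro"/>
        <tocentry linkend="install">
            <tocentry linkend="install.linux"/>
        </tocentry>
    </toc>

    <!-- the components managed by this assembly -->
    <resources>
        <resource id="intro" fileref="intro.xml"/>
        <resource id="install" fileref="install.xml"/>
        <resource id="install.linux" fileref="install-linux.xml"/>
    </resources>
</assembly>
```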

4. The <resources> Element

The <resources> element is a high-level container that holds one or more 
resource objects that are managed by the <assembly>.  An <assembly> can
contain one or more <resources> containers to allow users to organize content
into logical groups based on profiling attributes.

Each <resources> element must contain one or more <resource> elements.

db.resources = element resources { db.common.attributes, db.resource+ }

<assembly>
    <resources xml:lang="en-US">
        <!-- English-language resources -->
    </resources>
    
    <resources xml:lang="ja-JP">
        <!-- Japanese-language resources -->
    </resources>
</assembly>

5. The <resource> Element

The <resource> element identifies a "managed object" within the assembly. 
Typically, a <resource> will point to a content file that can be identified by
a valid URI.  However, a <resource> can also be a 'static' text value that 
behaves similarly to a text entity. 

Every <resource> MUST have a unique ID value within the context of the entire
<assembly>, so that each ID identifies exactly one resource definition
(see section 5.1 for more information about resource merging).
Multiple <tocentry> elements, or other elements that reference resources, may
however point to the same <resource> element.

db.resource =
  element resource {
    db.common.attributes,
    attribute fileref { text }?,
    text?
  }

Content-based resources can also be content fragments within a content file,
similar to a URI fragment:  file.xml/#ID.

Additionally, a resource can point to another resource.  This allows users to
create a "master" resource that can be referenced in the current assembly, and
to point indirectly to the underlying resource that the referenced resource identifies.
Profiling attributes may also be set on a resource; they are applied when the resource
is processed, allowing the same fileref to be processed with different conditionals applied.

For example:

<resource id="master.resource" fileref="errormessages.xml"/>
<resource id="class.not.found" resid="{master.resource}/#classnotfound"/>
<resource id="null.pointer" resid="{master.resource}/#nullpointer"/>

The added benefit of indirect references is that users can easily point the 
resource to a different content file, provided that it uses the same underlying
fragment ids internally.  It could also be used for creating locale-specific
resources that reference the same resource id.
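
The profiling case mentioned above can be sketched as follows, assuming the
standard DocBook "os" effectivity attribute is available on <resource> through
db.common.attributes.  The ids and the file name are hypothetical.  Both
resources point to the same file, but each would be processed with a different
condition applied:

```xml
<resources>
    <!-- same fileref, profiled for two different platforms -->
    <resource id="install.linux" fileref="install.xml" os="linux"/>
    <resource id="install.windows" fileref="install.xml" os="windows"/>
</resources>
```

A <tocentry> could then point to whichever variant a given publication needs.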

Text-based resources behave similarly to XML text entities.  A content-based
resource can reference a text resource, provided that both the text resource and
the content resource are managed by the same assembly.  

assembly.xml:

...
<resource id="company.name">Acme Tech, Inc.</resource>
<resource id="company.ticker">ACMT</resource>
...

file1.xml:

<para><phrase resid="company.name"/> (<phrase resid="company.ticker"/>) is a 
publicly traded company...</para> 
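
Assuming the processor substitutes text resources in the same way an XML parser
expands text entities, the paragraph above might resolve to something like the
following.  Whether the <phrase> wrapper is kept in the result is not specified
by this proposal; this sketch keeps it.

```xml
<para><phrase>Acme Tech, Inc.</phrase> (<phrase>ACMT</phrase>) is a
publicly traded company...</para>
```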

5.1 Resource Merging

There may be cases where a "master" or "parent" assembly can define a resource
that has already been defined in a "child" assembly using the same ID.  In this 
case, the "parent" assembly's resource with the same ID SHALL override the 
"child" resource.

The following example illustrates:

master-assembly.xml:

<toc>
    <tocentry linkend="my.resource"/>
    <tocentry linkend="child.assembly"/>
</toc>

...
<resource id="my.resource" fileref="section-a.xml"/>
<resource id="child.assembly" fileref="child-assembly.xml"/>
...

child-assembly.xml:

<toc>
    <tocentry linkend="my.resource"/> <!-- parent resource is used -->
</toc>

...
<!-- parent overrides this value -->
<resource id="my.resource" fileref="section-b.xml"/> 
...

In this example, the child assembly contains a resource with the id 
'my.resource', pointing to a file named 'section-b.xml'.  In that assembly's 
<toc>, the <tocentry> points to that resource's file reference.  In the 
parent assembly, there is another resource with the id 'my.resource', pointing to
a different file, 'section-a.xml'.  Since the parent assembly references 
the child assembly ('child.assembly') and includes the child assembly in its toc
(<tocentry linkend="child.assembly"/>), any tocentry elements pointing to
'my.resource' in the child assembly's toc will point to 'section-a.xml' rather
than 'section-b.xml' as specified in the child assembly's resource.


5.2 Resource Scoping

By default, all content-based resources are presumed to be local identifiers 
that are intended to be processed with the XML content.  However, there may be 
cases where resources point to external location identifiers that
should not be explicitly processed.  These could be references to a website URL
or to PDF content that is intended to be linked in but requires no additional 
processing.  Additionally, during the authoring process, there may be references
to resources that have not been developed yet, but will be available for 
final publication.


As a result, the <resource> element needs a scope attribute that allows users
to identify resources that are either external or that should not be processed
at that time.  This attribute should have the following enumerated values:

    - local
    - external
    - no-op
    
"local" should be the default value and should not require users to explicitly
set this value.  

"external" means that the resource should be linked to, but there is no 
additional processing required.

"no-op" means that the resource is currently unavailable and should not be 
processed.
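
The three values might be used as in the following sketch.  The ids, file
names, and URL are hypothetical, and the attribute name "scope" follows the
wording above; it does not yet appear in the RelaxNG pattern for <resource>.

```xml
<resources>
    <!-- scope defaults to "local": processed with the rest of the content -->
    <resource id="overview" fileref="overview.xml"/>

    <!-- linked to as-is, with no additional processing -->
    <resource id="datasheet" fileref="http://www.example.com/datasheet.pdf"
              scope="external"/>

    <!-- not written yet: not processed until it becomes available -->
    <resource id="troubleshooting" fileref="troubleshooting.xml"
              scope="no-op"/>
</resources>
```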


5.3 Pointing to an Unspecified Resource ID

All resources must be defined in the assembly by a unique ID.  If another 
element points to an undefined resource ID, the processor SHOULD
treat it as a RECOVERABLE ERROR and emit a warning, either in
the output or on the standard error stream.
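
For example, in the following sketch (ids and file name hypothetical) the
second <tocentry> points to an ID that no <resource> defines.  Under the rule
above, a processor would be expected to report a warning for "appendix.a" and
carry on with the remaining entries rather than stop.

```xml
<assembly>
    <toc>
        <tocentry linkend="chapter.1"/>
        <!-- no resource with this id exists: recoverable error -->
        <tocentry linkend="appendix.a"/>
    </toc>

    <resources>
        <resource id="chapter.1" fileref="chapter1.xml"/>
    </resources>
</assembly>
```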

6.  The <relationships> Element

The <relationships> element is a container for relationships between
resources. Each <relationship> contains one or more associations between 
resources.  Each association can contain one or more resource instances linked 
to a resource id.

Relationships can be used to generate related links between resources, in much 
the same way that blog entries are tagged.  For example, Scott Hudson's blog
contains dozens of entries tagged "DocBook" over the years.  By clicking
on the tag, a user can see all of the entries related to DocBook.


6.1 OPTION 1 - Matrix method  

If you presume that relationships are n-dimensional matrices where each column
vector represents an association type (e.g., a 'tag'), and each row vector
represents links between resources across associations, the example above could
be modeled with the following markup:

<relationships>
    <relationship id="blog.tags">
        <header>
            <label id="blog.entry">Entry</label>
            <label id="blog.tag">Tag(s)</label>
        </header>
        <body>
            <item>
                <association>
                    <instance linkend="blog.entry.1"/> <!-- ref to resource -->
                </association>
                <association>
                    <label id="DocBook">DocBook</label>
                    <label id="XML">XML</label>
                </association>
            </item>
            <item>
                <association>
                    <instance linkend="blog.entry.5"/>
                </association>
                <association>
                    <labelref linkend="DocBook"/>
                    <label id="relaxng">RelaxNG</label>
                </association>
            </item>
        </body>
    </relationship>
</relationships>

6.2 OPTION 2 - Definition list method

Another option is to mirror the structure of definition lists, such that:

<relationships>
    <relationship id="blog.docbook">
        <arc id="DocBook">DocBook</arc>
        <instance linkend="blog.entry.5"/>
        <instance linkend="blog.entry.3"/>
    </relationship>
</relationships>

In this case, the term "DocBook" is associated with two content resources. Any number of
relationships can be defined with this method. The model would be defined as:

db.assembly =
  element assembly {
    db.info?, db.toc*, db.resources+, db.relationships*
  }

db.resource =
  element resource {
    db.common.attributes,
    attribute fileref { text }?,
    text?
  }

db.relationships =
  element relationships {
    db.common.attributes,
    db.relationship+
  }

db.relationship =
  element relationship {
    db.common.attributes,
    db.arc, db.instance+
  }

db.arc =
  element arc {
    db.common.attributes,
    db.linkend.attribute?,
    text?
  }

db.instance =
  element instance {
    db.common.attributes,
    db.linkend.attribute
  }

6.3 OPTION 3 - Standards-based options

Further options would be to directly include one of the established standards for describing relationships,
such as XML Topic Maps (XTM) or Resource Description Framework (RDF). These elements would reside
in their appropriate namespace.


6.4 Merging Relationships

It is quite possible to have relationships that are defined in multiple 
assemblies and are related by the same id.  For example, a conference has
several topic tracks and wants to create proceedings organized by track.  
Each track is managed in its own assembly file.  Within these assembly 
files, there is a relationship between each presentation/paper and the author of
that paper.  The parent assembly could identify each track assembly as a resource,
which subsequently merges the relationships, provided that they all use the same
id on the <relationship> element.

Now let's assume that Scott and Jim each had two papers to present at the 
conference: one shared presentation, and one additional individual 
presentation each.  The shared presentation was slated for Track 1, Jim's individual
presentation was in Track 2, and Scott's was in Track 3.  Each track assembly
is "unaware" that Jim and Scott have papers in any other track.  When the 
tracks are referenced by the parent assembly and the relationships are merged,
the processor can render each presentation and create "related links" to 
other presentations by the same author.
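
The scenario can be sketched with the definition list markup from Option 2,
which is only one of the candidate models.  The file names, ids, and the
relationship naming scheme are hypothetical.  Each track assembly declares only
the relationships it knows about, and the last fragment shows the merged view
a processor might build in the parent assembly:

```xml
<!-- track1-assembly.xml: the shared presentation -->
<relationships>
    <relationship id="papers.by.scott">
        <arc>Scott</arc>
        <instance linkend="paper.shared"/>
    </relationship>
    <relationship id="papers.by.jim">
        <arc>Jim</arc>
        <instance linkend="paper.shared"/>
    </relationship>
</relationships>

<!-- track2-assembly.xml: Jim's individual presentation -->
<relationships>
    <relationship id="papers.by.jim">
        <arc>Jim</arc>
        <instance linkend="paper.jim"/>
    </relationship>
</relationships>

<!-- track3-assembly.xml: Scott's individual presentation -->
<relationships>
    <relationship id="papers.by.scott">
        <arc>Scott</arc>
        <instance linkend="paper.scott"/>
    </relationship>
</relationships>

<!-- effective relationships after merging in the parent assembly -->
<relationships>
    <relationship id="papers.by.scott">
        <arc>Scott</arc>
        <instance linkend="paper.shared"/>
        <instance linkend="paper.scott"/>
    </relationship>
    <relationship id="papers.by.jim">
        <arc>Jim</arc>
        <instance linkend="paper.shared"/>
        <instance linkend="paper.jim"/>
    </relationship>
</relationships>
```

With the merged view, a processor could generate related links from the shared
presentation to both individual presentations.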

7. The <toc> Element

The <toc> element defines the sequence and hierarchy of content-based resources
that will be rendered in the final output.  It behaves in a similar fashion
to a DITA map and topicrefs.  However, instead of each <tocentry> pointing to
a URI, it points to a resource in the <resources> section of the assembly:

<toc>
    <tocentry linkend="foo"/>
    <tocentry linkend="bar">
        <tocentry linkend="baz"/>
    </tocentry>
</toc>

<resources>
    <resource id="foo" fileref="file1.en.xml"/>
    <resource id="bar" fileref="file2.en.xml"/>
    <resource id="baz" fileref="data.xml/#table1"/>
</resources>


7.1  The "renderas" Attribute

Because of the wide range of component and container level elements within 
DocBook, it is quite possible that a child assembly could contain a toc with
one or more "section" resources.  In the parent assembly, the child assembly 
could be identified as a chapter, an appendix, or perhaps just a collection
of 'help' topics for a help system.  The "renderas" attribute names the element
that the referenced content should be rendered as.  The value "auto" could also
be useful to signal the renderer to produce the proper element based on the
current context in this document (section in a chapter, etc.).
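
A minimal sketch, assuming "renderas" is placed on the <tocentry> that points
to the child assembly (the proposal does not yet say which element carries the
attribute).  The ids are hypothetical:

```xml
<toc>
    <!-- the child assembly as a whole is treated as a chapter -->
    <tocentry linkend="child.assembly" renderas="chapter"/>

    <!-- let the renderer pick the element from the current context -->
    <tocentry linkend="help.topics" renderas="auto"/>
</toc>
```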

7.2 <toc> Merging

For child assemblies referenced as a resource in a parent assembly, the child
assembly can be inserted into the parent assembly's toc by inserting a <tocentry>
in the parent toc.  The contents of the child assembly's toc are then inserted
as children of the parent's tocentry:

Child assembly:

<toc>
    <tocentry linkend="child.section.1"/>
    <tocentry linkend="child.section.2"/>
    <tocentry linkend="child.section.3"/>
</toc>

Parent Assembly:

<toc>
    <tocentry linkend="parent.section.1"/>
    <tocentry linkend="parent.section.2"/>
    <tocentry linkend="child.assembly"/>
</toc>

<resources>
    <resource id="parent.section.1"/>
    <resource id="parent.section.2"/>
    <resource id="child.assembly"/>
</resources>

"Collated Assembly" (Parent Assembly):

<toc>
    <tocentry linkend="parent.section.1"/>
    <tocentry linkend="parent.section.2"/>
    <tocentry linkend="child.section.1"/>
    <tocentry linkend="child.section.2"/>
    <tocentry linkend="child.section.3"/>
</toc>