Subject: Re: DocBook TC action item reminders
Folks, please find a proposal for modular DocBook attached. Special thanks to Jim Earley for writing the bulk of the first draft, and for the additional collaboration of Larry and Dick.

Best regards,

--Scott

Scott Hudson
Senior XML Architect
e: scott.hudson@FlatironsSolutions.com
O: 303.542.2146
C: 303.332.1883
F: 303.544.0522
http://www.FlatironsSolutions.com
Vision. Experience. Engineering Excellence.

Bob Stayton wrote:
> Hi,
> This is a repeat of last month's reminder mail since we did not meet in
> February.
>
> Next meeting is Wednesday March 18.
>
> Bob Stayton
> Sagehill Enterprises
> bobs@sagehill.net
>
> ----- Original Message -----
> From: "Bob Stayton" <bobs@sagehill.net>
> To: "Jirka Kosek" <jirka@kosek.cz>; "Scott Hudson"
> <scott.hudson@flatironssolutions.com>; "Larry Rowland"
> <larry.rowland@hp.com>; "Norm Walsh" <ndw@nwalsh.com>; "Bob Stayton"
> <bobs@sagehill.net>
> Sent: Thursday, February 12, 2009 12:13 AM
> Subject: DocBook TC action item reminders
>
>> If you are getting this mail, then you have an
>> action item in the list below, and this is your friendly
>> reminder service before the next meeting.
>> If you have already completed your action items, then
>> good for you!
>>
>> Next meeting is **Tuesday** 17 February 2009.
>>
>> Actions:
>>
>> a. Bob to organize TDG reading after names are fixed.
>>
>> b. Norm to write up a backwards compatibility policy document.
>>
>> c. Norm to incorporate group parameter change (RFE 1998852) into the
>> schema for 5.1.
>>
>> d. Norm to ask mailing list about 'rep' on methodparam.
>>
>> e. Norm to update OASIS site for 5.0 spec and schema.
>>
>> f. Norm to update spec to include public and system identifiers
>> for the 5.0 DTD version.
>>
>> g. Jirka to add schema comparison table to DocBook 5.0 Transition Guide.
>>
>> h. Norm to add floatstyle attribute to sidebar for 5.1.
>>
>> i. Norm to write up proposed content model for initializer.
>>
>> j. Norm to add subtitle to sidebar for 5.1.
>>
>> k. Norm to determine OASIS requirements for charter updates.
>>
>> l. Norm to solicit for a third DocBook 5 user again.
>>
>> m. Norm to work with Mary to officially adopt the new charter at OASIS.
>>
>> n. Norm to work with Mary to make Publishing Subcommittee
>> schema a Committee Working Draft.
>>
>> o. Norm to work with Keith and Scott to update the OASIS committee
>> site to make the Publishing Subcommittee Working Draft
>> publicly available.
>>
>> p. Scott to write up a modular DocBook proposal for the TC
>> to discuss.
>>
>> q. Scott to append suggestions to RFE 1722935 from the
>> Publishing Subcommittee regarding additional class values.
>>
>> r. Larry to write additional documentation for the existing name
>> elements describing how they are best used in different locales.
>>
>> Bob Stayton
>> Sagehill Enterprises
>> bobs@sagehill.net
MODULAR DOCBOOK PROPOSAL

1. Overview

DocBook has long been the standard for creating technical publications in SGML and XML. The standard has a rich, comprehensive element set capable of handling most structural and semantic markup found in technical documentation, and it can produce a wide variety of output formats.

In recent years, industry trends have begun to emphasize more modular authoring processes. Many factors are driving content creation in this direction:

- more distributed authoring: authors are responsible for specific content areas rather than whole manuals. Content could be written by many different authors, some of them in different organizations altogether.
- content reuse: this has long been a "holy grail" of information architects: write content once, reuse it in many different contexts.
- change management: isolate the content that has changed. This is a key driver for companies that have localization needs. By modularizing their content, they can drive down costs by targeting only the changed content for translation.

In addition to the core business drivers, there are additional downstream opportunities for modularized content:

- dynamic content assembly: create "publications" on the fly using an external assembly file that identifies the sequence and hierarchy of modular components, rather than creating a single canonical instance.

In the 2000s, DITA was introduced as a new OASIS XML standard that leveraged a modular design in which content is created and stored as individual "topic" files and assembled into publications using "maps". Since then, interest in the standard has grown substantially, particularly because of its modular features.

At the same time, DocBook retains a strong and large community of users with a significant investment in tools and processes primarily focused on delivering content from DocBook and its many variants.
Nonetheless, there is growing interest from the community in having some of the same modular features found in DITA built into DocBook.

2. A Modular DocBook Design

To support a more modular architecture within DocBook, we need to account for the numerous structural elements currently defined in the grammar. DocBook's rich design was principally focused on the creation of printed content such as books and whitepapers, which is reflected in elements such as:

- set
- book
- part
- chapter
- appendix
- reference
- article

These structures support a diverse set of lower-level hierarchical elements that organize content in a logical way:

- sect*
- section
- refentry
- biblioentry

From a modular content design perspective, any component-level or container-level element could reasonably be a logical unit of information. Because of this, the basic modular architecture must be more flexible than DITA's, which has only one logical unit of information: the topic (and its specializations). Consequently, a collection of DocBook components can be much more structurally diverse than DITA topics in a map. As a result, we differentiate from DITA's map semantics with the introduction of a new element: <assembly>.

3. The <assembly> Element

The <assembly> element is the root-level element that defines the resources, hierarchy, and relationships for a collection of DocBook components. An <assembly> can be the structural equivalent of any DocBook component, such as a book, a chapter, or an article.

An <assembly> should contain an <info> element to store any metadata for that assembly. Additionally, it must contain at least one <resources> container that specifies the components that are included in this assembly. To define the hierarchy and sequence of resources to be rendered and displayed in the final output, an <assembly> can contain one or more <toc> elements.
The <assembly> can also contain a <relationships> container that is used to define the type and trajectory of relationships between resources.

The following RelaxNG (compact) notation illustrates the model:

    db.assembly = element assembly {
        db.info?,
        db.toc*,
        db.resources+,
        db.relationships*
    }

An assembly may contain only resources, without relationships or toc, as a way to collect resources.

4. The <resources> Element

The <resources> element is a high-level container that holds one or more resource objects managed by the <assembly>. An <assembly> can contain one or more <resources> containers to allow users to organize content into logical groups based on profiling attributes. Each <resources> element must contain one or more <resource> elements.

    db.resources = element resources {
        db.common.attributes,
        db.resource+
    }

    <assembly>
      <resources xml:lang="en-us">
      </resources>
      <resources xml:lang="ja-jp">
      </resources>
    </assembly>

5. The <resource> Element

The <resource> element identifies a "managed object" within the assembly. Typically, a <resource> will point to a content file that can be identified by a valid URI. However, a <resource> can also be a 'static' text value that behaves similarly to a text entity.

Every <resource> MUST have a unique ID value within the context of the entire <assembly>, so that each ID identifies exactly one resource (see section 5.1 for more information about resource merging). Multiple tocentry or resource entry elements, however, may point to the same resource element.

    db.resource = element resource {
        db.common.attributes,
        attribute fileref { text }?,
        text?
    }

Content-based resources can also be content fragments within a content file, similar to a URI fragment: file.xml/#ID.

Additionally, a resource can point to another resource. This allows users to create a "master" resource that can be referenced in the current assembly, and to point indirectly to the underlying resource that the referenced resource identifies.
Profiling attributes may also be used. They are applied when a resource is processed, allowing the same fileref to be processed with different conditions applied. For example:

    <resource id="master.resource" fileref="errormessages.xml"/>
    <resource id="class.not.found" resid="{master.resource}/#classnotfound"/>
    <resource id="null.pointer" resid="{master.resource}/#nullpointer"/>

The added benefit of indirect references is that users can easily point the resource to a different content file, provided that it uses the same underlying fragment ids internally. Indirect references could also be used to create locale-specific resources that reference the same resource id.

Text-based resources behave similarly to XML text entities. A content-based resource can reference a text-based resource, provided that both the text resource and the content resource are managed by the same assembly.

    assembly.xml:

    ...
    <resource id="company.name">Acme Tech, Inc.</resource>
    <resource id="company.ticker">ACMT</resource>
    ...

    file1.xml:

    <para><phrase resid="company.name"/> (<phrase resid="company.ticker"/>) is a publicly traded company...</para>

5.1 Resource Merging

There may be cases where a "master" or "parent" assembly defines a resource that has already been defined in a "child" assembly using the same ID. In this case, the "parent" assembly's resource with the same ID SHALL override the "child" resource. The following example illustrates:

    master-assembly.xml:

    <toc>
      <tocentry linkend="my.resource"/>
      <tocentry linkend="child.assembly"/>
    </toc>
    ...
    <resource id="my.resource" fileref="section-a.xml"/>
    <resource id="child.assembly" fileref="child-assembly.xml"/>
    ...

    child-assembly.xml:

    <toc>
      <tocentry linkend="my.resource"/> <!-- parent resource is used -->
    </toc>
    ...
    <!-- parent overrides this value -->
    <resource id="my.resource" fileref="section-b.xml"/>
    ...

In this example, the child assembly contains a resource with the id 'my.resource', pointing to a file named 'section-b.xml'.
In that assembly's <toc>, the <tocentry> points to that resource's file reference. In the parent assembly, there is another resource with the id 'my.resource', pointing to a different file, 'section-a.xml'. Since the parent assembly references the child assembly ('child.assembly') and includes the child assembly in its toc (<tocentry linkend="child.assembly"/>), any tocentry elements pointing to 'my.resource' in the child assembly's toc will point to 'section-a.xml' rather than to 'section-b.xml' as specified in the child assembly's resource.

5.2 Resource Scoping

By default, all content-based resources are presumed to be local identifiers that are intended to be processed with the XML content. However, there may be cases where resources point to external locations that should not be explicitly processed. These could be references to a website URL, or to PDF content that is intended to be linked in but requires no additional processing. Additionally, during the authoring process there may be references to resources that have not been developed yet, but that will be available for final publication.

As a result, the <resource> element needs a scope attribute that allows users to identify resources that are either external or that should not be processed at that time. This attribute should have the following enumerated values:

- local
- external
- no-op

"local" should be the default value, and users should not be required to set it explicitly. "external" means that the resource should be linked to, but no additional processing is required. "no-op" means that the resource is currently unavailable and should not be processed.

5.3 Pointing to an Unspecified Resource ID

All resources must be defined in the assembly by a unique ID. If another element points to an unidentified/unspecified resource, the processor SHOULD consider it a RECOVERABLE ERROR, at which point the processor should emit a warning either in the output or in a StdErr stream.

6.
The <relationships> Element

The <relationships> element is a container for relationships between resources. Each <relationship> contains one or more associations between resources. Each association can contain one or more resource instances linked to a resource id.

Relationships can be used to generate related links between resources, much in the same way that blog entries are tagged. For example, Scott Hudson's blog contains dozens of entries tagged "DocBook" over the years. By clicking on the tag, a user can see all of the entries related to DocBook.

6.1 OPTION 1 - Matrix method

If you presume that relationships are n-dimensional matrices where each column vector represents an association type (e.g., a 'tag'), and each row vector represents links between resources across associations, the example above could be modeled with the following markup:

    <relationships>
      <relationship id="blog.tags">
        <header>
          <label id="blog.entry">Entry</label>
          <label id="blog.tag">Tag(s)</label>
        </header>
        <body>
          <item>
            <association>
              <instance linkend="blog.entry.1"/> <!-- ref to resource -->
            </association>
            <association>
              <label id="DocBook">DocBook</label>
              <label id="XML">XML</label>
            </association>
          </item>
          <item>
            <association>
              <instance linkend="blog.entry.5"/>
            </association>
            <association>
              <labelref linkend="DocBook"/>
              <label id="relaxng">RelaxNG</label>
            </association>
          </item>
        </body>
      </relationship>
    </relationships>

6.2 OPTION 2 - Definition list method

Another option is to mirror the structure of definition lists, such that:

    <relationships>
      <relationship id="blog.docbook">
        <arc id="DocBook">DocBook</arc>
        <instance linkend="blog.entry.5"/>
        <instance linkend="blog.entry.3"/>
      </relationship>
    </relationships>

In this case, the term "DocBook" is associated with two content resources. Any number of relationships can be defined with this method.
The model would be defined as:

    db.assembly = element assembly {
        db.info?,
        db.toc*,
        db.resources+,
        db.relationships*
    }

    db.resource = element resource {
        db.common.attributes,
        attribute fileref { text }?,
        text?
    }

    db.relationships = element relationships {
        db.common.attributes,
        db.relationship+
    }

    db.relationship = element relationship {
        db.common.attributes,
        db.arc,
        db.instance+
    }

    db.arc = element arc {
        db.common.attributes,
        db.linkend.attribute?,
        text?
    }

    db.instance = element instance {
        db.common.attributes,
        db.linkend.attribute
    }

6.3 OPTION 3 - Standards based options

A further option would be to directly include one of the established standards for describing relationships, such as XML Topic Maps (XTM) or Resource Description Framework (RDF). These elements would reside in their appropriate namespace.

6.4 Merging Relationships

It is quite possible to have relationships that are defined in multiple assemblies and are related by the same id. For example, a conference has several topic tracks and wants to create proceedings organized by track. Each track is managed in its own assembly file. Within these assembly files, there is a relationship between each presentation/paper and the author of that paper. The parent assembly could identify each track assembly as a resource, which subsequently merges the relationships, provided that they all use the same id on the <relationship> element.

Now let's assume that Scott and Jim each had two papers to present at the conference: one shared presentation, and one additional individual presentation each. The shared presentation was slated for Track 1, Jim's individual presentation was in Track 2, and Scott's was in Track 3. Each track assembly is "unaware" that Jim and Scott have papers in any other track. When the tracks are referenced by the parent assembly and the relationships are merged, the processor can render each presentation and create "related links" to other presentations by the same author.

7.
The <toc> Element

The <toc> element defines the sequence and hierarchy of content-based resources that will be rendered in the final output. It behaves in a similar fashion to a DITA map and its topicrefs. However, instead of each <tocentry> pointing to a URI, it points to a resource in the <resources> section of the assembly:

    <toc>
      <tocentry linkend="foo"/>
      <tocentry linkend="bar">
        <tocentry linkend="baz"/>
      </tocentry>
    </toc>
    <resources>
      <resource id="foo" fileref="file1.en.xml"/>
      <resource id="bar" fileref="file2.en.xml"/>
      <resource id="baz" fileref="data.xml/#table1"/>
    </resources>

7.1 The "renderas" Attribute

Because of the wide range of component-level and container-level elements within DocBook, it is quite possible that a child assembly could contain a toc with one or more "section" resources. In the parent assembly, the child assembly could be identified as a chapter, an appendix, or perhaps just a collection of 'help' topics for a help system. A "renderas" attribute would let the parent assembly state how the child assembly is to be rendered. The value "auto" could also be useful to signal the renderer to produce the proper element based on the current context in the document (a section in a chapter, etc.).

7.2 <toc> Merging

For child assemblies referenced as a resource in a parent assembly, the child assembly can be inserted into the parent assembly's toc by inserting a <tocentry> in the parent toc.
The contents of the child assembly's toc are then inserted as children of the parent's tocentry:

    Child assembly:

    <toc>
      <tocentry linkend="child.section.1"/>
      <tocentry linkend="child.section.2"/>
      <tocentry linkend="child.section.3"/>
    </toc>

    Parent Assembly:

    <toc>
      <tocentry linkend="parent.section.1"/>
      <tocentry linkend="parent.section.2"/>
      <tocentry linkend="child.assembly"/>
    </toc>
    <resources>
      <resource id="parent.section.1"/>
      <resource id="parent.section.2"/>
      <resource id="child.assembly"/>
    </resources>

    "Collated Assembly" (Parent Assembly):

    <toc>
      <tocentry linkend="parent.section.1"/>
      <tocentry linkend="parent.section.2"/>
      <tocentry linkend="child.section.1"/>
      <tocentry linkend="child.section.2"/>
      <tocentry linkend="child.section.3"/>
    </toc>
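
To make the merging rules in sections 5.1, 5.3, and 7.2 concrete, here is a minimal Python sketch of how a processor might collate a parent and child assembly. It is an illustration under stated assumptions, not part of the proposal: the function names `resource_map` and `collate` are hypothetical, in-memory strings stand in for the assembly files, the extra child resource `child.section.2` / `section-c.xml` is invented for the example, only top-level tocentry elements are handled, and the child's entries are placed at the parent's level as in the "Collated Assembly" listing.

```python
import sys
import xml.etree.ElementTree as ET

# In-memory stand-ins for master-assembly.xml and child-assembly.xml (section 5.1).
PARENT = """<assembly>
  <toc>
    <tocentry linkend="my.resource"/>
    <tocentry linkend="child.assembly"/>
  </toc>
  <resources>
    <resource id="my.resource" fileref="section-a.xml"/>
    <resource id="child.assembly" fileref="child-assembly.xml"/>
  </resources>
</assembly>"""

# 'child.section.2' and 'section-c.xml' are hypothetical additions for illustration.
CHILD = """<assembly>
  <toc>
    <tocentry linkend="my.resource"/>
    <tocentry linkend="child.section.2"/>
  </toc>
  <resources>
    <resource id="my.resource" fileref="section-b.xml"/>
    <resource id="child.section.2" fileref="section-c.xml"/>
  </resources>
</assembly>"""


def resource_map(assembly):
    """Map each resource id to its fileref within a single assembly."""
    return {r.get("id"): r.get("fileref") for r in assembly.iter("resource")}


def collate(parent, children):
    """Return the collated toc as a list of (resource id, fileref) pairs.

    `children` maps a fileref to an already parsed child assembly; a real
    processor would load these from disk instead (assumption).
    """
    # Section 5.1: collect child resources first, then let the parent override.
    merged = {}
    for child in children.values():
        merged.update(resource_map(child))
    parent_resources = resource_map(parent)
    merged.update(parent_resources)

    # Section 7.2: a tocentry that names a child assembly is expanded into
    # that child's own tocentry elements (top level only in this sketch).
    entries = []
    for entry in parent.find("toc"):
        rid = entry.get("linkend")
        child = children.get(parent_resources.get(rid))
        if child is not None:
            entries.extend(e.get("linkend") for e in child.find("toc"))
        else:
            entries.append(rid)

    # Section 5.3: an unspecified resource id is a recoverable error, so warn.
    for rid in entries:
        if rid not in merged:
            print(f"warning: unspecified resource id {rid!r}", file=sys.stderr)

    return [(rid, merged.get(rid)) for rid in entries]


collated = collate(
    ET.fromstring(PARENT),
    {"child-assembly.xml": ET.fromstring(CHILD)},
)
for rid, fileref in collated:
    print(rid, "->", fileref)
```

In this sketch both occurrences of 'my.resource' resolve to 'section-a.xml', which mirrors the override described in section 5.1, while the child-only resource keeps the fileref defined in the child assembly.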