
docbook-apps message


Subject: Webhelp: My adventures therein

Hi, all.

Here is the promised write-up of my adventures with Webhelp. It is long,
so if you don't care, don't bother reading any further! But I hope some
of you find it helpful. I apologize for the length, but this way, at least
it's one big message for those who don't care, not a bunch of irrelevant
little ones.

-- Mary


  For one of our products, we have a doc set that currently consists of
  three PDFs and two Microsoft Help CHM files.

    Admin Guide: 679 pages (13.1 MB)
    User Guide:  792 pages (61.6 MB)
    What's New: 36 pages (2 MB)

    Admin Helpset: 72.5 MB (includes content of
         Admin and User Guides, and What's New)
    User Helpset: 62.9 MB (includes content of User Guide)

  We have had issues with the ancient MSHelp compiler
  over the ages, and have been getting increasingly worried about
  its continued viability. It does some strange things on 64-bit
  systems. So we have been looking to replace it.

  These documents (and many more) are all built using a homegrown
  toolchain. The documents are mostly written in DocBook (v. 4.4) and
  converted into various formats using the DocBook stylesheets and
  customizations. (Some are written in other XML that we convert to
  DocBook 4.4 using some combination of Perl and XSL.)

  We use Ant, XSLTproc, XEP, Perl, and various other tools to build our
  docs on both local "development" systems (desktops) and on our
  build system, with nightly and on-demand builds. We have an entire
  set of XSL stylesheets that customize the DocBook stylesheets for
  our "corporate" and "product" styles, and then each project may have
  a project-specific stylesheet that tweaks the corporate ones. So a
  project's stylesheet may import a corporate stylesheet, which in turn
  imports the DocBook ones. Or a project sheet may go straight to DocBook XSL.

  Due to corporate restrictions, it is generally not easy to upgrade
  things, so we tend to not bother unless we really have to. As a
  result, we had been using DocBook 4.4 and DocBook XSL 1.74.3 for

  While researching options to replace the MSHelp format, we found
  nothing that was both suitable and corporately allowable until we
  noticed that Oxygen (one of the XML editors we have in-house)
  had a "help" format that looked intriguing. After digging into
  it, we discovered that it was based on the webhelp transforms
  in DocBook XSL 1.76.0. Based on some experiments with the stylesheets
  in Oxygen, we bit the bullet to get the latest and greatest
  DocBook XSL release. The format looked like it would do a lot
  of what we wanted, and it was based on the already-established
  toolchain, so we wouldn't have corporate issues. Could we
  make it do what we wanted?

  We were eventually able to create webhelp docsets that we could
  use to replace our CHM archives, but it was non-trivial. The
  rest of this describes some of the issues we encountered and how
  we addressed, or didn't address, them. But without the DocBook XSL,
  we would have been SOL. :) So thank you all again for this wonderful

  Dramatis Personae
    DocBook 4.4 DTD
    DocBook XSL 1.78.1
    XSLTProc using libxml2 2.7.3; libxslt 1.1.24
    Xalan (for indexing): Xalan-J 2.7.1
    Perl, Ant, homegrown XSL, and other supporting players

Issue with the "Content-Type" meta element.

  A "meta" element for "Content-Type" is written into each
  of our HTML documents; it has the form of an "open tag":
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">,
  but it is stand-alone.

  The search indexer balks at this (and any other unclosed tags), and
  indexing fails. Changing the element to <meta ... /> solves
  the problem. I haven't been able to figure out where
  this comes from in the XSL transforms, so I was not able
  to use XSLT to fix it. (This may be an artifact of some of our
  out-of-date tools.)

  I ended up writing a trivial Perl script that would be
  run on all the generated HTML files before the search-indexing
  step, to change <meta ...> into <meta .../>. Inelegant, but
  effective. This turned out to be really useful later....
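
  A minimal sketch of that kind of fix-up, written in Python rather than
  the Perl actually used. The names close_meta_tags and fix_tree are
  hypothetical, and the sketch assumes the generated pages are UTF-8
  files with an ".html" extension. It rewrites any <meta ...> tag that is
  not already self-closed and leaves everything else untouched.

```python
import re
from pathlib import Path

# Matches a <meta ...> tag that is not already self-closed.
# The negative lookbehind (?<!/) skips tags that already end in "/>".
META_OPEN = re.compile(r"<meta\b([^>]*)(?<!/)>", re.IGNORECASE)


def close_meta_tags(html: str) -> str:
    """Turn <meta ...> into <meta .../> so a strict indexer can parse it."""
    return META_OPEN.sub(lambda m: "<meta" + m.group(1).rstrip() + "/>", html)


def fix_tree(root: Path) -> int:
    """Rewrite every .html file under root in place; return files changed."""
    changed = 0
    for page in root.rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        fixed = close_meta_tags(original)
        if fixed != original:
            page.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed
```

  In a build like the one described, something like fix_tree would run
  after the chunked HTML is written and before the search-indexing step.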

Issues with the sidebar TOC.

  The generation of the sidebar TOC for each HTML page bogs down
  the processing on large documents.

  Generating the HTML for our old HTMLHelp format takes less than
  2 minutes on our largest doc. When I ran that doc through the
  Webhelp transform, it OOMed after 6 hours. I noticed that the
  default chunking level was much higher than what we used for
  HTMLHelp and wondered if that might be part of the problem. When
  I changed it, the processing completed successfully in about 2 hours.
  (That would still be a show-stopper for our nightly builds.)

  The fact that the processing time was so strongly related to
  the number of files being created made me suspect that the sidebar
  TOC was the culprit. (I have to admit that it never occurred to
  me to look for a bug report. I didn't find that until much later!)

  It took some investigation to determine that the TOC generation
  was indeed the problem, but once I narrowed it down, I split the
  HTML generation into two steps. I lifted the template that
  generates the sidebar TOC into a separate stylesheet, and
  pre-generated a single file containing the sidebar TOC
  (the <ul id="filetree"> list) as a preliminary step.

  When generating the chunked HTML, instead of regenerating the
  TOC for each file, we simply read in the pre-generated file.

  Two issues with this:

  1. I needed to use the "generate.consistent.ids"
     parameter to keep the generated IDs in sync between
     generating the sidebar TOC and the standard HTML. I had
     never encountered that parameter before; I was worried I would
     have to solve this myself, so yay again for the stylesheets!
     (These generated IDs caused another issue, though, described later.)

  2. Since the TOC was pre-generated once, we lost the insertion
     of the "webhelp-currentid" attribute for each file. We were
     willing to take that loss if necessary, especially given
     that the TOC doesn't "stick" (bug 1226, which we did not attempt
     to address). But it wasn't necessary.

     Since I already had a Perl script that would be run on all the
     generated HTML (to fix the "meta" element mentioned above),
     it was trivial to add a step to reinstate the "webhelp-currentid"
     attribute at the right place in each file.
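
  A rough sketch of that post-processing step, again in Python rather than
  Perl. It rests on assumptions the description above does not confirm:
  that each sidebar entry is an <li>, optionally wrapping a <span>, whose
  <a> has an href equal to the page's file name, and that the marker is
  written as id="webhelp-currentid" on that <li>. The function name
  mark_current_entry is hypothetical.

```python
import re


def mark_current_entry(html: str, page_name: str) -> str:
    """Tag the first TOC <li> that links to page_name as the current entry.

    Assumed markup: <li ...>[<span ...>]<a href="page_name">.
    Pages with no matching entry are returned unchanged.
    """
    pattern = re.compile(
        r'<li((?:\s[^>]*)?)>'             # the <li> and any existing attributes
        r'(\s*(?:<span[^>]*>\s*)?'        # optional wrapper <span>
        r'<a\s[^>]*href="' + re.escape(page_name) + r'")'
    )
    # count=1: only one entry per page should carry the marker.
    return pattern.sub(r'<li\1 id="webhelp-currentid">\2', html, count=1)
```

  Because the pre-generated list is identical in every page, the only
  per-file input such a step needs is the file's own name.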

  Handling the sidebar TOC this way kept the processing time to
  under 2 minutes with no loss of functionality. I realize that this
  is NOT a general solution and probably not suitable for everyone, but
  given our build environment and the tools we have available, this was
  expedient and fit into our "ecosystem" just fine.

  This doesn't address the issue of embedding this TOC in every file.
  (I hadn't seen the proposed solution noted in bug 1259 before implementing
  my solution, and I'm not sure I'd be allowed to just download it,
  given corporate policy, especially since it includes more JavaScript.)

  We are seeing some issues with the "expand/collapse" indicators on the
  sidebar TOC. The "treeview" JavaScript inserts "class" attributes
  with values like "collapsable" and "expandable" to indicate the
  state of the TOC entry (embedded lists). We often see expanded
  lists given the attribute "expandable" rather than "collapsable",
  which means that the "rollup" indicators are incorrect. This seems to
  happen mostly with pointers to sections inside pages, so I suspect
  that this is an interplay between the chunking level and the
  "treeview" JavaScript. (I suspect that it doesn't happen if each link
  goes to a separate page, or at least if no page contains more than
  one level of expandable sections.) I tried to run this down
  to the source (the stylesheets only provide the minified JS library),
  but it looks like this library went out of support in 2010 and is
  no longer being maintained. (Because of corporate policies, I can't
  casually download the original JS library.) Since this affects only
  the visual collapse/expand indicators, not the functionality, we are
  willing to live with it for now.

Issues with links to local (within-page) IDs.

  We noted that within-page links did not work. We found the
  messages on the docbook-apps list about this, and tried
  commenting out the salient block in the "main.js" file. This
  fixed the problem for most links within a page (those within "content").
  (We tried using the fix in the later snapshot, but we didn't see any

  We also noted another problem with generated links from the sidebar TOC.
  If you were on a page like, say, "bk01.html" and tried to navigate
  to "bk02ch01s04#id-" (a totally made-up id value, but
  the format is what we got), the correct page and local link would load
  (that is, the new page would be scrolled to the local link), but the
  sidebar disappeared, and the sidebar toggle would not bring it back.

  (Clicking the Next link followed by the Previous link would restore
  it, but the direct navigation from the sidebar TOC always clobbered the

  The problem only occurred with generated IDs. Navigating from the
  sidebar TOC on "bk01.html" to "bk02ch01s04#using-passwords" worked
  fine. Looking at the gross structure of the links in the sidebar
  TOC revealed no differences. The difference had to be in the structure
  of the values of the IDs.

  By default, the "object.id" template with "generate.consistent.ids"
  set makes values of the form "id-" followed by dot-separated numbers.
  I played around with these values a bit and determined that changing
  the "dots" to "dashes" solved the problem. That is, links with id
  values like "id-4-2-6-3" worked just fine. (The original ids work fine
  within the content block; it's only using them from the sidebar TOC
  that causes the problem.)

  I could find no way to tell the "generate-id" function to alter this
  structure, so I had to override "object.id" and do it myself. (The
  problem appears to be in some piece of JavaScript, but I have not
  attempted to find it. The browser follows the links fine.)

  For completeness, I put "." characters into a couple of our explicitly
  provided IDs and the links to them. They then exhibit the same problem:
  the sidebar does not appear when you traverse to such an ID. (This
  was not a browser-specific problem, either.)

  Note: Unless you have "." in your explicit IDs or have set
  "generate.consistent.ids" for some other reason, this issue wouldn't affect
  anyone who didn't generate the sidebar TOC separately like we did.
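
  The fix described above was made in XSLT, by overriding "object.id".
  The sketch below is not that override. It is a small Python check,
  under stated assumptions, for finding links in generated HTML whose
  fragment contains a "." and for showing the dot-to-dash mapping. The
  function names are hypothetical, and the sample value "id-4.2.6.3" is
  made up to mirror the "id-4-2-6-3" form mentioned above.

```python
import re
from typing import List

# Captures the fragment (the part after "#") of each href attribute.
FRAGMENT = re.compile(r'href="[^"#]*#([^"]*)"')


def dashed_id(id_value: str) -> str:
    """Map a dotted ID such as 'id-4.2.6.3' to its dashed form."""
    return id_value.replace(".", "-")


def fragments_with_dots(html: str) -> List[str]:
    """Return the distinct link fragments that contain a '.' character."""
    return sorted({frag for frag in FRAGMENT.findall(html) if "." in frag})
```

  Running a check like this over the pre-generated sidebar TOC would flag
  both generated and hand-written IDs that could trigger the disappearing
  sidebar.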

Issues with styling and layout.

  The webhelp XSL templates provide some customization mechanisms, but
  we found that we often needed to override pieces that provided no
  handy hooks. And having our CSS file as the first one in the doc
  header meant that it was constantly fighting with the "built-in"
  stylesheets ("positioning.css", the jQuery stylesheets, and the
  CSS elements embedded right into the pages). There were some CSS
  items we could not figure out how to override using just our stylesheet.

  We spent a lot of hours simply trying to figure out where some bit of
  styling was coming from, and then more time trying to figure out how
  to override it. I eventually decided that trying to work around that
  huge block of CSS imports, JavaScript, and embedded CSS in every page
  wasn't worth the effort.

  In the end, I took apart the "user.head.content" template
  in "webhelp-common.xsl" and refactoring it. I tried to use only the
  customization hooks that were provided, but I just couldn't do it. :)

  I broke "user.head.content" into several smaller templates (one to insert
  CSS imports, one to insert JavaScript, etc) and reimplemented the
  original template simply to call the other templates. That way, I could
  selectively override the parts I wanted/needed to. I could then easily
  import our stylesheet last, which let me move all of the CSS elements that
  were being embedded into each page into our CSS instead (and change

  This made styling the documents MUCH easier. It also meant that
  I didn't have a big blob of CSS repeated in every HTML page.

  We wanted to change the layout of the items in the header, like the
  nav bar, to be consistent with other collateral we have. More overrides.
  I also found it necessary to parameterize some of the other templates
  (like "user.header.content") called from "chunk-element-content".
  I ended up overriding a LOT of stuff. Again, these changes are probably
  not ideal as general approaches (though I think breaking up some of the
  big templates and refactoring them, and maybe adding more parameterized
  customization hooks, are), but most of my fixes were geared toward solving
  my specific problem in my specific environment.

  I also found that we had to alter some of the colors embedded into
  "main.js" to get the effects we wanted. I really didn't want to have
  to change "main.js", but we couldn't find any other way to
  get the changes we wanted. (This was before we discovered the
  local-link issue that required us to change the file anyway.)

  There was no elegant way to override some of the jQuery styling,
  particularly replacing images. We simply had to replace their image
  files (keeping the names) with our images, since the JavaScript
  is responsible for getting the images in. We created a "customization
  template" (a directory with the same structure as the template, but with
  our project-specific variants (images, main.js) in it), and we simply
  slapped this on top of the template from the stylesheets when building
  the docs.
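
  The overlay step can be sketched as two directory copies, the second
  on top of the first. This is a Python illustration only; the build
  described here uses Ant, and the function name build_template and its
  parameters are hypothetical. It requires Python 3.8 or later for the
  dirs_exist_ok argument.

```python
import shutil
from pathlib import Path


def build_template(stock: Path, overrides: Path, out: Path) -> None:
    """Copy the stock webhelp template, then lay project files over it.

    Files in overrides replace same-named files from stock; files that
    exist only in stock are kept as they are.
    """
    shutil.copytree(stock, out, dirs_exist_ok=True)
    shutil.copytree(overrides, out, dirs_exist_ok=True)
```

  Keeping the overrides in a separate tree means the stock template from
  the stylesheets is never edited, so upgrading DocBook XSL only requires
  re-checking the handful of overridden files.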

  The one thing that really drove us crazy was the fact that we could not
  figure out how to change the size of that header. We tried a bunch
  of different things, and in the end, we just dealt with what we had.
  I'm sure it's in that jQuery UI Layout stuff, but none of us was familiar
  with that package, and we just didn't have the time to try to sort it out.

I would be more than happy to share the customizations we made to the
stylesheets, my Perl script and so on, if anyone is interested in seeing
them. Like I've said, my solutions are probably NOT general-purpose
solutions, but they worked for us and may be helpful to some of you.
I can also send a screen-shot of what our final output looks like.
I don't want to send this out generally, since I suspect most readers
of this list are not interested.

                                     -- 30 --
