OASIS Mailing List Archives

docbook-apps message

Subject: DOCBOOK-APPS: Re: Looking for "swc"

>From: Bernd Kreimeier <bk@oddworld.com>
>To: docbook-apps@lists.oasis-open.org
>Subject: DOCBOOK-APPS: Looking for "swc"
>Date: Sun, 24 Feb 2002 22:21:25 -0800
>I need an SGML/XML-aware version of wc to count words, lines, etc.
>directly on SGML source. I used some db2txt right now, is there a
>direct way to do this (in the way of sgrep)?

Ah, a task trivially done in XSLT.  If you want to run it on SGML, there are 
tools for converting SGML to XML that you could use to preprocess the input. 
If I understand your request properly, you want this:

<?xml version='1.0'?>
<!DOCTYPE xsl:stylesheet >
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>
</xsl:stylesheet>

Then, just pipe its output into wc.

The reason this works is that the built-in template for elements simply 
applies templates to all child nodes.  PCDATA (text) nodes just get output, 
unmodified.  Comments and processing instructions are simply ignored, and 
attributes are never visited, since they are not children.  You may have 
noticed that DocBook was designed so that the only text in the source 
document which shows up in the output is the text content of elements; 
attribute values never (?) need to be output literally.
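
As a sketch of what the explanation above relies on, the built-in template 
rules of XSLT 1.0 can be written out explicitly.  A stylesheet containing 
only these templates should behave the same as one with no templates at all; 
the comments are mine, not part of any specification text.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>

    <!-- Root node and elements: recurse into all child nodes. -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Text nodes (and attributes, if they are ever selected): copy the value. -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Comments and processing instructions: output nothing. -->
    <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Attributes never reach the second template here because xsl:apply-templates 
without a select attribute visits child nodes only.  With a processor such 
as xsltproc, an invocation along the lines of 
"xsltproc text.xsl book.xml | wc -w" would give the word count (both file 
names are hypothetical).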

If you use any characters not in iso-8859-1, you'll have to specify an 
encoding attribute on xsl:output that is compatible with the source.
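
For illustration, the encoding is set on xsl:output as shown below.  UTF-8 
is an assumed choice for this sketch, not something taken from the original 
post; substitute whatever encoding matches your source and your wc.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- encoding="UTF-8" is an illustrative assumption; pick one that fits your data. -->
    <xsl:output method="text" encoding="UTF-8"/>
</xsl:stylesheet>
```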

I use a similar trick to select which element content I want to spell-check, 
except the way I do the filtering is to create a "do-nothing" template that 
matches all the inline elements I want to ignore (e.g. varname, corpname, 
email, function, etc.).  This is another advantage unique to semantically 
rich markup!

The same approach could be used to omit elements whose child PCDATA you 
don't want to include in your count.
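
A minimal sketch of such a "do-nothing" template follows.  The element list 
just reuses the examples named above and is not the actual stylesheet used 
for spell-checking; adjust it to whatever you want excluded.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>

    <!-- Empty template: matched elements and everything inside them are dropped. -->
    <xsl:template match="varname|corpname|email|function"/>
</xsl:stylesheet>
```

An explicit template takes priority over the built-in rules, so the listed 
elements produce no output while all other text still passes through.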

Matt Gruenke
