Subject: DOCBOOK-APPS: Re: Looking for "swc"
>From: Bernd Kreimeier <firstname.lastname@example.org>
>To: email@example.com
>Subject: DOCBOOK-APPS: Looking for "swc"
>Date: Sun, 24 Feb 2002 22:21:25 -0800
>
>I need an SGML/XML-aware version of wc to count words, lines, etc.
>directly on SGML source. I used some db2txt right now, is there
>a direct way to do this (in the way of sgrep)?

Ah, a task trivially done in XSLT. If you want to run it on SGML, there are tools for converting SGML to XML that you could use to preprocess the input. If I understand your request properly, you want this:

    <?xml version='1.0'?>
    <!DOCTYPE xsl:stylesheet >
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
      <xsl:output method="text"/>
    </xsl:stylesheet>

Then, just pipe its output into wc.

The reason this works is that the stylesheet defines no templates of its own, so XSLT's built-in template rules handle everything. The built-in rule for elements applies templates to all child nodes, and text (PCDATA) nodes are copied to the output unmodified. All other kinds of nodes (attributes, comments, processing instructions) are simply ignored.

This fits DocBook well: it was designed so that the only source text which shows up in the output is element content, and attribute values never (?) need to be output literally.

If you use any characters outside iso-8859-1, you'll have to specify an encoding attribute on xsl:output that is compatible with the source.

I use a similar trick to select which element content I want to spell-check, except that I do the filtering with a "do-nothing" template that matches all the inline elements I want to ignore (e.g. varname, corpname, email, function). This is another advantage unique to semantically rich markup! The same approach could be used to omit elements whose child PCDATA you don't want to include in your count.

Matt Gruenke
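
P.S. For illustration, the "do-nothing" template mentioned above might look roughly like the sketch below. It assumes the DocBook elements are in no namespace (as in DocBook 4.x), and the list of elements in the match pattern is only an example to adapt. An empty xsl:template produces no output for the elements it matches and never visits their children, so their text drops out of the count, while everything else still falls through to the built-in rules.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text"/>

  <!-- Empty ("do-nothing") template: matched elements emit nothing,
       and their child text nodes are never reached.
       The element list here is an assumption; edit it to taste. -->
  <xsl:template match="varname | corpname | email | function"/>
</xsl:stylesheet>
```

With an XSLT 1.0 processor such as xsltproc, and hypothetical file names count.xsl and book.xml, something like `xsltproc count.xsl book.xml | wc` should then report the counts without the ignored elements.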