Subject: Re: [docbook-apps] Strip docbook-5 to content only
Hi Dave,

On 23/03/14 12:32, davep wrote:
> I'm playing with a grammar checker that isn't as yet XML friendly.
> One option is to strip all markup and pass through to the grammar
> checker having expanded any xincludes.

Interesting -- what checker do you use, if I may ask?

> Issues:
> 1. Plain text output, Ideally block -> newline, inlines
>    -> whitespace separation.
> 2. Indexing is a special. Null template for <db:indexterm/>
> 3. Ditto (remove markup) for toc
>
> Can anyone think of any other 'specials' that might need
> processing to obtain a simple text file ready for a spell checker?

Since I am working on a style/terminology checker myself, here are the
rules I use to prepare the text before the terminology check:

https://www.gitorious.org/style-checker/style-checker/source/999eb9696fed15e75b01eee2febbb28562fc3144:source/xsl-checks/terminology.xslc

You can see that I try to hide things like literals and keys from the
style checker. I chose the ##@sth## format because I am using regular
expressions and wanted something that is distinctive but does not
contain any regular expression characters.

Hth,
Stefan.

-- 
SUSE LINUX Products GmbH, Maxfeldstraße 5, D-90409 Nürnberg
Geschäftsführer: Jeff Hawn, Jennifer Guild, Felix Imendörffer
HRB 16746 (Amtsgericht Nürnberg)
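For illustration only, here is a rough Python sketch of the preparation steps discussed in this thread: blocks end with a newline, inlines are separated by whitespace, index terms and the toc are dropped, and literal-like elements are hidden behind ##@name## placeholders. It is not the linked stylesheet. The element lists (BLOCKS, DROP, MASK), the function names, and the choice of the element name as the placeholder text are all assumptions made for this example, and the input is assumed to have its XIncludes expanded already.

```python
import xml.etree.ElementTree as ET

# Assumed element sets for this sketch; extend them to match your documents.
BLOCKS = {"para", "simpara", "title", "term", "listitem", "entry"}
DROP = {"indexterm", "toc"}                      # removed together with their content
MASK = {"literal", "keycap", "command", "filename", "code"}  # hidden from the checker


def local(tag):
    """Return the element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def walk(elem, out):
    """Append the text of elem to out, applying the drop/mask/block rules."""
    name = local(elem.tag)
    if name in DROP:
        return
    if name in MASK:
        # Placeholder without regular expression metacharacters.
        out.append(f" ##@{name}## ")
        return
    if elem.text:
        out.append(elem.text)
    for child in elem:
        walk(child, out)
        if child.tail:
            # Text following the child belongs to the parent and is always kept.
            out.append(child.tail)
    # Blocks end a line; everything else is separated by whitespace.
    out.append("\n" if name in BLOCKS else " ")


def to_text(xml_string):
    """Convert a DocBook 5 string to plain text, one block per line."""
    out = []
    walk(ET.fromstring(xml_string), out)
    lines = (" ".join(line.split()) for line in "".join(out).split("\n"))
    return "\n".join(line for line in lines if line)


SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Demo</title>
  <para>Press <keycap>Enter</keycap> to run <command>make</command><indexterm><primary>make</primary></indexterm> now.</para>
</article>"""

if __name__ == "__main__":
    print(to_text(SAMPLE))
```

Separating every inline by whitespace can leave a stray space before punctuation (for example after an emphasis element at the end of a sentence), so a grammar checker may need a small clean-up pass or a narrower rule for inlines.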