OASIS Mailing List Archives

docbook-apps message



Subject: Re: [docbook-apps] Stripping comments


  Here's a quick Perl solution that doesn't read everything into
memory and seems to handle some of the edge cases. Try it out on a
few files to verify that everything is okay before trusting it
completely, though. :)

  Copy the lines between the two dashed separator lines below into a
file (say strip_xml_comments.pl).

(On Unix, do this step first to make the script executable.)
chmod 755 strip_xml_comments.pl

   Make a backup copy of any and all files that you'll be using.  (The
script should work fine as is, but it's *MUCH* better to be safe than
sorry. :)

   Now you should be able to run the script on a copy of your input file
(on Unix, type ./strip_xml_comments.pl instead if the current directory
is not in your PATH):

 strip_xml_comments.pl my_xml_input_file.xml

   The script will make a backup copy of its own with '.orig' at the
end of the name. (Please don't just rely on this feature -- make your
own backup.)

  Verify that the output looks okay, then integrate the script into
your build process.

  Here's the script:

----------------------
#!/usr/bin/perl -w -i.orig

#
# NB: Delete the '.orig' portion if backup copies are not desired
#

#
# Delete XML comments.
#
use strict;

#
# Go through every line of every file given on the command line.
# $in_comment is true while we are inside a comment that was opened
# on an earlier line.
#
my $in_comment = 0;
while( <> ) {

  #
  # Still inside a multi-line comment: drop everything up to and
  # including the closing delimiter, or the whole line if it isn't here.
  #
  if( $in_comment ) {
    if( s/^.*?-->// ) {
      $in_comment = 0;
    } else {
      next;
    }
  }

  #
  # Match inline comments
  #
  s {
        <!--    # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        -->     # Match the closing delimiter.
  } []gsx;

  #
  # A comment that opens here but closes on a later line: keep the
  # text before it and drop the rest of the line.
  # NB: All complete in-line comments have already been removed.
  #
  if( s/<!--.*//s ) {
    $in_comment = 1;
  }

  print;  # Print whatever is left of the current line
} continue {
  #
  # Start each new input file outside of any comment.
  #
  $in_comment = 0 if eof;
}

----------------------

   Note that the code is a simple modification of one of the examples
from the perlre man page (http://perldoc.perl.org/perlre.html).
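   For comparison: if your files are small enough to read into memory,
the perlre example can be applied almost unchanged to a whole file at
once. The /s flag lets '.' match newlines, so multi-line comments need
no extra bookkeeping. Below is a rough sketch of that approach; the
subroutine name strip_xml_comments is just made up for illustration,
and the script renames each input to '.orig' before writing the
stripped version. Like the line-by-line script, it knows nothing about
CDATA sections, so a literal '<!--' inside <![CDATA[...]]> would be
treated as a comment; if that matters, an XML-aware route such as the
Saxon stylesheet you mentioned is safer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): remove every
# XML comment from a string, including comments spanning lines.
sub strip_xml_comments {
    my ($text) = @_;
    $text =~ s{
        <!--    # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        -->     # Match the closing delimiter.
    }{}gsx;
    return $text;
}

# Process each file named on the command line, one whole file at a time.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Can't read $file: $!";
    my $text = do { local $/; <$in> };    # slurp: whole file in memory
    close $in;

    # Keep the original as a backup, then write the stripped version.
    rename $file, "$file.orig" or die "Can't back up $file: $!";
    open my $out, '>', $file or die "Can't write $file: $!";
    print {$out} strip_xml_comments($text);
    close $out or die "Can't close $file: $!";
}
```

   The trade-off is memory: the whole file sits in $text at once,
which is exactly what the line-by-line script avoids.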


  Hopefully this will suit your purposes!


kells

>
> ----- Original Message -----
> From: "Paul Moloney" <paul_moloney@hotmail.com>
> To: <docbook-apps@lists.oasis-open.org>
> Sent: Thursday, March 29, 2007 6:45 AM
> Subject: [docbook-apps] Stripping comments
>
>
> >
> > One task I have is to package our source XML files for use by
> > integrators; one thing I'd like to do is first strip the comments
> > from these files, as they may contain sensitive information.
> >
> > I was thinking that this could be done by processing each file
> > through Saxon using a stylesheet which strips out comments and
> > outputs the XML again. But rather than risk reinventing the wheel,
> > I was wondering if anyone out there has implemented a DocBook
> > comment stripper in their build process?
> >
> > Thanks,
> >
> > P.
>

