Subject: Re: [docbook-apps] Find unused XML files in a project

Maybe this Python script would be of some use:


My doc library does a lot of re-use with xi:include elements, and images
are also referenced all over the place. To interface with 'publican' (Red Hat
publication tool), though, I need to have all my XML files under one
directory and all my image files under one images/ subdirectory. The
sibin tool consolidates all of my books (under the publican/ directory)
so that they have the tidy structure that 'publican' likes.

Sounds like a similar kind of problem to the one you are dealing with.

Oh! But one thing to watch out for: the script also converts 'olink'
elements into plain HTTP 'link' elements. You will probably want to
disable that part of the script.
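
For the narrower question of listing files that book.xml never reaches,
here is a minimal Python sketch. It is not the script mentioned above, and
it rests on some assumptions: files are pulled in only through xi:include
href attributes, images are referenced only through fileref attributes,
and relative paths resolve against the directory of the referencing file
(xml:base is ignored). The function names are made up for illustration.
Files that the standard-library parser cannot read (for example, files
using entities declared in an external DTD) are counted as used but are
not followed, so check the result before deleting anything.

```python
import os
import xml.etree.ElementTree as ET

# Qualified tag name for xi:include in the XInclude namespace.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def referenced_files(root_xml):
    """Return the set of absolute paths reachable from root_xml.

    Assumption: only xi:include/@href and @fileref are followed.
    """
    seen = set()
    stack = [os.path.abspath(root_xml)]
    while stack:
        path = stack.pop()
        if path in seen or not os.path.isfile(path):
            continue
        seen.add(path)
        if not path.lower().endswith(".xml"):
            continue  # images and other assets are leaves
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # unparseable: counted as used, but not followed
        base = os.path.dirname(path)
        for elem in tree.iter():
            if elem.tag == XI_INCLUDE:
                target = elem.get("href")
            else:
                target = elem.get("fileref")
            if target and "://" not in target:
                target = target.split("#", 1)[0]  # drop any fragment
                stack.append(os.path.normpath(os.path.join(base, target)))
    return seen


def unused_files(project_dir, root_xml="book.xml"):
    """List files under project_dir that root_xml never reaches."""
    used = referenced_files(os.path.join(project_dir, root_xml))
    unused = []
    for dirpath, _dirs, names in os.walk(project_dir):
        for name in names:
            full = os.path.abspath(os.path.join(dirpath, name))
            if full not in used:
                unused.append(os.path.relpath(full, project_dir))
    return sorted(unused)
```

Calling print("\n".join(unused_files("."))) from a project directory would
list the candidates for deletion. Because the walk starts at book.xml, an
orphaned XML file cannot 'bless' the images it references.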


On 31/07/2014 07:20, Nordlund, Eric wrote:
> Hello docbook-apps.
> I have a large set of projects that I am looking to scrub for unused
> graphics and XML files prior to sending off to localization.
> Some of my colleagues have created some very basic bash and batch
> scripts to scan through the folders and find files that aren’t
> referenced in any of the source files so we can delete them, but I worry
> that these scripts don’t catch everything (unused XML files in the base
> directory that reference images will ‘bless’ these images) and we could
> still have extraneous files left over or accidentally delete important
> ones unknowingly.
> Each project has a book.xml file that is the gold master for the
> outputs. If the book.xml file or any of its includes doesn’t reference a
> file in the project, it’s safe to delete. I was hoping that I could use
> xmllint to tell me which files are loaded when I try to validate the
> book.xml, but I haven’t found the magic formula yet.
> I’ve tried the following command to reference all of the loaded files
> during a pass, but it doesn’t seem to list the image files referenced,
> which is mostly the point of this exercise, and I get a lot of noise
> from the module files for the DTD on every include.
> $ xmllint --load-trace book.xml --xinclude --noout &> test1
> Has anyone had a similar problem to solve? Am I going about this the
> right way?
> Thanks, and I’m open to any suggestion. If bash and xmllint don’t work
> here, I am partial to Python as an alternative. Just saying.
> *Eric Nordlund*
> Senior Technical Writer
> Amazon Web Services
> Ph: 206-266-8048 | ericn@amazon.com

Fintan Bolton
Content Services | Red Hat, Inc.
home office. +49-89-14347132
blog: http://docinfusion.blogspot.com