Subject: Re: [entity-resolution-comment] Public Comment
At 20:07 2003-10-23 +0100, Rob Lugt wrote:
> I don't know about a recommended approach, but there are a number of things
> you could try:-
[snip]
>
> 2) Consider using relative URIs in your catalog.  If I remember rightly,
> these should resolve relative to the catalog file in which they appear.
> e.g.
> <uri name="http://www.oasis-open.org/committees/docbook/"
>      uri="docbook/"/>
>

The folk at OASIS are particularly good about including the Formal
Public Identifier in each file in a stereotyped comment.  I wrote a
Perl script that reads the files in a directory, extracts the FPIs,
and builds a catalog file.  I have not had occasion to use it lately,
but here it is for your information.  You may use, copy, or distribute
it under the terms of the GNU General Public License (GPL).  If it
breaks, you get to keep the pieces.

HTH
Terry.

Available for contract programming.

=== script ===>

#!/perl/bin/perl

# Copyright (c) 2002 Terrence Enger

# Across a directory, extract catalog entries from "Typical invocation"
# comments in standard files.  This may be useful in creation of a
# catalog file.
#
# Typical usage (all one command):
#     perl -w
#         extract_pubids.pl
#         "C:\Program Files\sgmltools\ISOEnts"
#         > "C:\Program Files\sgmltools\ISOEnts\autogen.soc"

# Change history
# --------------
#
# 2002-11-14, T. Enger
#     New program

# TODO:
# ----
#
# - $now_string lacks day of the month
# - Think about not reporting subdirectories
# - Build in the name of the output file, and do not try to process it
#   as input.
# - Examine standard files in other distributions, and think about
#   trying to do something useful with them.

use strict;
use English;
use POSIX qw(strftime);

# Verify argument given, massage for later, and open the directory
if ($#ARGV != 0) {
    die qq[Usage:\n]
      . qq[  Args are\n]
      . qq[    (1) directory name; must be quoted if contains space\n];
}
my $dirname = $ARGV[0];
if (substr($dirname, -1) ne q[\\]) {
    $dirname .= q[\\];
}
# "or" rather than "||": "||" binds to $dirname, so die would never run
opendir DIR, $dirname
    or die qq[cannot open directory $dirname: $!];

# Print introductory lines
# (%e may produce nothing on some Windows C runtimes, see TODO;
# %d is the more portable choice)
my $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
print qq[--\n]
    . qq[  This file is program-generated.  Do not bother trying to\n]
    . qq[  edit it by hand\n]
    . qq[  \n]
    . qq[  Script executed $now_string\n]
    . qq[  over directory $dirname\n]
    . qq[--\n\n];

# per file ...
# defined() keeps the loop going for a file that happens to be named "0"
while (defined(my $filename = readdir DIR)) {
    my $qualified = $dirname . $filename;
    if (-f $qualified) {
        open IN, '<', $qualified
            or die qq[cannot open $qualified: $!];
        # The stereotyped comment sits near the top, so 1024 bytes suffice
        my $in = q[];
        read IN, $in, 1024;
        close IN;
        # Three consecutive lines: "Typical invocation:", a line ending
        # in PUBLIC, and the quoted public identifier
        my $pattern = q[Typical invocation:.*\n]
                    . q[.*\sPUBLIC\s*\n]
                    . q[\s*("[^\"]*")];
        if ($in =~ m/$pattern/) {
            print qq[-- from file $filename --\n]
                . qq[PUBLIC $1\n]
                . qq[    $filename\n]
                . qq[\n];
        } else {
            print qq[-- *** failed to find public identifier ]
                . qq[in $filename *** --\n];
        }
    } else {
        print qq[-- skipping $filename, not plain file --\n];
    }
}
closedir DIR;

=== example output ===>

--
  This file is program-generated.  Do not bother trying to
  edit it by hand

  Script executed Thu Nov 12:21:03 2002
  over directory C:\Program Files\sgmltools\ISOEnts\
--

-- skipping ., not plain file --
-- skipping .., not plain file --
-- from file ISOcyr2 --
PUBLIC "ISO 8879:1986//ENTITIES Non-Russian Cyrillic//EN"
    ISOcyr2

-- from file ISOamsb --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Binary Operators//EN"
    ISOamsb

-- from file ISOamsc --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Delimiters//EN"
    ISOamsc

-- from file ISOamsn --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Negated Relations//EN"
    ISOamsn

-- from file ISOamso --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Ordinary//EN"
    ISOamso

-- from file ISOamsr --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Relations//EN"
    ISOamsr

-- from file ISObox --
PUBLIC "ISO 8879:1986//ENTITIES Box and Line Drawing//EN"
    ISObox

-- from file ISOcyr1 --
PUBLIC "ISO 8879:1986//ENTITIES Russian Cyrillic//EN"
    ISOcyr1

-- from file ISOamsa --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
    ISOamsa

-- from file ISOdia --
PUBLIC "ISO 8879:1986//ENTITIES Diacritical Marks//EN"
    ISOdia

-- from file ISOgrk1 --
PUBLIC "ISO 8879:1986//ENTITIES Greek Letters//EN"
    ISOgrk1

-- from file ISOgrk2 --
PUBLIC "ISO 8879:1986//ENTITIES Monotoniko Greek//EN"
    ISOgrk2

-- from file ISOgrk3 --
PUBLIC "ISO 8879:1986//ENTITIES Greek Symbols//EN"
    ISOgrk3

-- from file ISOgrk4 --
PUBLIC "ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN"
    ISOgrk4

-- from file ISOlat1 --
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN"
    ISOlat1

-- from file ISOlat2 --
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN"
    ISOlat2

-- from file ISOnum --
PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN"
    ISOnum

-- from file ISOpub --
PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN"
    ISOpub

-- from file ISOtech --
PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN"
    ISOtech

-- *** failed to find public identifier in autogen.soc *** --
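
For anyone who wants to try the pattern without pointing the script at a directory, here is a minimal sketch that runs the same three-part regular expression over an in-line sample. The helper name extract_fpi and the sample header are assumptions for illustration only; the sample is modelled on the kind of comment found at the top of the ISO entity files and is not copied from any distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the quoted
# public identifier from a "Typical invocation" comment, or undef.
sub extract_fpi {
    my ($text) = @_;
    # Same pattern as the script: the "Typical invocation:" line, then a
    # line ending in PUBLIC, then the quoted identifier on the next line.
    my $pattern = q[Typical invocation:.*\n]
                . q[.*\sPUBLIC\s*\n]
                . q[\s*("[^\"]*")];
    return $text =~ m/$pattern/ ? $1 : undef;
}

# Assumed sample header, for illustration only.
my $sample = <<'END';
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOlat1 PUBLIC
       "ISO 8879:1986//ENTITIES Added Latin 1//EN">
     %ISOlat1;
-->
END

my $fpi = extract_fpi($sample);
print defined $fpi
    ? "PUBLIC $fpi\n    ISOlat1\n"
    : "no public identifier found\n";
```

Because the pattern insists on three consecutive lines laid out this way, a file whose comment is formatted differently (or a file with no such comment, such as the generated catalog itself) falls through to the "failed to find" branch.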