entity-resolution-comment message



Subject: Re: [entity-resolution-comment] Public Comment


At 20:07 2003-10-23 +0100, Rob Lugt wrote:
> I don't know about a recommended approach, but there are a number of things
> you could try:-
[snip]
>
> 2)  Consider using relative URIs in your catalog.  If I remember rightly,
> these should resolve relative to the catalog file in which they appear.
> e.g.
> <uri name="http://www.oasis-open.org/committees/docbook/"
>   uri="docbook/"/>
>

The folk at OASIS are particularly good about including the
Formal Public Identifier (FPI) in each file in a stereotyped
comment.  I wrote a Perl script that reads the files in a
directory, extracts the FPIs, and builds a catalog file.  I
have not had occasion to use it lately, but here it is for
your information.  You may use, copy, or distribute it under
the terms of the GNU General Public License (GPL).  If it
breaks, you get to keep the pieces.

HTH

Terry.
Available for contract programming.


=== script ===>
#!/perl/bin/perl

# Copyright (c) 2002 Terrence Enger

# Across a directory, extract catalog entries from the "Typical
# invocation" comments in standard files.  This may be useful in
# creation of a catalog file.
#
# Typical usage (all one command):
#   perl -w
#        extract_pubids.pl
#        "C:\Program Files\sgmltools\ISOEnts"
#      > "C:\Program Files\sgmltools\ISOEnts\autogen.soc"

# Change history
# --------------
#
# 2002-11-14, T. Enger
#   New program

# TODO:
# ----
#
# - $now_string lacks day of the month
# - Think about not reporting subdirectories
# - Build in the name of the output file, and do not try to process it
#   as input.
# - Examine standard files in other distributions, and think about
#   trying to do something useful with them.

use English;
use POSIX qw(strftime);

# Verify argument given, massage for later, and open the directory
if ($#ARGV != 0) {
    die qq[Usage:\n]
      . qq[  Args are\n]
      . qq[    (1) directory name; must be quoted if it contains a space\n];
}

my $dirname  = $ARGV[0];
if (substr($dirname, -1) ne q[\\]) { $dirname .= q[\\]; };
# Parentheses matter: without them, || binds to $dirname and die never runs.
opendir(DIR, $dirname) || die qq[cannot open directory $dirname: $!];

# Print introductory lines
my $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
print qq[--\n]
    . qq[   This file is program-generated.  Do not bother trying to\n]
    . qq[   edit it by hand\n]
    . qq[   \n]
    . qq[   Script executed $now_string\n]
    . qq[   over directory $dirname\n]
    . qq[--\n\n];

# per file ...
while (defined(my $filename = readdir DIR)) {   # defined(): a file named "0" must not end the loop
    my $qualified = $dirname . $filename;
    if (-f $qualified) {
	open(IN, '<', $qualified) || die qq[cannot open $qualified: $!];
	my $in = '';
	read IN, $in, 1024;	# the header comment sits near the top of the file
	close IN;
        my $pattern =  q[Typical invocation:.*\n]
                    .  q[.*\sPUBLIC\s*\n]
                    .  q[\s*("[^\"]*")];
	if ($in =~ m/$pattern/) {
	    print qq[-- from file $filename --\n]
                . qq[PUBLIC $1\n]
                . qq[       $filename\n]
                . qq[\n];
	} else {
            print qq[-- *** failed to find public identifier ]
                . qq[in $filename *** --\n];
	}
    } else {
	print qq[-- skipping $filename, not plain file --\n];
    }
}

=== example output ===>
--
   This file is program-generated.  Do not bother trying to
   edit it by hand
   
   Script executed Thu Nov  12:21:03 2002
   over directory C:\Program Files\sgmltools\ISOEnts\
--

-- skipping ., not plain file --
-- skipping .., not plain file --
-- from file ISOcyr2 --
PUBLIC "ISO 8879:1986//ENTITIES Non-Russian Cyrillic//EN"
       ISOcyr2

-- from file ISOamsb --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Binary Operators//EN"
       ISOamsb

-- from file ISOamsc --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Delimiters//EN"
       ISOamsc

-- from file ISOamsn --
PUBLIC "ISO 8879:1986//ENTITIES
        Added Math Symbols: Negated Relations//EN"
       ISOamsn

-- from file ISOamso --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Ordinary//EN"
       ISOamso

-- from file ISOamsr --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Relations//EN"
       ISOamsr

-- from file ISObox --
PUBLIC "ISO 8879:1986//ENTITIES Box and Line Drawing//EN"
       ISObox

-- from file ISOcyr1 --
PUBLIC "ISO 8879:1986//ENTITIES Russian Cyrillic//EN"
       ISOcyr1

-- from file ISOamsa --
PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
       ISOamsa

-- from file ISOdia --
PUBLIC "ISO 8879:1986//ENTITIES Diacritical Marks//EN"
       ISOdia

-- from file ISOgrk1 --
PUBLIC "ISO 8879:1986//ENTITIES Greek Letters//EN"
       ISOgrk1

-- from file ISOgrk2 --
PUBLIC "ISO 8879:1986//ENTITIES Monotoniko Greek//EN"
       ISOgrk2

-- from file ISOgrk3 --
PUBLIC "ISO 8879:1986//ENTITIES Greek Symbols//EN"
       ISOgrk3

-- from file ISOgrk4 --
PUBLIC "ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN"
       ISOgrk4

-- from file ISOlat1 --
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN"
       ISOlat1

-- from file ISOlat2 --
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN"
       ISOlat2

-- from file ISOnum --
PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN"
       ISOnum

-- from file ISOpub --
PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN"
       ISOpub

-- from file ISOtech --
PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN"
       ISOtech

-- *** failed to find public identifier in autogen.soc *** --
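
The ISOamsn entry above spans two lines, presumably because the
identifier is split across lines in the source file and the pattern's
[^\"]* happily matches a newline.  Catalog processors generally compare
public identifiers after whitespace normalization, so this should be
harmless, but the output is tidier if the script collapses the
whitespace itself.  What follows is only a minimal sketch of that idea,
not part of the original script: the helper name extract_fpi and the
sample header text are assumptions made up for illustration, and only
the pattern is taken from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the script above.
my $pattern =  q[Typical invocation:.*\n]
            .  q[.*\sPUBLIC\s*\n]
            .  q[\s*("[^\"]*")];

# Hypothetical helper: return the quoted public identifier found in
# $text with whitespace normalized, or undef if the pattern fails.
sub extract_fpi {
    my ($text) = @_;
    return undef unless $text =~ m/$pattern/;
    my ($id) = $1 =~ m/^"(.*)"$/s;   # drop the surrounding quotes
    $id =~ s/\s+/ /g;                # collapse runs of whitespace
    $id =~ s/^ | $//g;               # trim leading/trailing space
    return qq["$id"];
}

# Assumed sample header, modelled on the comment style the pattern
# expects; it is not copied from any particular distribution.
my $sample = <<'END';
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOamsn PUBLIC
       "ISO 8879:1986//ENTITIES
        Added Math Symbols: Negated Relations//EN">
     %ISOamsn;
-->
END

my $fpi = extract_fpi($sample);
print defined $fpi ? "PUBLIC $fpi\n" : "no public identifier found\n";
```

With the sample text above, the identifier comes out on a single line.
In the full script the same three substitutions could be applied to $1
just before the PUBLIC line is printed.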



