entity-resolution-comment message



Subject: Re: [entity-resolution-comment] Some queries on the XML Catalogs spec


At 18:36 2003 04 11 +0100, Richard Tobin wrote:
>(You certainly make it hard to post comments here!  The address in the
>spec is wrong - it has an "s" on the end of comment - and you can't
>subscribe in the way it describes.)

That's unfortunate.  We'll have to see about getting that fixed.

I answer below just for myself--and in the hopes of getting
others on the committee to speak up if they disagree.


>I'm implementing the XML Catalogs spec for RXP, working from the 21
>Feb 2003 draft, and I have a few queries:

I'm very glad to hear this!

>Is the system identifier input to the resolver the system identifier
>as it appears in the document, or the result of absolutizing that?
>Presumably the latter, but it doesn't seem to say that in section 7.1.

It should be the former, though I understand that some APIs such as
SAX make this difficult.  But I believe it should definitely be the former.
Otherwise, how could you make an entry that worked, for example, for:

  <!DOCTYPE book SYSTEM "docbook.dtd">

regardless of where the document was found?  Specifically, something like:

  <system systemId="docbook.dtd" uri="file:///doctypes/docbook/docbook.dtd"/>

should cause all documents on a file system with the above doctype decl
to use the DocBook DTD found at /doctypes/docbook/docbook.dtd.
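
To make the distinction concrete, here is a minimal Python sketch of a
lookup that compares the system identifier exactly as written in the
document against the systemId attribute.  The function name and the
in-memory catalog are assumptions for illustration only; the sketch
ignores normalization, the other entry types, and resolution of a
relative uri attribute against the catalog's base URI.

```python
import xml.etree.ElementTree as ET

# Namespace name used by OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical in-memory catalog holding the single entry discussed above.
CATALOG = f"""<catalog xmlns="{CATALOG_NS}">
  <system systemId="docbook.dtd" uri="file:///doctypes/docbook/docbook.dtd"/>
</catalog>"""


def resolve_system(catalog_xml, system_id):
    """Return the uri of the first system entry whose systemId equals
    system_id as it appears in the document, or None if none matches."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        if entry.get("systemId") == system_id:
            return entry.get("uri")
    return None


# The identifier as written in the doctype decl matches the entry ...
print(resolve_system(CATALOG, "docbook.dtd"))
# ... but an absolutized form of the same identifier would not.
print(resolve_system(CATALOG, "file:///home/user/docs/docbook.dtd"))
```

If the resolver were handed only the absolutized identifier, the entry
would match only for documents stored in one particular directory.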

>Step 3 of 7.1.2 does not mention that the rewriteSystem entry with the
>longest systemIdStartString is the only one used.  This is stated in
>6.5.5, but it should be explicit in the description of the algorithm.

Agreed.
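
For reference, the longest-match rule could be sketched in Python as
follows.  The function name, the (start string, prefix) tuple
representation, and the example.org entries are assumptions for
illustration; this is not text from the spec.

```python
def rewrite_system(entries, system_id):
    """Apply the rewriteSystem entry with the longest matching
    systemIdStartString.

    entries: list of (systemIdStartString, rewritePrefix) pairs.
    Returns the rewritten identifier, or None if no entry matches.
    """
    matches = [(start, prefix) for start, prefix in entries
               if system_id.startswith(start)]
    if not matches:
        return None
    # Only the entry with the longest matching start string is used.
    start, prefix = max(matches, key=lambda m: len(m[0]))
    # Replace the matched start string with the rewrite prefix.
    return prefix + system_id[len(start):]


# Hypothetical entries: both match, but the longer start string wins.
ENTRIES = [
    ("http://example.org/", "file:///mirror/"),
    ("http://example.org/dtd/", "file:///dtds/"),
]
print(rewrite_system(ENTRIES, "http://example.org/dtd/book.dtd"))
```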

>Section 6.3 does not explicitly say whether the hex digits in a
>%-escape inserted during normalization must be in upper or lower case,
>though the use of %HH might be taken to mean upper case.  Since there
>is no mention of these being matched case-insensitively, it is
>important which are used.

Good point.  I don't know what we expected here.  I note the uppercase
form used in the table in 6.4.
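
To show why the case matters, here is a small Python sketch of
%-escaping with a switch for the case of the hex digits.  The set of
characters escaped here is an assumption for illustration (section 6.3
is the authority on that), as are the function and parameter names.

```python
# ASCII characters escaped in this sketch, in addition to control
# characters, space, DEL, and everything outside ASCII (an assumption).
UNSAFE_ASCII = set('"<>\\^`{|}')


def normalize(identifier, upper=True):
    """Escape unsafe characters as %HH over their UTF-8 bytes."""
    fmt = "%{:02X}" if upper else "%{:02x}"
    out = []
    for ch in identifier:
        code = ord(ch)
        if code <= 0x20 or code >= 0x7F or ch in UNSAFE_ASCII:
            out.append("".join(fmt.format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


upper_form = normalize("caf\u00e9 book.dtd")
lower_form = normalize("caf\u00e9 book.dtd", upper=False)
print(upper_form)
print(lower_form)
# A case-sensitive string comparison treats the two forms as different.
print(upper_form == lower_form)
```

Two implementations that pick different cases would fail to match each
other's normalized identifiers unless the comparison ignored case in
the escapes.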

>Is there a recommended way to handle external identifiers when parsing
>catalog files themselves?  Obviously they cannot be looked up in the
>catalog.  Is it reasonable to have a copy of catalog.dtd built-in to
>the parser and special-case its system and public identifiers?

Do you mean the external ids such as in the doctype decl in the catalog 
file itself?

I have to admit I haven't thought about this much.  The TR 9401 catalog
wasn't SGML/XML, so there wasn't any such issue.

Personally, I'd think it would be fine to treat the XML catalog as
well-formed XML and then do any error checking you wish--including
validating it against a built-in DTD.

But that doesn't really say anything about what to do with external
identifiers therein.  I guess I'd assume that there is no lookup
or mapping of external ids in catalog files.
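
As a rough illustration of the "treat it as well-formed XML" approach,
the following Python sketch reads a catalog with the standard
xml.etree.ElementTree parser, which does not fetch the external DTD
subset, and then applies its own minimal check.  The function name and
the sample catalog are assumptions for illustration, the identifiers in
the doctype decl are shown only as an example, and a real
implementation would do far more checking.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog file content with a doctype decl of its own.
CATALOG_TEXT = """<!DOCTYPE catalog
  PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="docbook.dtd" uri="file:///doctypes/docbook/docbook.dtd"/>
</catalog>"""


def load_system_entries(text):
    """Parse a catalog as well-formed XML only; the external identifiers
    in its doctype decl are neither fetched nor looked up."""
    root = ET.fromstring(text)
    # Minimal built-in check standing in for validation against a DTD.
    if root.tag != f"{{{CATALOG_NS}}}catalog":
        raise ValueError("document element is not an XML Catalogs catalog")
    return [(e.get("systemId"), e.get("uri"))
            for e in root.iter(f"{{{CATALOG_NS}}}system")]


print(load_system_entries(CATALOG_TEXT))
```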

paul




