Subject: Re: [entity-resolution] [David Brownell <david-b@pacbell.net>][entity-resolution-comment] Fw: Comments on OASIS XML Catalogs (2001-08-06)


At 16:40 2002 10 25 -0500, Norman Walsh wrote:

>Before we declare victory, I think we should take a look at these
>comments. Almost a year old, I found them in my comments-archive.
>Don't know how we all missed them...
>
>
>From: "David Brownell" <david-b@pacbell.net>
>To: <entity-resolution-comment@lists.oasis-open.org>
>Sent: Friday, November 23, 2001 5:55 PM
>Subject: Comments on OASIS XML Catalogs (2001-08-06)
>
>
>> Hi,
>> 
>> I've recently spent a bit of time implementing this spec, and thought
>> I'd share a few comments.  In no particular order:
>> 
>> - While I'm very glad to finally see a "backed" XML catalog syntax
>>   (how many years overdue? :),  I wish the processing model were
>>   much more straightforward.  IMO the complexity here is a clear
>>   obstacle to its broader adoption: makes it harder to understand
>>   and explain, as well as making it more error-prone in use.
>> 
>> - I really don't see any need to have both the "uri" and "systemId"
>>   sets of elements. 

The use of systemId is expressly to model production [75], ExternalID, 
of the XML spec.  This seems like good architecture to me.

XML has no concept of a URI.  It only has a concept of ExternalID
with a SystemLiteral.

>>   They're functional duplicates, except for the
>>   fact that "system" elements "should" not have fragment IDs.

There's nothing "should-y" about it:

  "It is an error for a fragment identifier (beginning with a # character)
  to be part of a system identifier."


>>   It's confusing and error prone to duplicate data like that ... and this

On the contrary, I think it's good architecture to model what
the catalog does on what the XML spec does.

I'd also point out that it may make sense to want the catalog
to point to a different resource when a given URI is the
SystemLiteral of an ExternalID than when the same URI is used
in some other context.  For example, some XML processors may
allow the use of pre-compiled external subsets in certain cases,
so the DTD's system identifier might get mapped to such a file,
while the same URI used elsewhere should point to the actual DTD.
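
To make that concrete, here is a minimal sketch in Python.  The
catalog, the example.org identifier, both target file names, and the
lookup() helper are all made up for illustration; only the catalog
namespace and the system/uri entry names come from the spec.  It does
exact matching only and ignores rewriting, delegation, and xml:base.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: one identifier, two different targets.
CATALOG = f"""
<catalog xmlns="{NS}">
  <system systemId="http://example.org/dtd/doc.dtd" uri="cache/doc.compiled"/>
  <uri name="http://example.org/dtd/doc.dtd" uri="dtd/doc.dtd"/>
</catalog>
""".strip()


def lookup(catalog_xml, identifier, as_system_id):
    """Return the mapped uri for an exact match, or None if nothing matches.

    as_system_id=True consults <system> entries (the identifier came from
    an ExternalID); False consults <uri> entries (any other URI reference).
    """
    root = ET.fromstring(catalog_xml)
    tag, attr = ("system", "systemId") if as_system_id else ("uri", "name")
    for entry in root.iter(f"{{{NS}}}{tag}"):
        if entry.get(attr) == identifier:
            return entry.get("uri")
    return None


dtd = "http://example.org/dtd/doc.dtd"
print(lookup(CATALOG, dtd, as_system_id=True))
print(lookup(CATALOG, dtd, as_system_id=False))
```

With a single merged table, the catalog author could not express
both mappings for the same identifier.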

>>   spec doesn't even suggest a motivation for such duplication.

That may be true, but I'm not sure a spec has to motivate everything,
it just has to do the right thing (which is what we did).  Besides,
given how many times we've tried to explain this to him in the past,
I'm not sure it's worth trying again.  But I wouldn't be opposed to
adding something to the spec if others think it worthwhile.

>>   Why should I need to mention an entity in two places, just because
>>   I happen to mention it both in a DTD and elsewhere?  The natural
>>   expectation is that only one mapping entry should be needed.
>> 
>>   One of them should be deleted.  My preference:  keep the "uri"
>>   set, since that one has a name that clearly matches its intended role.
>>   (Then replace current refs to the "systemId" elements, etc, and
>>   simplify section 7.2 as described later in this note.)  It'd be easy
>>   enough for implementations to just maintain one set of internal tables,
>>   supporting "old" elements for backward-compat (if desired).
>> 
>>   Yes, I realize an implicit goal was continuity with SOCAT, but
>>   this is a case where minor surgery to terminology seems likely
>>   to be a long term win ... and I don't recall SOCAT having "uri"
>>   mappings anyway.  If SOCAT-friendliness were a primary goal,
>>   it'd be rather important to have mentioned that one up front.

Well, I thought we did.

>> 
>> - In 4.1.1, it's messy to support the "prefer" attribute on
>>   groups ... catalogs need to maintain multiple distinct tables
>>   of public ID mappings and delegations, and figure out when
>>   to use each set.  Is that complexity really necessary?  It doesn't
>>   seem likely to be very beneficial, and I get concerned about
>>   how I'll know if I'm interpreting the spec correctly.

I would like to discuss this further.  I was always ambivalent about
supporting xml:base, and I think it was one of the main reasons
for having groups, which I also have doubts about.

>> 
>> - The text in 4.1 (and 4.1.1) is needlessly convoluted.  It should
>>   be simplified in at least two ways.  First, except for a summary
>>   of the intent, all details should be incorporated into 7.1 ... since
>>   it took far too many re-readings for me to come up with an
>>   interpretation that wasn't self-contradictory.  Second, that
>>   summary of intent should become clear!  I think something
>>   along these lines is the true result of the spec (with a ref to
>>   section 7.1 for detailed semantics):
>> 
>>     * In "prefer public identifier" mode, all applicable catalog
>>       entries are used.
>> 
>>     * In "prefer system identifier" mode, catalog entries for
>>       entity public identifiers are ignored except in the case of
>>       system identifiers (URIs) using the "urn:publicid:" scheme,
>>       where public id mapping entries will always be used.
>> 
>>     * Applications specify a default mode, and catalog
>>       entries for public identifier mappings may override
>>       that default by using "prefer" attributes on catalog and
>>       group elements.
>> 
>>   In conjunction with updates to 7.1, most of section 4.1.1
>>   can/should vanish -- and maybe even the section heading.
>>   That's without loss of functionality, and with improvement
>>   in clarity.
>> 
>> - In 7.1.1 the inputs are incomplete:  there's also the application's
>>   preferred resolution mode, as confusingly explained in 4.1 ...
>>   that needs updating.  The text in 7.1.2 should explain how
>>   that input interacts with the "prefer" attributes that may be
>>   placed on catalog/group elements.
>> 
>> - For that matter, this is catalogs for *XML* ... so the lines
>>   in 7.1.2 reading "if a system ID is provided..." are dubious.

Notations.  A notation declaration can supply a public identifier
with no system identifier.
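
A small sketch of why the "if a system ID is provided" condition is
not vacuous.  It assumes my reading of 4.1.1: public entries are
skipped only when the mode is "system" and a system identifier was
actually supplied.  The function name is hypothetical.

```python
def public_entries_considered(prefer, system_id):
    """Decide whether public-identifier catalog entries are consulted.

    Assumed rule (a reading of 4.1.1, not quoted from it): with
    prefer="system", public entries are ignored only if the input
    carries a system identifier; with prefer="public" they always apply.
    """
    if prefer not in ("public", "system"):
        raise ValueError("prefer must be 'public' or 'system'")
    return prefer == "public" or system_id is None


# A notation declared with only a public identifier: no system id.
print(public_entries_considered("system", None))
# An ordinary external entity with a system identifier.
print(public_entries_considered("system", "doc.dtd"))
```

Under that reading, the notation case still reaches the public
entries even in "prefer system" mode.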

>> 
>>   Naturally one was provided!  The _only_ way one won't
>>   be in use is if "urn:publicid:" mapping in 7.1.1 morphed
>>   that system ID into a public ID.  This relates to the rather
>>   confusing explaination of the two "prefer" modes; I'd
>>   expect they'd all be clarified together.
>> 

I'm happy to consider improvements in the writing of the spec.
Of course, one person's clarification is another person's
"confusing explaination" [sic].

>> - I think 7.2 would be a lot clearer if it could just be
>>   explained as:  "just as defined in 7.1, with the input
>>   mode 'prefer=system', no public ID, and sysID=uri".
>>   Short, sweet, clear, uncomplicated ... :)
>> 
>> - What should be done for <uri>, <rewriteURI>,
>>   or <delegateURI> elements where the matched
>>   system ID uses the "urn:publicid:" scheme?  According
>>   to 7.1 and 7.2 any input system ID using such a scheme
>>   will be turned into a public ID based match.  For now,
>>   I'm issuing a warning and ignoring the entry, since such
>>   elements could never match anything.  (Ditto for the
>>   duplicative <systemId*> elements ...)

I don't know the answer here.  I was always slightly confused
by the "urn as system id turns into a public id" business.
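
For reference, the unwrapping step in question can be sketched as
follows.  This assumes the transcription table from the spec's URN
unwrapping section (which follows RFC 3151) and a case-sensitive
prefix match; the function name is hypothetical.

```python
PREFIX = "urn:publicid:"

# Single characters that stand for public-identifier punctuation.
SIMPLE = {"+": " ", ":": "//", ";": "::"}

# Percent escapes that carry the literal characters displaced above.
ESCAPED = {
    "%2B": "+", "%3A": ":", "%2F": "/", "%3B": ";",
    "%27": "'", "%3F": "?", "%23": "#", "%25": "%",
}


def unwrap_publicid_urn(identifier):
    """Return the public identifier for a urn:publicid: URN, else None."""
    if not identifier.startswith(PREFIX):
        return None
    body = identifier[len(PREFIX):]
    out = []
    i = 0
    while i < len(body):
        escape = body[i:i + 3].upper()
        if body[i] == "%" and escape in ESCAPED:
            out.append(ESCAPED[escape])
            i += 3
        else:
            out.append(SIMPLE.get(body[i], body[i]))
            i += 1
    return "".join(out)


print(unwrap_publicid_urn("urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN"))
```

If the input is unwrapped like this before any matching happens, a
uri or system entry keyed on the URN form would never see a matching
input, which seems to be the situation David describes.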

>> 
>> As for the <?oasis-xml-catalog ...?> PI, I think it'd be
>> good if _you_ defined a feature flag URI that parsers can
>> use to control this mode.  That could apply to multiple
>> parsers (SAX2, JAXP/DOM, JAXP/SAX2, Perl, etc)
>> It'd be most appropriate for you to define the URI (and
>> control its evolution), and the "define a URI" approach
>> (now generally adopted :) was intended to support third
>> parties doing just that.
>> 

I don't know what he's talking about.  I gather this is
something to do with parsers of which I am unaware.
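
For context: SAX2 parsers expose named on/off switches ("features"),
each identified by a URI, and a caller sets one by name.  A rough
sketch using Python's xml.sax bindings follows.  The constant and
helper names are hypothetical, and the URI is the one David proposes
below; the stock expat-based reader does not recognize it.

```python
from xml.sax import (
    SAXNotRecognizedException,
    SAXNotSupportedException,
    make_parser,
)

# Proposed in David's note; not a feature defined by SAX itself.
USE_CATALOG_FEATURE = "http://www.oasis-open.org/committees/entity/use-catalog"


def enable_catalog_pi(parser, state=True):
    """Try to set the proposed feature flag on a SAX2 parser.

    Returns True if the parser accepted the flag, False if the parser
    does not recognize or cannot support it.
    """
    try:
        parser.setFeature(USE_CATALOG_FEATURE, state)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        return False
    return True


print(enable_catalog_pi(make_parser()))
```

A catalog-aware parser would accept the flag and use it to decide
whether to honor the oasis-xml-catalog processing instruction.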

>> May I suggest using this URI for the feature flag:
>> 
>>   http://www.oasis-open.org/committees/entity/use-catalog
>> 
>> That's it for the moment.  I'll send a note later including
>> a URL for the implementation.  (Free Software, in Java.)
>> 
>> Oh -- not quite all.  What's the story on conformance
>> testing for these catalogs -- are there any tests available?
>> 

There should be.  Of course, the spec needn't refer to them,
but we should make some available.  Or I suppose we could
let OASIS start an XML Catalog Conformance TC.

paul


