legalxml-econtracts message



Subject: FW: Followup from eContracts teleconference: validation


Dear TC members,


In yesterday's teleconference there was a brief discussion about
ignoring namespaces during validation.  Dr Hoylen Sue from DSTC, who
was present at the meeting, sent me his views on that discussion.  He
did not want to interrupt the meeting and lead the teleconference off
on a tangent, so he sent this followup on the issue by email.  It is
included below.
Cheers,
Zoran




1. It was argued that some software could ignore the namespace
prefix and just interpret the name of the elements.  For
example, if the contract had the following:
  <legal:section>
    <legal:h>My title</legal:h>
    <legal:p>etc.</legal:p>
  </legal:section>
A tool that only understood XHTML could interpret it as:
  <section>
    <h>My title</h>
    <p>etc.</p>
  </section>
and then treat it as XHTML elements, because that is what it
understands.

I think this is totally wrong!  It is an incorrect
interpretation of how namespaces work and goes against what
the schema says.
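To see the point concretely, here is a minimal sketch (the
`http://example.org/legal` namespace URI is just the placeholder from the
example above, not a real schema location).  A namespace-aware parse shows
that the element's local name alone looks like XHTML, but the namespace URI
identifies it as something else entirely:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NamespaceDemo {
    // Parse a string with a namespace-aware parser and return the root element.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // off by default; essential here
        return dbf.newDocumentBuilder()
                  .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
                  .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(
            "<legal:section xmlns:legal=\"http://example.org/legal\">"
            + "<legal:h>My title</legal:h></legal:section>");
        // The local name alone looks just like XHTML's <section>...
        System.out.println(root.getLocalName());     // prints "section"
        // ...but the namespace URI shows it is a different element.
        System.out.println(root.getNamespaceURI());  // prints "http://example.org/legal"
    }
}
```

A tool that looks only at the local name throws that second piece of
information away.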


2. It was then mentioned that a programmer would implement
it this way.  However, what a programmer could get away with
and what is a correct implementation are two different
things.  If a programmer writes a program that accepts buggy
input, it doesn't make that input correct.

The DOM API does have two functions for getting the
qualified name of an element: getLocalName() and
getNamespaceURI().  A lazy programmer (or one that doesn't
care about namespace correctness) would write something like:

  if ("section".equals(node.getLocalName())) {
    // process the section
  }

However, what they really should be writing is:

  if ("section".equals(node.getLocalName()) &&
      "http://example.org/legal".equals(node.getNamespaceURI())) {
    // process the LegalXML section
  }

The latter will pick up more errors.  The former (lazy code)
will accept data that is invalid according to the XML Schema.
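The difference is easy to demonstrate.  Below is a self-contained sketch
(again using the placeholder `http://example.org/legal` URI): the lazy
check accepts an XHTML `<section>` that is not a LegalXML section at all,
while the strict check rejects it.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class StrictCheck {
    // Placeholder namespace from the example above; not a real schema URI.
    static final String LEGAL_NS = "http://example.org/legal";

    // Lazy check: local name only.
    static boolean lazyMatch(Element e) {
        return "section".equals(e.getLocalName());
    }

    // Strict check: local name plus namespace URI.
    static boolean strictMatch(Element e) {
        return "section".equals(e.getLocalName())
            && LEGAL_NS.equals(e.getNamespaceURI());
    }

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
                  .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
                  .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // An XHTML <section>, which is not a LegalXML section at all.
        Element xhtml = parseRoot(
            "<section xmlns=\"http://www.w3.org/1999/xhtml\"/>");
        System.out.println(lazyMatch(xhtml));    // prints "true"  (wrongly accepted)
        System.out.println(strictMatch(xhtml));  // prints "false" (correctly rejected)
    }
}
```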


3. An example was also mentioned that with DocBook, you can add
your own extra attributes and they will simply be ignored.
For example:
  <para Lang="en" legal_xml_is_cool="yes">...
will work with some existing DocBook tools.  Here "Lang" is
a DocBook attribute, but "legal_xml_is_cool" obviously is not.

That may be the case, but it is a side-effect of the tools not
fully checking their input, rather than something the schema
is designed to allow.  The above example _will_ fail to
validate against the DocBook DTD, because the extra attribute
is not a part of it.  Note: it is possible to define XML
Schemas which allow foreign attributes, and even elements,
but that needs to be an explicit design decision.
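A real validator makes this visible.  The sketch below uses the JAXP
validation API with a toy schema standing in for the DocBook grammar (a
`<para>` element with only a `Lang` attribute declared; the real DocBook
DTD is of course far larger): the conforming document is accepted, and
the one with the foreign attribute is rejected.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class ValidateDemo {
    // Toy schema standing in for the DocBook grammar: <para> with only
    // a Lang attribute declared.
    static Validator newParaValidator() throws Exception {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + " <xs:element name='para'>"
            + "  <xs:complexType mixed='true'>"
            + "   <xs:attribute name='Lang' type='xs:string'/>"
            + "  </xs:complexType>"
            + " </xs:element>"
            + "</xs:schema>";
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                            .newSchema(new StreamSource(new StringReader(xsd)))
                            .newValidator();
    }

    // True if the document validates, false if the validator rejects it.
    static boolean accepts(Validator v, String doc) {
        try {
            v.validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Validator v = newParaValidator();
        System.out.println(accepts(v,
            "<para Lang='en'>ok</para>"));                         // prints "true"
        System.out.println(accepts(v,
            "<para Lang='en' legal_xml_is_cool='yes'>x</para>"));  // prints "false"
    }
}
```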

Again, this is just programmers being lax about how much
input checking that they do.  The lazy checking for
unexpected attributes is fostered by the design of the XML
parsing APIs -- to some degree in DOM, but significantly in
SAX.

In SAX, the programmer is given an Attributes object as an
argument to startElement().  So a lazy programmer who wants to
read the value "en" from the Lang attribute would simply write:
  String language_value = atts.getValue("Lang");
and happily do no further checking.

If they were really vigilant, they would _also_ have to
write this checking code:
  for (int n = 0; n < atts.getLength(); n++) {
    String local_name = atts.getLocalName(n);
    String namespace = atts.getURI(n);
    if ((local_name.equals("Lang") &&
         namespace.equals("http://example.org/docbook")) ||
        (local_name.equals("Arch") &&
         namespace.equals("http://example.org/docbook")) ||
        (local_name.equals("Condition") &&
         namespace.equals("http://example.org/docbook")) ||
        (local_name.equals("Conformance") &&
         namespace.equals("http://example.org/docbook")) ||
        (local_name.equals("ID") &&
         namespace.equals("http://example.org/docbook")) ||
        (local_name.equals("OS") &&
         namespace.equals("http://example.org/docbook")) ||
        ...etc for all the possible attributes of para...
       ) {
      // attribute is allowed
    } else {
      throw new Unexpected_Attribute_Exception();
    }
  }
I think you can see why programmers choose the lazy approach
and just ignore unexpected attributes, hoping that the user
would not create such incorrect input.

In SAX you can't just look at the attributes you are
interested in and ignore the rest -- to reject unexpected
ones you have to examine every attribute.  That is why many
applications will let you get away with attributes they
don't understand, but will reject extra unexpected elements.
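The vigilant check need not be as painful as the long if-chain above.
As a sketch, the allowed names can go in a set, which keeps the loop
short (the `http://example.org/docbook` URI and the attribute list are
copied from the illustration above; the real DocBook namespace and
attribute list differ):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class ParaAttributeChecker {
    // Placeholder namespace and attribute names from the example above.
    static final String DOCBOOK_NS = "http://example.org/docbook";
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "Lang", "Arch", "Condition", "Conformance", "ID", "OS"));

    // Reject any attribute that is not in the allowed set for <para>.
    static void checkAttributes(Attributes atts) throws SAXException {
        for (int n = 0; n < atts.getLength(); n++) {
            if (!DOCBOOK_NS.equals(atts.getURI(n))
                    || !ALLOWED.contains(atts.getLocalName(n))) {
                throw new SAXException(
                    "Unexpected attribute: " + atts.getQName(n));
            }
        }
    }

    public static void main(String[] args) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute(DOCBOOK_NS, "Lang", "db:Lang", "CDATA", "en");
        checkAttributes(atts);  // allowed attribute: passes silently
        atts.addAttribute("", "legal_xml_is_cool",
                          "legal_xml_is_cool", "CDATA", "yes");
        try {
            checkAttributes(atts);
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Even so, the checking still has to be written explicitly for every
element, which is exactly why so many implementations skip it.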

To sum up, this interpretation is a consequence of lazy
input checking and lax implementations; it is not intended
by the definition of the XML Schema or DTD.



-- 
______________________________________________ Dr Hoylen Sue
h.sue@dstc.edu.au                    http://www.dstc.edu.au/
DSTC Pty Ltd --- Australian W3C Office       +61 7 3365 4310

