Subject: FW: Followup from eContracts teleconference: validation
Dear TC members,

In yesterday's teleconference there was a short discussion about ignoring namespaces and validation. Dr Hoylen Sue from DSTC, who was present at the meeting, sent me his views on that discussion. He did not want to interrupt the meeting and lead the teleconference off on a tangent, so he sent this follow-up by email. It is included below.

Cheers,
Zoran

1. It was argued that some software could ignore the namespace prefix and just interpret the names of the elements. For example, if the contract had the following:

    <legal:section>
      <legal:h>My title</legal:h>
      <legal:p>etc.</legal:p>
    </legal:section>

A tool that only understood XHTML could interpret it as:

    <section>
      <h>My title</h>
      <p>etc.</p>
    </section>

and then treat these as XHTML elements, because that is what it understands. I think this is totally wrong! It is an incorrect interpretation of how namespaces work, and it goes against what the schema says.

2. It was then mentioned that a programmer would implement it this way. However, what a programmer can get away with and what is a correct implementation are two different things. If a programmer writes a program that accepts buggy input, that does not make the input correct.

The DOM API does have two methods for getting the qualified name of an element: getLocalName() and getNamespaceURI(). A lazy programmer (or one who does not care about namespace correctness) would write something like:

    if (node.getLocalName().equals("section")) {
        // process the section
    }

However, what they really should be writing is:

    if (node.getLocalName().equals("section")
        && "http://example.org/legal".equals(node.getNamespaceURI())) {
        // process the LegalXML section
    }

The latter will pick up more errors. The former (lazy) code will accept data that is invalid according to the XML Schema.

3. An example was also mentioned that with DocBook, you can add your own extra attributes and they will simply be ignored.
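To make the point concrete, here is a minimal, self-contained sketch of the namespace-aware check described above. The namespace URI "http://example.org/legal" is the placeholder from the example, and the class and method names (NamespaceCheck, isLegalSection, parseRoot) are invented here for illustration:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespaceCheck {
    // Placeholder namespace URI from the example above.
    static final String LEGAL_NS = "http://example.org/legal";

    // True only when the element is a "section" *in the legal namespace*,
    // not merely any element whose local name happens to be "section".
    // Note .equals() rather than == for string comparison in Java.
    static boolean isLegalSection(Element node) {
        return "section".equals(node.getLocalName())
            && LEGAL_NS.equals(node.getNamespaceURI());
    }

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);   // off by default -- must be switched on
        Document doc = f.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String good = "<legal:section xmlns:legal='http://example.org/legal'/>";
        String bad  = "<section/>";   // same local name, but no namespace
        System.out.println(isLegalSection(parseRoot(good)));  // prints "true"
        System.out.println(isLegalSection(parseRoot(bad)));   // prints "false"
    }
}
```

Note that DOM parsers are not namespace-aware by default; without setNamespaceAware(true), getLocalName() returns null and even the "correct" check silently fails.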
For example:

    <para Lang="en" legal_xml_is_cool="yes">...

will work with some existing DocBook tools, where "Lang" is a DocBook attribute but "legal_xml_is_cool" obviously is not. That may be the case, but it is a side-effect of the tools not fully checking their input rather than something the schema is designed to allow. The above example _will_ fail to validate against the DocBook DTD, because the extra attribute is not a part of it.

Note: it is possible to define XML Schemas which allow foreign attributes, and even elements, but that needs to be an explicit design decision. Again, this is just programmers being lax about how much input checking they do.

The lazy checking for unexpected attributes is fostered by the design of the XML parsing APIs -- to some degree in DOM, but significantly in SAX. In SAX, the programmer is given the attributes of an element as an Attributes object. So the lazy programmer who wants to get the value "en" from the Lang attribute would simply write:

    String language_value = atts.getValue("Lang");

and happily not do any further checking. If they were really vigilant, they would _also_ have to write checking code like this:

    for (int n = 0; n < atts.getLength(); n++) {
        String local_name = atts.getLocalName(n);
        String namespace = atts.getURI(n);
        if ((local_name.equals("Lang") && namespace.equals("http://example.org/docbook"))
            || (local_name.equals("Arch") && namespace.equals("http://example.org/docbook"))
            || (local_name.equals("Condition") && namespace.equals("http://example.org/docbook"))
            || (local_name.equals("Conformance") && namespace.equals("http://example.org/docbook"))
            || (local_name.equals("ID") && namespace.equals("http://example.org/docbook"))
            || (local_name.equals("OS") && namespace.equals("http://example.org/docbook"))
            || ...etc for all the possible attributes of para...
           ) {
            // attribute is allowed
        } else {
            throw new Unexpected_Attribute_Exception();
        }
    }

I think you can see why programmers choose the lazy approach and just ignore unexpected attributes, hoping that the user will not create such incorrect input.
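The tedious if-chain above can be sketched more maintainably with a set lookup. This is only an illustration under the same assumptions as the example: "http://example.org/docbook" is the placeholder namespace, the attribute list is deliberately abbreviated, and the class and method names (ParaAttributeCheck, checkAttributes) are invented here:

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class ParaAttributeCheck {
    // Placeholder namespace URI from the example above.
    static final String NS = "http://example.org/docbook";

    // Abbreviated set of attributes that "para" allows; the real
    // DocBook attribute list is much longer.
    static final Set<String> ALLOWED =
        Set.of("Lang", "Arch", "Condition", "Conformance", "ID", "OS");

    // The vigilant check the lazy programmer skips: reject any
    // attribute that is not in the allowed set for this namespace.
    static void checkAttributes(Attributes atts) throws SAXException {
        for (int n = 0; n < atts.getLength(); n++) {
            String localName = atts.getLocalName(n);
            String uri = atts.getURI(n);
            if (!(NS.equals(uri) && ALLOWED.contains(localName))) {
                throw new SAXException("unexpected attribute: " + localName);
            }
        }
    }

    public static void main(String[] args) {
        // Build the attribute list from the example by hand, as a SAX
        // parser would deliver it to startElement().
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute(NS, "Lang", "Lang", "CDATA", "en");
        atts.addAttribute(NS, "legal_xml_is_cool", "legal_xml_is_cool",
                          "CDATA", "yes");
        try {
            checkAttributes(atts);
            System.out.println("all attributes allowed");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
            // prints: rejected: unexpected attribute: legal_xml_is_cool
        }
    }
}
```

Even with the set lookup, the point stands: the programmer has to write and maintain this check by hand for every element, which is exactly why most SAX applications simply do not.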
In SAX you can't just look at the elements you are interested in and ignore the rest -- you have to examine every one of them. That is why many applications will let you get away with attributes they don't understand, but will reject extra, unexpected elements.

To sum up, this interpretation is a consequence of lazy input checking and lax implementations; it is not intended by the definition of the XML Schema or DTD.

-- 
______________________________________________
Dr Hoylen Sue
h.sue@dstc.edu.au
http://www.dstc.edu.au/
DSTC Pty Ltd --- Australian W3C Office
+61 7 3365 4310