RE: [legalxml-econtracts] Official XML Records

Subject: RE: [legalxml-econtracts] Official XML Records

From: "John McClure" <jmcclure@hypergrove.com>

To: "Legalxml-Econtracts" <legalxml-econtracts@lists.oasis-open.org>

Date: Sun, 20 Apr 2003 23:26:34 -0700

Title: Re: [legalxml-econtracts] The CSS issue (was .. Req #106)

A lengthy reply follows, discussing (1) what is "semantic markup" and what is not (2) conformance requirements and validation (3) a law firm's own, private, markup of a legal document and (4) looking ahead to an xml:names attribute, to XForms, and to the impact on other OASIS dialects and on LegalXML work products.

=========================================

Rolly,

I didn't define my term "semantic markup" -- it's all the elements (the vocabulary) being defined by LegalXML TCs, including articles, sections, clauses, document titles, captions, party names and addresses, signature blocks, and so on. Non-semantic markup is that required for presentation -- in the case of SVG, that includes drawing surfaces, fonts, text strings, and scalable-vector-graphic and bit-mapped images. In the case of XSL-FO, non-semantic markup includes page sequences, non/repeating page areas, tables, lists, and text/image blocks. In the case of XHTML, non-semantic markup includes the usual suspects - paragraphs, headings, block and inline objects, tables, and lists.

Your point is well-taken, though, that there is a set of semantic markup that should never be considered required for a conforming LegalXML document. Personally, I feel that NO semantic markup should be "required" in order for an XML datastream to be considered a conforming LegalXML document -- all that would be required is that it be encoded using any of the XML dialects that we designate as being a "presentation dialect", e.g., XHTML, SVG, or XSL-FO -- and that its encoding abide by the constraints that we would impose on the use of that dialect for legal documents.

For instance, I would want to insist that legal documents encoded in XHTML not be allowed to contain (1) <object>, (2) <script>, or (3) certain types of <link> elements. Similar constraints that foreclose modification of the document's content would be designated for SVG and XSL-FO streams. And, of course, a conforming XML document, regardless of its presentation dialect, would not be allowed to contain an <?xml-stylesheet?> processing instruction whose type is anything BUT 'text/css'. Finally, I would be sure to ban all elements and attributes that are neither from LegalXML's namespace nor defined by a W3C specification -- precluding, for instance, Microsoft's data-binding attributes on <input> elements.

The LegalXML namespace would therefore define a single, global attribute, using XML Schema -- a "names" attribute -- and publish a short specification of Do's and Don'ts for what constitutes a conforming presentation document. Sure, an XSL-T stylesheet can be instantly written and distributed that scans for taboo elements, attributes, and processing instructions. The only job after that is to define what the "names" attribute may contain -- still a lot of work! As far as I know, LegalXML would not define any XML elements, just the single global attribute and a vocabulary, that is, an ontology that can be extended by one's own law firm -- thereby preserving computability of the vocabulary terms across software products (via RDF inheritance).
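
To make the "scan for taboo items" idea concrete, here is a minimal sketch in Python (standing in for the XSL-T stylesheet mentioned above) that checks an XHTML stream against the constraints listed so far. The LegalXML namespace URI and the function name are assumptions made for illustration only; nothing of the kind has been defined. The "certain types of <link>" rule is left out because those types have not been specified, and unprefixed vendor attributes are not detected here, since ordinary DTD or Schema validation of the dialect would catch those.

```python
import re
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
LGL_NS = "urn:example:legalxml"  # hypothetical URI, not a published namespace
XMLNS_NS = "http://www.w3.org/2000/xmlns/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
BANNED_ELEMENTS = {"object", "script"}
ALLOWED_ATTR_NS = (None, XMLNS_NS, XML_NS, LGL_NS)


def find_violations(xml_text):
    """Return a list of human-readable problems found in an XHTML stream."""
    problems = []

    def walk(node):
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            if node.target == "xml-stylesheet":
                # The PI body holds pseudo-attributes such as type="text/css".
                m = re.search(r'(?:^|\s)type\s*=\s*["\']([^"\']*)["\']', node.data)
                if not m or m.group(1) != "text/css":
                    problems.append("stylesheet PI type is not text/css")
        elif node.nodeType == node.ELEMENT_NODE:
            if node.namespaceURI != XHTML_NS:
                problems.append(f"foreign element <{node.nodeName}>")
            elif node.localName in BANNED_ELEMENTS:
                problems.append(f"banned element <{node.localName}>")
            attrs = node.attributes
            for i in range(attrs.length):
                attr = attrs.item(i)
                if attr.namespaceURI not in ALLOWED_ATTR_NS:
                    problems.append(f"foreign attribute {attr.name}")
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text))
    return problems
```

An empty list would mean the stream passes these particular checks; it says nothing about whether the stream is valid XHTML, which remains the job of the usual validator.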

The validation of this document would therefore proceed normally -- against the DTD or XML Schema defined for the XHTML, SVG, or XSL-FO dialects. The semantic validation would or can occur separately, as it should, from the main business at hand: exchange of the presentation document that IS the official record. A document that is semantically invalid is not legally invalid, if it follows the other rules. Consider this approach to validation as the "next step" in the evolution of XML document validation: from the distinction between a "well-formed" XML document and a "valid" XML document, the process now addresses whether a document is "physically valid" and whether it is "semantically valid". Semantic validation occurs using the ontology we'd have published.
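
The separation between physical and semantic validation can be sketched roughly as follows. This is only an illustration: the namespace URI, the tiny vocabulary set standing in for a published ontology, and the assumption that a "names" value is a whitespace-separated list are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and vocabulary; a real ontology would be far larger.
LGL_NAMES = "{urn:example:legalxml}names"
VOCABULARY = {
    "Contract.Title",
    "Contract.ReferenceDate.date",
    "Contract.Party.FullName",
}


def unknown_names(xml_text, vocabulary=VOCABULARY):
    """Return every assigned name that the vocabulary does not define.

    Parsing succeeds for any well-formed stream, so a non-empty result means
    the document is semantically invalid, not that it is unusable as a record.
    """
    root = ET.fromstring(xml_text)
    unknown = []
    for element in root.iter():
        # Assumption: one attribute may carry several whitespace-separated names.
        for name in element.get(LGL_NAMES, "").split():
            if name not in vocabulary:
                unknown.append(name)
    return unknown
```

A document with unrecognized names would still pass DTD or Schema validation for its dialect; the list returned here is purely advisory.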

This is an easily marketable, non-intrusive approach to standards-making. It is one that does not require anyone to buy special software to create a document which may then be deemed a "legal" XML document, opening the door wide to many technical implementations of varying complexity. It is intuitively graspable by technologists, attorneys, and judges, and it frees the legal brethren in LegalXML from having to understand all the details of a profession that is not their own - software engineering. Finally, this approach leads straight towards establishing a javascript-like language that is understandable to power-users in the legal profession; it simply leverages the content of the "names" attribute.

The key is a global "names" attribute that contains the name(s) one has assigned to the blocks, strings, and images in the presentation document. To prevent one's "work product" from being shared with another party would then be trivial -- just store one's assigned names in an attribute with a different namespace than LegalXML's (for instance, the firm's own namespace). When the document is "prepared" for transmission as a "legal document", that attribute would naturally be stripped out by an XSL-T stylesheet, for example. The tool used to annotate the document would simply need to know what namespace prefix to use for the attribute holding the name being assigned to the block, string, or image -- is it a lgl:names, or is it myFirm:names? The vocabulary relevant to the namespace-prefixed names attribute is pointed at by the URI of the namespace, defined using the standard "xmlns" attribute....
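
The "prepare for transmission" step might look something like the sketch below, again in Python rather than XSL-T. Both namespace URIs and the function name are invented for the example; a firm would substitute its own.

```python
import xml.etree.ElementTree as ET

# Both URIs are hypothetical placeholders.
LGL_NS = "urn:example:legalxml"
FIRM_NS = "urn:example:myfirm"


def strip_private_names(xml_text, private_ns=FIRM_NS):
    """Remove the firm's private names attribute from every element.

    Names stored in the LegalXML namespace are left untouched.
    """
    root = ET.fromstring(xml_text)
    private_key = f"{{{private_ns}}}names"
    for element in root.iter():
        element.attrib.pop(private_key, None)
    # ElementTree only declares namespaces that are still in use, so the
    # firm's xmlns declaration disappears along with its attributes.
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree rewrites namespace prefixes on output (ns0, ns1, and so on), which is harmless to an XML processor but would matter if the transmitted bytes had to match a signed original.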

This attribute COULD be defined by the W3C as an XML attribute, containing "colonized" names, e.g., <span xml:names='lgl:Contract.ReferenceDate.date'>, or it could be defined by LegalXML itself. Personally, I'd rather see xml:names, but I think that happy result can come about only as a result of a clear decision by LegalXML that it is necessary for the purposes of accommodating anticipated legal requirements that eventually would be imposed by courts venturing into this arena.
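
For the xml:names variant, a consumer would have to resolve the prefix inside each "colonized" name against the namespace declarations in scope. A rough sketch of that lookup follows; note that xml:names is not an attribute the W3C has defined, and the function names here are made up for illustration.

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"


def resolve_prefix(element, prefix):
    """Walk up the ancestors until an xmlns:<prefix> declaration is found."""
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        uri = node.getAttribute("xmlns:" + prefix)
        if uri:
            return uri
        node = node.parentNode
    return None


def colonized_names(xml_text):
    """Return (vocabulary URI, name) pairs for every xml:names token."""
    doc = minidom.parseString(xml_text)
    found = []
    for element in doc.getElementsByTagName("*"):
        for token in element.getAttributeNS(XML_NS, "names").split():
            prefix, colon, name = token.partition(":")
            if colon:
                found.append((resolve_prefix(element, prefix), name))
            else:
                # No prefix given: the vocabulary is unknown.
                found.append((None, token))
    return found
```

The URI returned for each name is what would point a tool at the relevant vocabulary, whether that is LegalXML's or a firm's own extension.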

Thanks,

John McClure
Hypergrove Engineering

PS Incidentally, this technique applies equally to documents encoded using XForms, which in my opinion is just as valid a presentation dialect as XHTML; it's just that it's not yet a W3C Technical Recommendation. Once it is, I would heartily support adding XForms to the list of permissible dialects for "official records". Also, note that this technique does have the effect of precluding DocBook, UBL, Open Office, and other OASIS dialects as permissible for official, legal records. They can, however, just as easily define their own names attribute for presentation elements, thereby achieving transformability to and from their dialects (assuming, however, that they define no attributes of consequence) -- this would accommodate the wealth of tools created for those dialects.

Thanks,

<div nttp:names='Posting.Author.FullName.en'>John McClure</div>

<div nttp:names='Posting.Author.Company.Name.en'>Hypergrove Engineering</div>

<div nttp:names='Posting.Author.Company.anyURI'>http://www.hypergrove.com</div>

-----Original Message-----
From: Chambers, Rolly [mailto:rlchambers@smithcurrie.com]
Sent: Sunday, April 20, 2003 5:34 PM
To: Legalxml-Econtracts
Subject: RE: [legalxml-econtracts] Official XML Records

Whether a lawyer might want to share markup of the semantic content of a court document would depend on what that content concerned. If it involved "work product" (i.e. the mental impressions, theories, or analysis of the lawyer), the lawyer will not want to share it, is entitled to protect it from the opposing side, and might be forced to disclose it only in fairly narrow circumstances.

Items such as names, addresses, document titles, etc. are probably less of an issue.

There are some litigators who would not want to share any markup, even of case or statutory citations, in order to gain a tactical advantage. For instance, I might not want it to be easy for you as an opposing attorney to access case or statutory authorities I've marked up as citations in a brief. I'd rather make you spend the time and effort to go back through my court document yourself and mark up my citations. My hope would be that you would not have the time, diligence, or know-how to do so, and thus might not raise an issue with an authority I've cited.

-----Original Message-----
From: John McClure
Sent: Sun 4/20/2003 8:02 PM
To: Legalxml-Econtracts
Cc:
Subject: [legalxml-econtracts] Official XML Records

. . .

Further, I assert it is essential that markup regarding the semantic content of Court Documents be represented as annotations on the XML elements themselves, not as separate, stand-aside markup, because, technically, "stand-aside" markup would be relatively harder to create or modify than the embedded "names" notation, since it would rely so completely on the somewhat complex machinery that XPath provides to reference substrings within the content of an element. One possible (related) solution I imagine is to package XSL-T output in a signed package of resources; however, I see this as a band-aid, because the presentation still is divorced from, if it is correlated at all with, the underlying semantic markup that LegalXML groups are defining. Hence, while the presentation artifact becomes the official record, to which one can hyperlink, we lose all the benefit of searching for specific semantic content within an official record. I urge us, instead, towards a simple solution: annotate the official record with markup names, and be done with it.

. . ..
