Subject: RE: [legalxml-courtfiling] smart docs (was (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word)
Thanks, Jim, for your comments. Yes, currently almost everything that is filed in a court case could be said to be a “form” in some respects. Just the basic caption structure is a kind of form:

IN THE SUPERIOR COURT OF THE

John Smith                    Case # 12345
v.                            Motion to Dismiss and
Joe Doaks                     Imposition of Sanctions

The above has these data elements, at least:

--Plaintiff name
--Defendant name
--Case title/caption (John Smith V. Joe Doaks)
--Court
--County
--Case Number
--Document Title/Description

Somewhere there will be a signature block with the name, Bar number, and perhaps other data about the attorney…ever more routine data elements that attorneys enter now – that is, not additional work shifted from the Clerk to them! Other data elements can be picked up without attorney effort or clerk re-entry: Date/Time stamps, Document Type Code (if any), Case Type (usually indicated in case number), Letterhead Information/Contact Information that comes with the firm template.

Most
courts have strict format requirements relating to how to set up pleadings and other filings. These address more than just margins and structure of captions. In our state, for example, a Judgment must have a “Judgment Summary” on the first page. This was done to 1) summarize the judgment in one place, and 2) give the Clerk one place to look for data elements needed for the state judgment tracking program (rather than require the data-entry clerk to read the whole document and try to understand and INTERPRET it (a no-no!) in order to figure out what data to enter). The attorney gets a positive benefit from “obeying” this rule – the information is in one place and much confusion is eliminated for all concerned.

Our
state has two kinds of pattern forms: 1) mandatory and 2) optional/for your convenience. Both of those are amenable to hidden tagging: put the data element into the box indicated for it, and the word processing or other application on which the form is based will apply the appropriate tags. These data element entry points could occur anywhere in the document – designers of the “form” (really, a semi-form) would have the task of working out what is best for all concerned. In between there can be pages and pages of free-form text and discussion and argumentation and citation, etc. While such material is also full of “data elements,” none would be tagged because the elements really needed by the court for automation and data entry/tracking purposes would have been called out for special attention.

Another
level of this business is that XML-tagged legal documents give the law firm the opportunity to create its own XML schemas and dictionaries, for they could use XML to track dates, amounts, client files, authorship, typists, and whatever other data elements the law firm routinely uses! No one, except perhaps here, in our envisioning the potential released by E-Filing systems, has clearly seen that the attorney/firm is better off when needed data is called out for tagging – without interfering with the attorney’s ability to write as she will in order to persuade the jury or court. The law firm’s own tags (uniquely theirs) would be meaningless to the court and the clerk’s systems would ignore them.

The idea of whipping out one’s word processor and just free-form writing the persuasive argument to win the case is an interesting one – but that would be no different at all from pulling one’s typewriter out! What will be filed but a printout from the word-processor or typewriter? So long as normal format is observed, people know what to expect. However, there would be no possibility of automated processing of the document as it flows into the DMS and as it is referenced in the course of docketing and handling of the case. (Some have dreamed of super-smart OCR/ICR software that could review such printouts of originally-digital documents reduced to a printout – picture of a page displayed on a piece of paper – and, in a kind of reverse parsing process, convert text back to digital and then recognize data elements needed for docketing and “automatically” pull them out and enter them in target systems. It’s a nice dream, but any OCR/ICR system is imperfect enough that a person needs to double-check how the software interpreted the pixels it analyzed from the paper.)

I have
not advocated for adding to the number of data elements that need to be tagged in order for everyone involved to do their respective jobs. Some have imagined that such additions are a natural consequence of using tagging, seeing clerks as data-hungry tag-makers, and then have opposed data tagging because of that imagined consequence. Attorneys are used to following forms and formats when preparing materials for the case file. This technology ought to make it easier on them. (Indeed, within a given case, couldn’t the tagged key data elements become part of the templates made available for use by authors of briefs and pleadings in the case?)

I have
been imagining here a scenario for “how things could work,” so I offer this as one of the business cases that the Documents subcommittee would do well to analyze.

Sincerely,

Roger

Roger Winters
Program and Project Manager
and
Continuing Legal Education (CLE) Coordinator
Department of Judicial Administration
516
V: (206) 296-7838
F: (206) 296-0906
roger.winters@metrokc.gov

From: Jim Harris [mailto:jharris@ncsc.dni.us]

Roger:

You have a unique ability
to break this issue down and relate it to the real-world problems we are trying to address. Your explanation highlights the primary objectives of the standards we are asking the justice and legal communities to adopt.

I think the key part of this (which started this whole “smart documents” dialog) is the challenge associated with creating forms in a way that is as unobtrusive and intuitive as possible. I interpreted (perhaps incorrectly) from one of John’s comments that attorneys would like to be able to draft documents as they do now without any concern to form or tagging of data. This suggests that electronic systems must be able to parse and interpret the content in order to identify relevant metadata that we all agree is essential to realize the benefits of electronic filing. That level of intelligence is just not feasible (at least with today’s technology) without the data being tagged in some way. Hence the need for some type of “form” so that data can be tagged as (or after) it’s collected.

Aren’t forms routinely a part of filing, even in the paper world? Why should it be any different in the electronic world? Ultimately, the systems and vendors who support attorneys and law firms need to put in place mechanisms to support the standards in a way that leverages use of their systems to feed the electronic filing process. That will most certainly involve some sort of form or process to identify and tag information. Further, electronic court policy should be part of that process so that those systems know what data must be tagged for that particular type of filing/document in the court to
which they are filing.

Jim

-----Original Message-----

Dear John:

Thank you for this reply and, yes, it is very helpful. I understand the importance within a court environment of having common references in documents so court proceedings can avoid delays and confusion while everyone compares different printouts of the same information. I agree with you that PDF seems to have conquered that problem better than others (so far) and it is certainly clear that PDF has credibility with many of our stakeholders.

In my presentations about the idea of "smart documents" and "automating docketing," I have shared the idea that electronic documents must be understood as existing in several "layers" (for lack of a better term):

The "Top" one is the "human-readable," and this is where it is important that the page breaks, etc., should all be the same. Any electronic document in a court file, to be useful, must offer this human readable surface that looks like "words on paper." It is the
resemblance to paper that is to be preserved, not necessarily all of the concepts and practices we have built up over years of working with hard copy. The consumers of documents at this level would not necessarily even know that any XML markup has been done.

The "Next" level is the "XML markup" or "software-readable" level. This could also be human-readable (although not formatted for conventional reading) because the data and information in the document remain an exact match for the "human readable" version (above). The XML tags (labels) used must be the same if there is to be standardization, interoperability, and the potential to leverage XML powers for other uses besides docketing filings for court cases. I see the work that has produced NIEM and GJXDM and such as directed to this "layer." Here, a "data element tag" must fit a standard (or be a standard extension). Often, the local term used may vary from the term used in the XML tag: this is fine so long as the "thing described" is the same in each place.

The ideal regarding XML markup would be for the typical author of a document to be filed in court not to be aware of XML markup. Here is where I would think of any filed document having to be, necessarily, set up as a form in some, but not all, ways. Where a court needs a data element to be routinely marked up, it would include a spot in the "pattern form" where that data element needs to be placed (example: the case number always appears at a certain place in the first page caption). Unseen to the user would be the process of associating the appropriate XML data element tag with the location in the form where the respective data elements get their markup. The "software-readable" layer of the document interacts with applications searching for specified data element tags so the tagged information can be re-used in targets where it is needed (such as a CMS, DMS, calendar, etc.).

Other "layers" of the electronic document can be identified and discussed, as well: there's a machine-readable layer, I presume, and even the "envelope" might be seen as a "layer" where the metadata and other information necessary for the Electronic Court Filing action can be located and activated.

This "layering" seemed to me to solve a number of problems. A document can be seen as both something for human beings to read, from which they would be persuaded through argumentation, legal citation, etc., of what is fact, what is law, etc., via the prose created by the (litigant/attorney) author. At the same time, the document could be parsed by a software application that searches out the appropriate, needed data element tags that point to data that will be used in automated docketing, data re-use, etc. Just as one might check out a court file and read it years later, software might still be able to locate needed data elements for years to come by applying the standard in effect at the time of filing.

If this line of reasoning is off base or, worse, grounded in impossibilities, it is important to learn that now. Otherwise, I would like to see more people thinking about what it would take to support automated docketing, ultimately yielding substantial savings (reduced human data entry, reduced errors, etc.) and potential for improved performance (e.g., quicker access to needed information for the court and others).

I am finding this dialogue to be helpful and I hope my comments are contributing to a constructive resolution which should show us, I believe, that we do not have to force a choice among different approaches. I'll look forward to your and others' reactions to this essay o' mine.
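
The "software-readable" layer described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not anything specified by ECF, NIEM, or GJXDM: the two namespace URIs, the element names (CaseNumber, PlaintiffName, ClientFile, and so on), and the helper extract_court_elements are all hypothetical. It shows an application collecting only the tags in a court's namespace while leaving the free-form prose and a firm's private tags alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one for the court's required data elements,
# one for a law firm's private tags. Neither comes from NIEM, GJXDM, or ECF.
COURT_NS = "urn:example:court-filing"
FIRM_NS = "urn:example:law-firm"

FILING = f"""
<filing xmlns:c="{COURT_NS}" xmlns:f="{FIRM_NS}">
  <caption>
    <c:Court>Superior Court</c:Court>
    <c:PlaintiffName>John Smith</c:PlaintiffName>
    <c:DefendantName>Joe Doaks</c:DefendantName>
    <c:CaseNumber>12345</c:CaseNumber>
    <c:DocumentTitle>Motion to Dismiss and Imposition of Sanctions</c:DocumentTitle>
  </caption>
  <body>
    <p>Free-form argument goes here, untagged, exactly as the author wrote it.</p>
    <p>Drafted under <f:ClientFile>2007-0042</f:ClientFile> for internal tracking.</p>
  </body>
</filing>
"""


def extract_court_elements(xml_text: str) -> dict:
    """Return {local tag name: text} for every element in the court namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + COURT_NS + "}"
    found = {}
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}LocalName".
        if element.tag.startswith(prefix):
            found[element.tag[len(prefix):]] = (element.text or "").strip()
    return found


if __name__ == "__main__":
    for name, value in extract_court_elements(FILING).items():
        print(f"{name}: {value}")
```

Filtering by namespace rather than by position is one way to let the firm's own tags ride along in the same document without the clerk's systems ever having to know about them.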

Regards,

Roger

Roger Winters
Program and Project Manager
Department of Judicial Administration
516
V: (206) 296-7838
F: (206) 296-0906
roger.winters@metrokc.gov

-----Original Message-----
From: John Messing [mailto:jmessing@law-on-line.com]
Sent: Saturday, January 13, 2007 6:33 AM
To: Winters, Roger
Cc: legalxml-courtfiling@lists.oasis-open.org; O'Brien,Robert; Hickman,Brian
Subject: RE: [legalxml-courtfiling] smart docs (was (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word)

Roger:

I think many of us are groping our way through the issues, with different levels of technical understanding and familiarity with jargon. Frustration with my own lack of understanding is quite familiar to me, I can assure you.

My take on where we are is this:

ECF has developed tools to send documents between places in order to file them. These documents have been PDF documents for the most part that are "dumb" compared to the XML that carries them. The mortgage industry and the land recorders have been working on the same problem and have sophisticated tools that are different from ECF to accomplish many of the same purposes, also using PDF's and XML documents.

NIEM is a governmental-sponsored effort for the same transporting purpose but at a greater level of sophistication because the tagged terms can be associated with other tagged terms, combining the power of them, using a technology called RDF, and with other powerful features. I believe it is useful to settle on one basic set of building blocks, and perhaps NIEM should be it for transport purposes. ECF seems to take that approach. So does Enotary, at least in principle.

That still leaves the problem you have wisely directed attention to, which is having the documents act in a smart fashion so that the content of the PDF's doesn't have to be extracted and the data entered by hand when they arrive at the destination.

The electronic document vendors have not been idle and have retooled their products to make them "smart"; this is done by making the products out of XML, instead of the proprietary formats previously used and then displaying them using the legacy proprietary programs. This has required turning the programs inside out, in a manner of speaking. One consequence is that they can natively be incorporated by XML schemas, or can communicate with the schemas, to accomplish goals without having the awkwardness ECF has become accustomed to, for example, of embedding the PDF's in XML.

Brian just reported on how Microsoft has retooled office products, including MS-Word to accomplish this purpose. Adobe has been doing the same thing. Adobe has two advantages over MS as I see it, apart from document security features, which are not germane to this discussion.

1. The current XML offerings can be displayed by the Adobe Reader and Acrobat programs in both the old and new formats, making the Reader and Acrobat programs backwards compatible. The new Office products may not be fully backwards compatible according to what I have read.

2. Adobe PDF documents preserve pagination regardless of screen resolution. MS-Word has not up until now. That means multiple copies of an MS document may show different pagination on different computers even though the same file is displayed on all of them. This shortcoming makes discussion of a document's content, as in a court hearing with multiple attorneys, a judge and a clerk, very difficult, because passages will not necessarily appear on the same page as displayed on the computer even though the identical file is used. PDF does not suffer from this limitation.

With regard to your question about eNotary, I think the short answer is "none of the above." The membership of enotary is largely from the mortgage and land recording groups who are trying to solve a bottleneck in their workflow represented by enotarization. They are confronted by many of the same issues but until now have been solving them in different ways from ECF. They seem to have much more knowledge about the PDF document aspects, in large part through John Jones, who is a representative from the land recorder industry to enotary. He works closely with top Adobe engineers. I think each group, ecf and enotary, can learn much from each other about the practical solutions to common problems that cut across legal domains, provided there is a willingness to do so, which may require letting go of legacy notions of turf.

I hope this is helpful.

> -------- Original Message --------
> Subject: RE: [legalxml-courtfiling] FW: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> From: "Winters, Roger" <Roger.Winters@METROKC.GOV>
> Date: Fri, January 12, 2007 5:40 pm
> To: "John Messing" <jmessing@law-on-line.com>, "Hickman,Brian" <Brian.Hickman@wolterskluwer.com>
> Cc: <legalxml-courtfiling@lists.oasis-open.org>, "O'Brien,Robert" <Robert.OBrien@cas-satj.gc.ca>
>
> TO: All
>
> It will be interesting to explore these possibilities. I have
> trouble deciphering acronyms I've never heard of and I keep hoping that sometime they will come to light and the basic ideas will be expressed so I can understand them. What are the pros and cons of one approach vs. another? What are the business consequences or assumptions behind using one or another? For example, is the idea about using this in E-Notarization based on assumptions about notaries, attorneys, people in general, or is it coming entirely from scientific technical reasons that can't be controversial? Are there religious issues behind the approaches? (e.g., in Notarization is there a religious conservatism that calls for things to resemble "traditional" approaches, or is that not an issue?) Will this flavor of PDF work until Adobe clamps down in some future time and takes free PDF reading away? A zillion questions come to mind for those of us who are not adept at acronym-eze or tech-speak.
>
> I don't ask these questions seeking specific answers or defenses of positions - I am too ignorant of all of this to have a position yet. I ask them only to illustrate that we continue to have among us all a language barrier. It does not seem to be a problem for those who are so technically advanced that acronyms spill from their tongues as they enthrall other technically advanced folks with brilliant new possibilities. It is a problem when those of us who would love to understand the possibilities find ourselves hopelessly lost because they seem only to be speaking in tongues in which we have no experience. It is not at all insulting to "dumb it down" for others.
>
> That raises another consideration - if ECFTC does not make a certain attainment of technical expertise a requirement for participation, is it therefore a requirement that the rest engage in "dumbing it down" for the rest? Who must take pains to bridge the communication gap? And it is a gap and there is pain involved in trying to guess, beg for answers, preach against acronyms, etc. How can we work together if we do not find the bridges and translations needed to understand one another? Could XML come to our rescue by some "schema" (still not sure I can explain that idea to others) that processes the terms somehow so that a "dummies" version is generated as well?
>
> Did I miss the prerequisite classes that everyone else took and aced?
>
> Happy MLK Weekend!
>
> Roger
>
> Roger Winters
> Program and Project Manager
> and
> Continuing Legal Education (CLE) Coordinator
>
> Department of Judicial Administration
>
> V: (206) 296-7838
> F: (206) 296-0906
> roger.winters@metrokc.gov
>
> -----Original Message-----
> From: John Messing [mailto:jmessing@law-on-line.com]
> Sent: Friday, January 12, 2007 4:24 PM
> To: Hickman,Brian
> Cc: legalxml-courtfiling@lists.oasis-open.org; O'Brien,Robert
> Subject: RE: [legalxml-courtfiling] FW: (Microsoft XML Team's
> WebLog) : Mixing structured and unstructured content in MS Word
>
> An alternative is Adobe's XFA format which enables XML schemas to generate PDF layout documents. It is ideal for form-based documents. This likely will be the document format structure that eNotary will use for the layout of its form-based notary certificates, jurats and acknowledgements.
>
> > -------- Original Message --------
> > Subject: [legalxml-courtfiling] FW: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> > From: "Hickman, Brian" <Brian.Hickman@wolterskluwer.com>
> > Date: Fri, January 12, 2007 4:53 pm
> > To: <legalxml-courtfiling@lists.oasis-open.org>, "O'Brien,Robert" <Robert.OBrien@cas-satj.gc.ca>
> >
> > After reading Roger Winters and John Messing's posts on embedding structured and unstructured content in a pleading I thought I would ask Microsoft's XML team to recommend a method to add structured / machine readable content to an MS Word document that also contains unstructured / narrative content.
> >
> > I am forwarding Microsoft's response for your review.
> >
> > Brian Hickman
> > Attorney
> > Government Relations
> > CT
> >
> > 206 622 4511 (tel)
> > 206 437 1766 (mobile)
> > brian.hickman@wolterskluwer.com
> >
> > -----Original Message-----
> > From: Brian Jones (OFFICE)
> > [mailto:brijones@exchange.microsoft.com]
> > Sent: Friday, January 12, 2007 1:30 PM
> > To: Adam Wiener; Michael Champion; Hickman, Brian; Steven Goulet; Doug Mahugh; Gray Knowlton
> > Subject: RE: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> >
> > Hi Brian,
> > The model in both Word 2003 and 2007 is to allow you to add your custom XML markup to a Word document so that it lives alongside the formatting and layout information.
> > The validation occurs on your schema on its own, even though there is also WordprocessingML whenever you save the file.
> >
> > It's recommended that you leverage the Word structures as much as possible, and only add your own XML markup for persisting semantics that can't be captured with the Word model.
> > I would also suggest learning more about the new content controls feature in Word 2007. This allows you to add more structure on top of your Word documents. There is a series of blog posts on the Word blog that cover this, and I just recently blogged about the post that covers mapping custom XML to content controls:
> > http://blogs.msdn.com/brian_jones/archive/2007/01/10/the-power-of-data-view-separation-in-your-documents.aspx
> >
> > -Brian
> >
> > -----Original Message-----
> > From: Adam Wiener
> > Sent: Friday, January 12, 2007 12:13 PM
> > To: Adam Wiener; Michael Champion;
> > brian.hickman@wolterskluwer.com; Brian Jones (OFFICE); Steven Goulet; Doug Mahugh; Gray Knowlton
> > Subject: RE: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> >
> > Adding Doug and Gray as well... XML Bloggers on bcc...
> >
> > Thanks,
> > Adam
> >
> > -----Original Message-----
> > From: Adam Wiener
> > Sent: Friday, January 12, 2007 10:32 AM
> > To: Michael Champion; brian.hickman@wolterskluwer.com; Xml Team Bloggers; Brian Jones (OFFICE); Steven Goulet
> > Subject: RE: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> >
> > Looping in Brian Jones and Steven Goulet...
> >
> > Can you please take a look at Mr. Hickman's question below?
> >
> > Thanks,
> > Adam
> >
> > -----Original Message-----
> > From: Michael Champion
> > Sent: Thursday, January 11, 2007 8:29 PM
> > To: brian.hickman@wolterskluwer.com; Xml Team Bloggers
> > Subject: RE: (Microsoft XML Team's WebLog) : Mixing structured and unstructured content in MS Word
> >
> > Thanks for your inquiry. The people on this list are not Word experts, so I'll try to find someone in the Office team who can answer. (Or, if one of you on the XML team does know the answer, feel free to chime in!)
> >
> > I know that you can edit documents that conform to a custom schema in Word 2003 and 2007.
> > http://blogs.msdn.com/brian_jones/archive/2006/01/25/517739.aspx
> > http://msdn.microsoft.com/msdnmag/issues/03/11/XMLFiles/
> >
> > I don't know about mixing structured (custom schema) and unstructured (default Word schema) in one doc, however, if that is what you are asking. Please let me know if you don't hear back from someone in Office in a timely manner and I'll try to follow up.
> >
> > Mike Champion
> >
> > > -----Original Message-----
> > > From: brian.hickman@wolterskluwer.com [mailto:brian.hickman@wolterskluwer.com]
> > > Sent: Thursday, January 11, 2007 5:42 PM
> > > To: Xml Team Bloggers
> > > Subject: (Microsoft XML Team's WebLog) : Mixing
> > > structured and unstructured content in MS Word
> > > Importance: High
> > >
> > > I am a member of OASIS LegalXML's Electronic Court Filing Technical Committee and an attorney with CT Corporation. The goal of the technical committee is to develop standards to file documents electronically with courts. Today, most documents produced by the legal industry are produced in MS Word. Unfortunately, today, a human must read the document at the courthouse to extract data from the document to populate the court's case management system.
> > >
> > > My question is: Can we integrate content that conforms to a custom data model into MS Word such that structured content and unstructured content can reside in the same document? If the case management system could extract content from an MS Word file that conformed to a customized data model (I'm thinking along the lines of adding an MS Scheme that matched the court's requirements) then an automated process could extract data directly from the MS Word file.
> > >
> > > If you look at a legal pleading you will see that some sections of the document are structured and conform to a data model that conforms to a set of rules expressed by the court in narrative format and some parts of the document are almost unstructured, such as a paragraph of narrative.
> > >
> > > What approach would you recommend to allow attorneys to use the tool they are familiar with, MS Word, and still embed some machine readable content within the MS Word document?
> > >
> > > Thank you
> > >
> > > Brian Hickman
> > > ----------------------------------
> > > This message was generated from a contact form at:
> > > http://blogs.msdn.com/xmlteam/default.aspx
> > > It was submitted by Brian Hickman (brian.hickman@wolterskluwer.com)
> > >
> > > Your contact information was not shared with the user.
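
The extraction step asked about in the original question can be sketched in Python, assuming the court's data travels as a custom XML part inside a Word 2007 (.docx) package, which is a zip archive. Several things here are assumptions made for illustration: the namespace and element names are hypothetical, the helpers build_stand_in_package and read_court_data are invented, and the package built below is a stand-in rather than a valid Word document. Word typically writes custom XML parts under customXml/ (for example customXml/item1.xml), but the authoritative location is given by the package relationships; scanning by part name is a shortcut.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical court data model; the namespace and element names are
# invented for illustration and are not part of any published standard.
COURT_NS = "urn:example:court-filing"

CUSTOM_XML = (
    f'<filing xmlns="{COURT_NS}">'
    "<CaseNumber>12345</CaseNumber>"
    "<DocumentTitle>Motion to Dismiss</DocumentTitle>"
    "</filing>"
)


def build_stand_in_package() -> bytes:
    """Build a zip holding only the parts this sketch needs.

    This is NOT a complete, valid .docx; it only mimics the package layout
    in which Word stores custom XML data (customXml/item1.xml).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<placeholder/>")
        package.writestr("customXml/item1.xml", CUSTOM_XML)
    return buffer.getvalue()


def read_court_data(package_bytes: bytes) -> dict:
    """Collect court-namespace elements from every customXml/item*.xml part."""
    data = {}
    prefix = "{" + COURT_NS + "}"
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if not (name.startswith("customXml/item") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            if not root.tag.startswith(prefix):
                continue  # some other custom part; not the court's data
            for child in root:
                if child.tag.startswith(prefix):
                    data[child.tag[len(prefix):]] = (child.text or "").strip()
    return data


if __name__ == "__main__":
    print(read_court_data(build_stand_in_package()))
```

The narrative text in word/document.xml is never touched, so the attorney's prose and the court's data elements stay in one file while being read by different consumers.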