Re: [legalcitem] Prolegomena to the constitution of subcommittees

I think the question of non-techies is important. Perhaps, each aspect of our work could examine each of the following with subject matter experts for each in the subcommittees:

In this way the technologists can stay within the realities of use and legal usefulness.

-----Original Message-----
From: "Melanie Knapp" <moberlin@gmu.edu>
Sent: Wednesday, February 19, 2014 10:23am
To: legalcitem@lists.oasis-open.org
Subject: Re: [legalcitem] Prolegomena to the constitution of subcommittees

Hi All,

I like Fabio's general guide on subcommittees. I envision subcommittees forming because we need to divide the work in order to complete it expeditiously. Our first round of subcommittee work should be DRAFT. Then all subcommittees can compare their work, and the whole Committee can adopt the best ideas from the bunch. After that, subcommittees can reconvene to revise their first-round DRAFTS based on the ideas agreed upon by the Committee.

Moreover, I think each subcommittee should include techies and non-techies (end users). If we want to self-identify in advance of the next meeting for (a) techie/non-techie and (b) preferred subcommittee/document type, I can prepare a list so that we can dole out responsibility appropriately and quickly at the next meeting.

Melanie Knapp
George Mason University Law Library
Arlington, VA

On 2/18/2014 5:57 PM, Chet Ensign wrote:

Hi Fabio,
I really like the way you have thought this through. Here is some initial feedback...

On Thu, Feb 13, 2014 at 4:59 PM, Fabio Vitali <fabio@cs.unibo.it> wrote:

Personal proposal for the definition of a few relevant terms to this TC:
* citation: an explicit, human-readable mention of a legal text as found in another text, providing sufficient detail for an averagely competent person to identify with precision the relevant text.
* reference: a machine-readable representation of a citation, containing at least the same quantity of information (but possibly more) as the plain text citation for the purpose of identifying the relevant text.
* identifier: a string univocally associated to a document that identifies it. Using an identifier in a reference is a simple way to make it work, but it is not the only way: there will be references that do not contain an identifier, and require more work to find the relevant text.

First, I like the idea of developing a glossary of terms from the start. These kinds of definitions will ensure that we are talking about the same things when we get into discussions. I can go with your definitions above. A couple of notes ... "an averagely competent person to identify with precision the relevant text" - I think that is the ideal - but it doesn't always work out that way in practice & in fact is perhaps one of the motivations for this TC.

And as to identifier - is it always the case that the identifier is one-to-one with a document? Or would it be more accurate to say that an identifier is a string uniquely associated with some resource - that could be a document but could also be a component of a document? I just wonder if defining identifier this way narrows its meaning too much.

Next are some standard Web terms that are relevant for this TC, I believe:
* A locator is an identifier of a physical resource (e.g., a file on a hard disk somewhere on the net) that is actionable (that is, it can be immediately used for dereferencing).
* Resolution: the act of determining a usable, active locator of a physical resource given a reference to a document of which said physical resource is a reasonable representation.
* Dereferencing: the act of delivering a copy of a physical resource given its locator.

I'm good with these as well - and these do in fact present an operational description of what we are endeavoring to make possible for legal citation systems.

Accessing a document given a reference, therefore, has two well-distinguished steps: the reference is first resolved, obtaining a locator, and this locator is subsequently dereferenced, obtaining a representation of a physical copy of the document that is actually stored and available somewhere on the web.
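To make the two steps concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names of Reference, the CATALOGUE and STORAGE tables, and the example.org locator are hypothetical stand-ins, not anything agreed by this TC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Machine-readable form of a citation (field names are assumptions)."""
    doc_type: str
    number: int
    year: int
    country: str


# Hypothetical resolver table: reference -> locator of one physical copy.
CATALOGUE = {
    Reference("act", 12, 2013, "hu"): "https://example.org/hu/act/2013/12.html",
}

# Hypothetical stand-in for the network: locator -> stored representation.
STORAGE = {
    "https://example.org/hu/act/2013/12.html": "<html>Act n. 12 of 2013</html>",
}


def resolve(ref: Reference) -> str:
    """Step 1: turn a reference into a usable locator."""
    try:
        return CATALOGUE[ref]
    except KeyError:
        raise LookupError(f"cannot resolve {ref}") from None


def dereference(locator: str) -> str:
    """Step 2: deliver a copy of the physical resource at the locator."""
    return STORAGE[locator]


def access(ref: Reference) -> str:
    """Accessing a document is resolution followed by dereferencing."""
    return dereference(resolve(ref))
```

Keeping resolve() and dereference() separate means the two kinds of failure (no locator found vs. locator found but resource unavailable) can be reported differently.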

Please note that I have abstained from using web-specific acronyms such as HTTP, URI, URL, and URN, because these concepts exist independently from their web implementation, but web standards and best practices are completely consistent with the above terminology.

Next, some basic issues that, from my point of view, set legal citations apart from plain hypertext links on the web. Most of the following reasoning derives from my experience with legislation, and I am curious to see how it fares with respect to court documents.

The most important thing that sets legal documents apart is that a citation is almost never to a physical file stored somewhere on the net. Most often it is to an abstract conceptualization of a document, which can correspond to a number of different physical files.

Do you mean - I think this is what you mean - that legal citations are not entered as links but rather as descriptive bibliographical data that people or machines can process into a locator?

A few examples:

* there could be many different physical copies of the same "document", some authoritative (e.g. from the web site of the office issuing it), some not so much (e.g., a union, a political party, a local administration giving access to their own personal stash of documents even when they are not the official publishers of these documents), some plain, some richer in metadata (e.g., a commented version provided by a private publisher).

Certainly true for court documents in the US - part of the reason that briefs often give parallel citations. The same case will exist in several reporters and so an author cites them all.

* there could be many variants of the same document, differentiated by content (e.g., a full copy vs. an excerpt, perhaps of a very long, multi-topical text, containing only the parts relevant to the activities of an office), by language (all European legislation exists in 27 different languages, so when an Austrian friend sends me a citation to a European act taken from the German version, following it should bring me to the corresponding place in the Italian version), or by temporal validity (a reference to a 1999 act subsequently modified in 2007, 2010 and 2013, if examined in 2014 for a civil suit about events in 2011, will bring me neither to the 1999 version nor to the 2013 version, but to the 2010 version of the act).
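The temporal-validity case can be worked through in a few lines of Python. This is only a sketch under stated assumptions: the example gives years only, so the 1 January entry-into-force dates and the helper name version_in_force are invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Entry-into-force dates of each version of the act, in ascending order.
# Only the years come from the example; day and month are assumptions.
VERSIONS = [date(1999, 1, 1), date(2007, 1, 1), date(2010, 1, 1), date(2013, 1, 1)]


def version_in_force(versions: list[date], on: date) -> date:
    """Return the latest version that had entered into force by `on`."""
    i = bisect_right(versions, on)
    if i == 0:
        raise LookupError("no version was in force on that date")
    return versions[i - 1]


# Events of mid-2011 are governed by the 2010 version, not 1999 or 2013.
print(version_in_force(VERSIONS, date(2011, 6, 15)).year)  # 2010
```

The date that matters is the date of the events at issue, not the date the reference was written (1999 or later) or the date it is followed (2014).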

I think this is less of an issue for court documents in the US. What you'd have, I think, would be the instances of the court opinion in official reporters, the pre-official reporter copies that were published, things like that. But in general, I think if you are citing to a court document, you are citing to a complete document that doesn't morph over time, at least not its content. (Ideally speaking anyway.)

* there could be citations to documents that are not accessible, not available, or not even in existence yet: e.g. a citation of a court document that will be released at a different time from the publication of a sentence, a citation of a document for which I have no security clearance, a citation of a regulation that will be written after the enactment of the legislation it is mentioned in, a citation of an act that is expected to modify the existence, validity or jurisdiction of this one.

Leaving out our new "secret" courts - brrrr - I don't think this is so much the case in US court documents.

These cases imply, in my view, that references will always be to abstract documents, or conceptualizations of documents, rather than physical files somewhere on a hard disk,

That's a pretty fair statement across the board. Even if there is only one physical instance of a document being cited, the citation itself as you say is the human text which is to the abstract document.

and that the most appropriate physical representations of these references will be identified at navigation time by the end user, not at the time of the creation of the reference by the author.

Basically, this means that reference resolution is a NECESSARY STEP, and not an occasional aspect, of navigating legal citations.

I don't know that this necessarily follows. There may be very good reasons why in some applications an implementer might put physical locators into their citation markup. Let's just say I'm a legal editor at BigLegalPub and I am using editorial tools intended to enhance our content for customer products - our special citator for example. The tool developers might very well choose to use hard locators in the content for reasons of efficiency or other proprietary interests. Point is, I think it is too soon to say that is a universal truth.

Please note that I have abstained from using librarian-specific conceptualizations such as FRBR, because these issues exist independently from their proposed solutions, but it is worth noticing that the Akoma Ntoso naming convention, the CEN Metalex standard, the urn:lex protocol, and the ELI proposed standard all rely, explicitly or implicitly, on the concepts of FRBR.

Finally, before getting to the actual list of subcommittees, I would like to propose a first guiding principle for the activities of this working group: NO ARBITRARY STRINGS!

There are two basic approaches to determining identifiers for documents: arbitrary strings or feature lists. The first relies on creating sufficiently long opaque strings and associating them with documents by fiat, so that, say, "ax45wtp987w1" becomes the identifier of "Act n. 12 of 2013 of the Republic of Hungary"; the other is based on seeking a list of relevant characteristics of the documents that, appropriately codified according to a given syntax, are used to identify the document, so that, say, type=act & number=12 & year=2013 & country=hu are the features of "Act n. 12 of 2013 of the Republic of Hungary" that are necessary to identify it.
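A feature-list identifier of this kind could be sketched in Python as follows. The key=value syntax joined by "&" simply mirrors the example above and is an assumption, not a proposed syntax; encode_features and decode_features are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode


def encode_features(features: dict[str, str]) -> str:
    """Serialize features as key=value pairs in sorted key order,
    so the same feature set always yields the same string."""
    return urlencode(sorted(features.items()))


def decode_features(identifier: str) -> dict[str, str]:
    """Recover the features from the identifier, with no lookup table."""
    return dict(parse_qsl(identifier, strict_parsing=True))


act = {"type": "act", "number": "12", "year": "2013", "country": "hu"}
identifier = encode_features(act)
print(identifier)  # country=hu&number=12&type=act&year=2013
```

Unlike "ax45wtp987w1", the string can be read, built, and decoded by anyone who knows the syntax, without access to a central mapping between strings and documents.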

Although arbitrary strings are easier to build tools for, I strongly urge AGAINST using them for this TC: arbitrary strings are fragile (one wrong character and you've misrepresented the reference), rely on a central marshaling station that is the only storage of the mapping between strings and documents, preclude any guesswork about similarly named documents, etc. Arbitrary strings are evil.

Well, you make a good point but again I think it would be premature to make this decision at this stage. I used arbitrary strings in some of our applications way back when - more way back than I care to remember in fact - and they worked out ok.

If, as I hope, we go with feature lists, then an important task of this TC is to determine what are the features of proposed and enacted legislation, of sentences and related court documents, of parliamentary reports and related documents, etc. Features should be divided into
a) identifying vs. accessory (those features that are necessary to identify the document, e.g. the number of an act or the year, vs. those features that frequently accompany the reference but are not strictly necessary, e.g., the month and day of an act, if the number is present and is reset at the beginning of the year)
b) required vs. desired (e.g. if I request act 12/2013 in HTML, I will not accept act 13/2013 in HTML, but I am willing to accept act 12/2013 in PDF).
c) describing the cited document vs. the citation itself (e.g., the number of the act describes the cited document, whereas specifying that a reference is modificatory or is groundwork for the judgment describes the citation itself, not the cited document). Thus motivation, provenance, type, purpose, scope are all features of the citation and not of the cited document.
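The required vs. desired distinction can be illustrated with a small Python sketch. It is only one possible reading: Candidate, its fields, and pick() are hypothetical names, and counting matched desired features is an assumed tie-breaking rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One available physical copy (field names are assumptions)."""
    number: int
    year: int
    fmt: str


def count_matches(c: Candidate, features: dict) -> int:
    """Count how many of the given features the candidate satisfies."""
    return sum(getattr(c, key) == value for key, value in features.items())


def pick(candidates: list[Candidate], required: dict, desired: dict) -> Optional[Candidate]:
    """Reject any copy that misses a required feature; among the rest,
    prefer the copy that satisfies the most desired features."""
    eligible = [c for c in candidates if count_matches(c, required) == len(required)]
    if not eligible:
        return None
    return max(eligible, key=lambda c: count_matches(c, desired))


available = [Candidate(13, 2013, "html"), Candidate(12, 2013, "pdf")]
# Request act 12/2013 in HTML: the number and year are required, the format is desired.
chosen = pick(available, required={"number": 12, "year": 2013}, desired={"fmt": "html"})
print(chosen)  # Candidate(number=12, year=2013, fmt='pdf')
```

Act 13/2013 in HTML is never returned, even though it matches the desired format, because it fails a required feature.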

---

Thus said, this is my proposal of subcommittees for this TC:

a) court documents: the purpose of this SC is to deliver, in a multinational, multi-language and multi-jurisdictional fashion, the features that characterize legal citations to court documents including judgments, briefs from the parties, trial documents, and commentaries. These features should be clearly characterized in terms of identifying vs. accessory, required vs. desired, and describing the cited document vs. describing the citation.

b) legislation: the purpose of this SC is to deliver, in a multinational, multi-language and multi-jurisdictional fashion, the features that characterize citations to proposed and enacted legislation and regulations at all levels, including local regulations and international treaties. These features should be clearly characterized in terms of identifying vs. accessory, required vs. desired, and describing the cited document vs. describing the citation.

c) parliamentary documents: the purpose of this SC is to deliver, in a multinational, multi-language and multi-jurisdictional fashion, the features that characterize citations of parliamentary documents including hansards, orders of the day, reports, etc. These features should be clearly characterized in terms of identifying vs. accessory, required vs. desired, and describing the cited document vs. describing the citation.

d) contracts: the purpose of this SC is to deliver, in a multinational, multi-language and multi-jurisdictional fashion, the features that characterize citations of contracts. These features should be clearly characterized in terms of identifying vs. accessory, required vs. desired, and describing the cited document vs. describing the citation.

e) technical SC: the purpose of this SC is to deliver one or more syntactical approaches to express the features of the above-mentioned citation types, so as to provide an easily implementable navigation system using standard browsing tools, as well as to determine the behavior, response types and error handling of tools connected to the use of legal references, mainly how to characterize successful and unsuccessful resolution and dereferencing of legal references.

This is a good list - I think we need to consider as a group the stages of how we progress and our approach as a first topic of conversation. My only concern about a breakdown like this is that we might well have conflicting terminology or feature descriptions, gray areas and overlaps (which I think we've already seen raised) - in other words, I would suggest we hold off starting sub-committees until we have put some more structure on to how we'll move forward.

Those are my thoughts...

In case you are still awake after all this, let me know your opinions.

Ciao

Fabio Vitali

--

Fabio Vitali
Dept. of Computer Science
Univ. of Bologna ITALY
phone: +39 051 2094872
e-mail: fabio@cs.unibo.it
http://vitali.web.cs.unibo.it/

Tiger got to hunt, bird got to fly,
Man got to sit and wonder "Why, why, why?"
Tiger got to sleep, bird got to land,
Man got to tell himself he understand.
Kurt Vonnegut (1922-2007), "Cat's cradle"

--

/chet
----------------
Chet Ensign
Director of Standards Development and TC Administration
OASIS: Advancing open standards for the information society
http://www.oasis-open.org

Primary: +1 973-996-2298
Mobile: +1 201-341-1393