legalcitem-courts message

Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
From: Frank Bennett <biercenator@gmail.com>
To: "Hirsh, Kenneth (hirshkh)" <hirshkh@ucmail.uc.edu>
Date: Tue, 16 Dec 2014 07:45:22 +0900
On Mon, Dec 15, 2014 at 11:29 PM, Hirsh, Kenneth (hirshkh)
<hirshkh@ucmail.uc.edu> wrote:
> I'd like to caution against going "too far into the weeds" in this effort to identify nearly all conceivable sources for court opinions. While I greatly appreciate Frank's excellent work on this so far and I think that his work can stand alone as an inventory of U.S. Court opinion sources, I question whether this detail is needed to construct a model citation format.

To be clear, the data behind the U.S. portion of
http://fbennett.github.io/legal-resource-registry/ was not compiled by
me; it is derived from existing work by Mike Lissner, done for the
purpose of parsing citations out of legacy documents. The Web view
just gives a concrete picture of the relations currently expressed in
printed citations. Treat it as a reference that illustrates some of
the issues that a specification for machine-readable citations would
need to address.

> I also want to raise the fact that the technical subcommittee's statement, including its declaration that we adhere to FRBR, has not been adopted by the full TC. There are reasons that a bibliographic record could contain far more information than what is needed to find a court opinion, or for that matter, any other document within the scope of the TC.

The FRBR terminology is still useful for thinking about what a
machine-readable citation expresses, and we need to know that in order
to construct a specification for it.

> We are not building libraries, but rather are trying to provide a relatively simple and consistent way for citations to be expressed in both a machine-readable and human-readable format.

Absolutely. One point to settle is whether the
citation-to-be-specified identifies the text of a judgment as issued
by a court, or the text available in a specific reporting service. If
we can nail that down, other things will follow from the choice.

Frank

> Ken
>
> Kenneth J. Hirsh
> Director of the Law Library and I.T.
> Professor of Practice
> University of Cincinnati College of Law
> ken.hirsh@uc.edu
> (513) 556-0159
>
> -----Original Message-----
> From: legalcitem-courts@lists.oasis-open.org [mailto:legalcitem-courts@lists.oasis-open.org] On Behalf Of Frank Bennett
> Sent: Friday, December 12, 2014 7:07 PM
> To: John Heywood
> Cc: legalcitem-courts@lists.oasis-open.org
> Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
>
> On Sat, Dec 13, 2014 at 6:12 AM, John Quentin Heywood <heywood@wcl.american.edu> wrote:
>> Frank responded to my Wednesday email with some really useful
>> thoughts. I'm interested in hearing what others on the SC think about
>> it all. For convenience, I'm replying to both emails from Frank in one
>> response, so I moved the content around a bit.
>>
>> On 12/11/2014 04:27 PM, Frank Bennett wrote:
>>
>> On Thu, Dec 11, 2014 at 7:31 AM, Frank Bennett <biercenator@gmail.com>
>> wrote:
>>
>> John,
>>
>> Looks like a good start!
>>
>> This raises some questions about scope, I think. (The questions
>> themselves are at the end.)
>>
>> Printed citation forms identify the resource, but involve a step of
>> interpretation for several of the elements. We would need to know
>> how those should be cast in the electronic representation of the
>> reference. Taking the first example ...
>>
>> (1) case name string
>>     Is there a formula or a set of constraints for deriving this from
>> the header information in a judgment? If it must be uniform across all
>> citations to the case, it should either be possible to derive it
>> programmatically, or there should be a canonical version of the case
>> name somewhere that can be acquired via a resolver, using other
>> elements that uniquely identify the case (I guess that's the middle
>> layer in FRBR).
>>
>> In US practice, Bluebook rule 10.2 is what usually governs. The AALL
>> Universal Citation Guide, 3d, in its rule 101, says case names should
>> conform to it, or to ALWD Manual rule 12.2. The Univ. of Chicago's
>> Maroon Book's rule 4.2 is similar, only MUCH simpler. In short, there is a formula.
>> I like the idea of a resolver. Sort of like an authority record in
>> cataloging.
>>
>> (2) volume number
>>     This would be an integer for this category of citation. Is it safe
>> to specify it as an integer, or are there exceptions that would
>> require more flexibility?
>>
>> Not always an integer. Sometimes reporter volumes are issued in parts,
>> e.g., 245A, 245B, 245C, etc. This usually happens in the NRS when
>> volumes are still in prep, and temporary paperback volumes are released piecemeal.
>>
>> (3) reporter abbreviation
>>     As the LRR shows, there is a lot of variation in reporter
>> abbreviations used in the wild (spacing, punctuation, abbreviations).
>> If it is used as an element in an electronic representation of the
>> reference, the abbreviation will need to be consistent across all
>> references. How is it to be derived? The choice would seem to be
>> between a canonical list of reporters and corresponding abbreviations,
>> or the full name of the reporter. A secondary consideration would be
>> whether the elements embedded in a reporter abbreviation (journal +
>> series) should be broken out and represented separately.
>>
>> I think a canonical list of full reporter names and abbreviations
>> would be the way to go. I don't think it necessary to break out the
>> series....treat them as separate entries.
>>
>> (4) first page number
>>     This seems an integer. Same question about constraints as for
>> volume number.
>>
>> In US practice, I have never seen a non-integer page number for a case
>> (roman numerals for intro parts of reporter, but not for cases). I
>> think we could type it as an integer.
>>
>> (5) pinpoint page numbers
>>     Pinpoints can include references to page numbers, note numbers,
>> and possibly other document elements. Should these elements be
>> specified, or is a dumb string sufficient?
>>
>> In neutral citations, the pinpoint is usually to a paragraph number,
>> not a page.
>>
>> (6) circuit justice if applicable
>>     This raises a question of whether the spec is aimed at full
>> description of the resource, or at pinning down the essential
>> information needed to unambiguously identify the resource. If the
>> latter, this would not be needed.
>>
>> That is a really good question. I had been assuming full description,
>> but your comment is making me rethink that.
>>
>> (7) year of decision
>>     In this citation form, are the year of decision and the year of
>> publication always aligned?
>>
>> The year of decision is the year of publication by the court by definition.
>> The "publication" date is not the date the reporter was published. An
>> interesting question is how to deal with changes made by the court
>> after the decision is published, but before the print official
>> reporter hits the streets. (SCOTUS is infamous for this:
>> http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html
>> ) I suppose that type of info can go in the parenthetical string at
>> the end of the citation.
>>
>> (8) parenthetical information such as judge, type of document, weight
>> of authority
>>     This raises the same question as (6).
>>
>> See my comments under 6 & 7 above.
>>
>> (9) the court
>>     SCOTUS citations are to a dedicated reporter, so the court is
>> implicit. Should this be made explicit in an electronic citation?
>> Alternatively, should reporters be made a separate domain in the
>> specification, so that such information can be attached to each?
>>
>> Again with the great question....I think it should be explicit. It
>> seems to me that all court citations should be similar, to make
>> parsing easier among other reasons.
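
A minimal sketch of how the element types discussed under (2) and (3) might be handled, assuming Python. The names `REPORTERS`, `canonical_reporter`, and `parse_volume` are hypothetical, and the two-entry table only stands in for a canonical reporter list; it is not data taken from the LRR. The volume is kept as a number plus an optional part letter rather than a bare integer, to allow for volumes such as 245A.

```python
import re

# Hypothetical stand-in for a canonical reporter list:
# normalized key -> (canonical abbreviation, full reporter name).
# Each series is a separate entry, as suggested in the reply to (3).
REPORTERS = {
    "us": ("U.S.", "United States Reports"),
    "f3d": ("F.3d", "Federal Reporter, Third Series"),
}


def canonical_reporter(raw):
    """Look up a reporter abbreviation as written in the wild.

    Spacing and periods are ignored, so "F. 3d" and "F.3d" map to the
    same entry. Returns None when the abbreviation is not in the list.
    """
    key = re.sub(r"[\s.]", "", raw).lower()
    return REPORTERS.get(key)


def parse_volume(raw):
    """Split a volume designator into (number, part letter).

    "245A" gives (245, "A"); a plain "558" gives (558, "").
    Anything else is rejected rather than guessed at.
    """
    match = re.fullmatch(r"(\d+)([A-Z]?)", raw.strip())
    if match is None:
        raise ValueError(f"unrecognized volume designator: {raw!r}")
    return int(match.group(1)), match.group(2)
```

Under these assumptions, `canonical_reporter("F. 3d")` and `canonical_reporter("F.3d")` return the same entry, and `parse_volume("245A")` keeps the part letter that a plain integer type would lose.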
>>
>> ***
>>
>> I guess a threshold question is the scope of the spec:
>>
>>   (a) Does it aim to express the elements of all existing printed
>> citations (this is also Brian's question, I think); or
>>   (b) Does it aim to specify only the elements of all printed
>> citations needed to uniquely identify the resource; or
>>   (c) Does it aim to specify only the minimum elements (or
>> combination of elements) needed to uniquely identify the resource?
>>
>> If the aim is the enrichment of document content with RDF-style links
>> to meaningful text elements, that suggests (a).
>>
>> If the aim is to support parsers capable of linking specifically to
>> cases, that suggests (b) -- this is the aim of the CourtListener
>> database from which the LRR is derived.
>>
>> If the aim is to provide guidance for the construction of resolvers
>> and data to feed to them, that suggests (c). My understanding is that
>> this is what we're aiming for, but I could be wrong.
>>
>> Frank
>>
>> This is the crux. I had been assuming (a) or maybe (b). So I went to
>> the TechSC's latest draft and reread it:
>>
>> " It is NOT the purpose of this TC to establish a proposed syntax for
>> citations."
>>
>> " The relevant task of every subcommittee is therefore to identify
>> types and roles of FRBR entities in their document classes, and
>> classify features according to different levels of a layered model of the document."
>>
>> "Also, subcommittees should also identify how the references to
>> documents of their classes are impacted by the layered view of
>> documents.... It will be a rare case indeed the citation (and
>> therefore the need for a reference) pointing to an FRBR Item (i.e., to
>> a specific file on a specific computer at a specific IP address) or to
>> an FRBR Manifestation (i.e., to a specific characterization in a
>> specific file format of a document). Most frequently a citation points
>> to a legal document existing on a different conceptual layer and in a
>> different level of reality than the physical copies it is embodied by,
>> or by the data formats in which each copy is expressed. More
>> frequently, therefore the citation will identify a document at a more
>> abstract level, e.g., an FRBR Expression when the citation is to a
>> specific version or variant of the document, or an FRBR Work when the
>> citation is to all these versions or variants, or to the one that is
>> identified through a possibly complex contextualization process. In
>> these cases, therefore, the citation MUST be converted to a reference
>> to a Work or an Expression, which is resolved into the physical
>> Locator of the Item only when needed, therefore separating the legal
>> aspects of the identification of the correct version and variant of a document from the technical aspects of the dereferencing of a resource on the World Wide Web."
>>
>> So what info does our part of the spec need to convey?  Our citations
>> will be identifying cases/other court docs at the FRBR Work level and
>> at the FRBR Expression level, methinks. A "print" citation would be on
>> the Expression level, no?
>>
>> To follow up on the scope issue, if the aim (or one aim) is to specify
>> minimal data that can be derived from the text, for the purpose of
>> generating a key for submission to a resolver, would this work for
>> cases:
>>
>>     type (decision)
>>     court (id)
>>     docket number (string)
>>     decision date (date)
>>
>> For the "court" element, an ID would be preferable to the court name,
>> since the latter can change without any change to the institution
>> proper.
>>
>> Resolution would return further details (cites for each reporting
>> service carrying the case, with case name, etc.); the suggestion above
>> is only for the "handle" that uniquely identifies the case.
>>
>> (Whether this makes any sense will depend on the scope of the
>> endeavor, of course.)
>>
>> Where does pinpoint (page or para numbers) fit into this? The citation
>> is to the work as a whole, but also (usually, or mostly) to specific
>> language in that work.
>
> The four features listed above would identify the Work. If all records on the resolver side contain those features, a specific Expression amenable to pinpointing could be obtained by adding a reporter key to the resolver call. The return from the resolver would include other details (volume number, page number or range, etc.), but the core details plus the reporter should be sufficient to identify the pinpoint-able record.
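
As a rough illustration of the Work/Expression split described here, assuming Python: the four features form a Work-level handle, and an optional reporter key narrows it to an Expression. The class name `WorkKey`, the function `resolver_key`, the `/` and `@` separators, and the sample values are all hypothetical; nothing here reflects an agreed syntax.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkKey:
    """The four features proposed for identifying a case at the Work level."""
    doc_type: str        # e.g. "decision"
    court_id: str        # an ID rather than a court name, which can change
    docket_number: str   # kept as a string
    decision_date: date

    def handle(self) -> str:
        # Hypothetical serialization; assumes no field contains "/".
        return "/".join([
            self.doc_type,
            self.court_id,
            self.docket_number,
            self.decision_date.isoformat(),
        ])


def resolver_key(work: WorkKey, reporter: Optional[str] = None) -> str:
    """Build the key to submit to a resolver.

    Without a reporter the key names the Work; with one it names the
    Expression carried by that reporting service, which is the record
    a pinpoint would apply to.
    """
    base = work.handle()
    return base if reporter is None else f"{base}@{reporter}"


# Illustrative values only; "court-x" and "12-345" are made up.
work = WorkKey("decision", "court-x", "12-345", date(2014, 6, 1))
work_level = resolver_key(work)
expression_level = resolver_key(work, "f3d")
```

Volume and page numbers stay out of the key on purpose: in this sketch they are part of what the resolver returns, not part of what is sent.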
>
>>
>> What do you all think?
>>
>> --
>> John Quentin Heywood
>> heywood@american.edu
>