Re: [legalcitem-courts] Usecase--US Federal Courts draft

Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft

From: John Quentin Heywood <heywood@wcl.american.edu>

To: Frank Bennett <biercenator@gmail.com>

Date: Fri, 12 Dec 2014 16:12:30 -0500

Frank responded to my Wednesday email with some really useful thoughts. I'm interested in hearing what others on the SC think about it all. For convenience, I'm replying to both emails from Frank in one response, so I moved the content around a bit.

On 12/11/2014 04:27 PM, Frank Bennett wrote:

On Thu, Dec 11, 2014 at 7:31 AM, Frank Bennett <biercenator@gmail.com> wrote:

John,

Looks like a good start!

This raises some questions about scope, I think. (The questions
themselves are at the end.)

Printed citation forms identify the resource, but involve a step of
interpretation for several of the elements. We would need to know
how those should be cast in the electronic representation of the
reference. Taking the first example ...

(1) case name string
    Is there a formula or a set of constraints for deriving this from
the header information in a judgment? If it must be uniform across all
citations to the case, it should either be possible to derive it
programmatically, or there should be a canonical version of the case
name somewhere that can be acquired via a resolver, using other
elements that uniquely identify the case (I guess that's the middle
layer in FRBR).

In US practice, Bluebook rule 10.2 is what usually governs. The AALL Universal Citation Guide, 3d, in its rule 101, says case names should conform to it, or to ALWD Manual rule 12.2. The Univ. of Chicago's Maroon Book's rule 4.2 is similar, only MUCH simpler. In short, there is a formula. I like the idea of a resolver. Sort of like an authority record in cataloging.

(2) volume number
    This would be an integer for this category of citation. Is it safe
to specify it as an integer, or are there exceptions that would
require more flexibility?

Not always an integer. Sometimes reporter volumes are issued in parts, e.g., 245A, 245B, 245C, etc. This usually happens in the National Reporter System (NRS) when volumes are still in preparation and temporary paperback volumes are released piecemeal.
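
A minimal sketch of what "integer plus optional part letter" could look like as a type, assuming the only exception is a single trailing letter as in 245A. The names `Volume` and `parse_volume` are hypothetical, and other volume designations may exist that this does not cover.

```python
import re
from typing import NamedTuple, Optional


class Volume(NamedTuple):
    """A reporter volume: an integer, plus an optional part letter."""
    number: int
    part: Optional[str]  # "A" in "245A"; None for an ordinary volume


# Assumption: a volume is digits followed by at most one letter.
_VOLUME_RE = re.compile(r"(\d+)([A-Z]?)")


def parse_volume(text: str) -> Volume:
    """Parse a volume designation such as '245' or '245A'."""
    m = _VOLUME_RE.fullmatch(text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized volume designation: {text!r}")
    return Volume(int(m.group(1)), m.group(2) or None)
```

Because the number sorts before the letter, the parsed volumes can be ordered once the missing part is mapped to an empty string, e.g. `sorted(vols, key=lambda v: (v.number, v.part or ""))`.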

(3) reporter abbreviation
    As the LRR shows, there is a lot of variation in reporter
abbreviations used in the wild (spacing, punctuation, abbreviations).
If it is used as an element in an electronic representation of the
reference, the abbreviation will need to be consistent across all
references. How is it to be derived? The choice would seem to be
between a canonical list of reporters and corresponding abbreviations,
or the full name of the reporter. A secondary consideration would be
whether the elements embedded in a reporter abbreviation (journal +
series) should be broken out and represented separately.

I think a canonical list of full reporter names and abbreviations would be the way to go. I don't think it is necessary to break out the series; each series can be treated as a separate entry.
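
A rough sketch of the canonical-list approach, with each series as its own entry. The four-entry table is illustrative only and is not taken from the LRR. The folding rule (ignore case, spacing and punctuation) is an assumption about how wild variants might be matched, and `canonical_reporter` is a hypothetical name.

```python
from typing import Optional

# Illustrative canonical list: abbreviation -> full reporter name.
# Each series is a separate entry rather than journal + series.
CANONICAL_REPORTERS = {
    "U.S.": "United States Reports",
    "S. Ct.": "Supreme Court Reporter",
    "F.2d": "Federal Reporter, Second Series",
    "F.3d": "Federal Reporter, Third Series",
}


def _fold(abbrev: str) -> str:
    """Drop case, spaces and punctuation so variants compare equal."""
    return "".join(ch for ch in abbrev.lower() if ch.isalnum())


# Lookup index from folded form to canonical abbreviation.
_INDEX = {_fold(abbrev): abbrev for abbrev in CANONICAL_REPORTERS}


def canonical_reporter(abbrev: str) -> Optional[str]:
    """Return the canonical abbreviation, or None if it is not listed."""
    return _INDEX.get(_fold(abbrev))
```

With this folding, "F. 3d", "F.3d" and "F3d" all map to the single entry "F.3d". A real list would also need to handle abbreviations that collide once punctuation is removed.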


(4) first page number
    This seems an integer. Same question about constraints as for volume number.

In US practice, I have never seen a non-integer page number for a case (roman numerals are used for the introductory parts of a reporter, but not for cases). I think we could type it as an integer.


(5) pinpoint page numbers
    Pinpoints can include references to page numbers, note numbers,
and possibly other document elements. Should these elements be
specified, or is a dumb string sufficient?

In neutral citations, the pinpoint is usually to a paragraph number, not a page.
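
One way to picture the alternative to a dumb string is a typed pinpoint with a kind and a number. This is only a sketch: the three kinds and the surface forms recognized ("¶ 12", "para. 12", "n.4", a bare number) are assumptions, `Pinpoint` and `parse_pinpoint` are hypothetical names, and compound pinpoints (a page plus a note) are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pinpoint:
    kind: str   # "page", "paragraph", or "note"
    value: int


# Assumed surface forms, tried in order; a bare number is read as a page.
_PATTERNS = [
    ("paragraph", re.compile(r"(?:¶|para\.?)\s*(\d+)")),
    ("note", re.compile(r"n\.\s*(\d+)")),
    ("page", re.compile(r"(\d+)")),
]


def parse_pinpoint(text: str) -> Pinpoint:
    """Classify a single pinpoint string as a page, paragraph, or note."""
    s = text.strip()
    for kind, pattern in _PATTERNS:
        m = pattern.fullmatch(s)
        if m:
            return Pinpoint(kind, int(m.group(1)))
    raise ValueError(f"unrecognized pinpoint: {text!r}")
```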


(6) circuit justice if applicable
    This raises a question of whether the spec is aimed at full
description of the resource, or at pinning down the essential
information needed to unambiguously identify the resource. If the
latter, this would not be needed.

That is a really good question. I had been assuming full description, but your comment is making me rethink that.


(7) year of decision
    In this citation form, are the year of decision and the year of
publication always aligned?

The year of decision is the year of publication by the court by definition. The "publication" date is not the date the reporter was published. An interesting question is how to deal with changes made by the court after the decision is published, but before the print official reporter hits the streets. (SCOTUS is infamous for this: http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html ) I suppose that type of info can go in the parenthetical string at the end of the citation.


(8) parenthetical information such as judge, type of document, weight
of authority
    This raises the same question as (6).

See my comments under 6 & 7 above.

(9) the court
    SCOTUS citations are to a dedicated reporter, so the court is
implicit. Should this be made explicit in an electronic citation?
Alternatively, should reporters be made a separate domain in the
specification, so that such information can be attached to each?

Again, a great question. I think it should be explicit. It seems to me that all court citations should be similar, to make parsing easier among other reasons.


***

I guess a threshold question is the scope of the spec:

  (a) Does it aim to express the elements of all existing printed
citations (this is also Brian's question, I think); or
  (b) Does it aim to specify only the elements of all printed
citations needed to uniquely identify the resource; or
  (c) Does it aim to specify only the minimum elements (or
combination of elements) needed to uniquely identify the resource?

If the aim is the enrichment of document content with RDF-style links
to meaningful text elements, that suggests (a).

If the aim is to support parsers capable of linking specifically to
cases, that suggests (b) -- this is the aim of the CourtListener
database from which the LRR is derived.

If the aim is to provide guidance for the construction of resolvers
and data to feed to them, that suggests (c). My understanding is that
this is what we're aiming for, but I could be wrong.

Frank

This is the crux. I had been assuming (a) or maybe (b). So I went to the TechSC's latest draft and reread it:

"It is NOT the purpose of this TC to establish a proposed syntax for citations."

"The relevant task of every subcommittee is therefore to identify types and roles of FRBR entities in their document classes, and classify features according to different levels of a layered model of the document."

"Also, subcommittees should also identify how the references to documents of their classes are impacted by the layered view of documents.... It will be a rare case indeed the citation (and therefore the need for a reference) pointing to an FRBR Item (i.e., to a specific file on a specific computer at a specific IP address) or to an FRBR Manifestation (i.e., to a specific characterization in a specific file format of a document). Most frequently a citation points to a legal document existing on a different conceptual layer and in a different level of reality than the physical copies it is embodied by, or by the data formats in which each copy is expressed. More frequently, therefore the citation will identify a document at a more abstract level, e.g., an FRBR Expression when the citation is to a specific version or variant of the document, or an FRBR Work when the citation is to all these versions or variants, or to the one that is identified through a possibly complex contextualization process. In these cases, therefore, the citation MUST be converted to a reference to a Work or an Expression, which is resolved into the physical Locator of the Item only when needed, therefore separating the legal aspects of the identification of the correct version and variant of a document from the technical aspects of the dereferencing of a resource on the World Wide Web."

So what info does our part of the spec need to convey? Our citations will be identifying cases/other court docs at the FRBR Work level and at the FRBR Expression level, methinks. A "print" citation would be on the Expression level, no?

To follow up on the scope issue, if the aim (or one aim) is to specify
minimal data that can be derived from the text, for the purpose of
generating a key for submission to a resolver, would this work for
cases:

    type (decision)
    court (id)
    docket number (string)
    decision date (date)

For the "court" element, an ID would be preferable to the court name,
since the latter can change without any change to the institution
proper.

Resolution would return further details (cites for each reporting
service carrying the case, with case name, etc.); the suggestion above
is only for the "handle" that uniquely identifies the case.

(Whether this makes any sense will depend on the scope of the
endeavor, of course.)
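
The four-element handle suggested above could be sketched as a small immutable record. Everything beyond the four field meanings is an assumption made for illustration: the class name `CaseHandle`, the `|` separator, the ISO date format, and the sample court ID and docket number, which are placeholders rather than real identifiers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseHandle:
    """Minimal key for submission to a resolver (hypothetical shape)."""
    doc_type: str        # e.g. "decision"
    court_id: str        # stable ID, not the court's display name
    docket_number: str
    decision_date: date

    def key(self) -> str:
        """Serialize the handle; the separator is an arbitrary choice."""
        return "|".join([
            self.doc_type,
            self.court_id,
            self.docket_number,
            self.decision_date.isoformat(),
        ])


# Placeholder values only; "us.scotus" and "00-000" are made up.
handle = CaseHandle("decision", "us.scotus", "00-000", date(2014, 1, 15))
print(handle.key())
```

Case name, reporter cites and pinpoints are deliberately absent: under this approach they would come back from the resolver rather than form part of the key.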

Where does pinpoint (page or para numbers) fit into this? The citation is to the work as a whole, but also (usually, or mostly) to specific language in that work.

What do you all think?

-- 
John Quentin Heywood
heywood@american.edu