legalcitem-courts message

Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
From: Frank Bennett <biercenator@gmail.com>
To: brian.carver@documentengineeringservices.com
Date: Tue, 16 Dec 2014 10:58:32 +0900

On Tue, Dec 16, 2014 at 8:59 AM, Brian Carver
<brian.carver@documentengineeringservices.com> wrote:
> I still have the same question I last asked on the list, whether the
> machine-readable markup syntax we will create will enable people to
> indicate exactly what the citation looked like in some text they may be
> trying to faithfully represent or whether it will only provide a way to
> refer to some sort of standardized citation.
>
> I think it would be unfortunate if the syntax we propose were so
> impoverished that it could only do the latter.

It depends on how things are organized (more below).

>
> Instead, we might have a syntax that embraced both. We could introduce
> the notion of providing a "literal" markup of precisely what appeared in
> the text being represented and also providing markup for the component
> parts of a citation such that any standardized citation style, such as
> the Bluebook, could be generated.
>
> So, when I'm creating a machine-readable citation of a text that
> contains this mess:
>
> Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464
>
> I could both represent what is literally there and what a modern
> standardized citation would look like. This might take the form:
>
> <span class="citation">
>  <span class="literal">Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464</span>
>  <a href="/opinion/061374/ingle-v-landis-tool-co">
>  <span class="casename">Ingle v. Landis Tool Co.</span>
>  <span class="volume">272</span>
>  <span class="reporter_abbr">F.</span>
>  <span class="page">464</span>
>  <span class="court">3rd Cir.</span>
>  <span class="date">1921</span>
>  </a>
> </span>
>
> From this information, if I wanted to generate something standardized to
> BB format, namely:
>
> Ingle v. Landis Tool Co., 272 F. 464 (3rd Cir. 1921).
>
> then I could do that, but if I wanted to faithfully represent the
> original text, I could also do that by taking what's in "literal" and
> making that my hyperlink text instead.
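
As a rough sketch of the dual rendering described above (Python; the field names mirror the class names in the markup, but the Citation class and both functions are hypothetical, and the Bluebook-like layout covers only this one example rather than the rules in general):

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Field names follow the class attributes in the quoted markup.
    literal: str
    casename: str
    volume: str
    reporter_abbr: str
    page: str
    court: str
    date: str


def render_standard(c: Citation) -> str:
    """Assemble a Bluebook-like string from the component parts."""
    return f"{c.casename}, {c.volume} {c.reporter_abbr} {c.page} ({c.court} {c.date})."


def render_literal(c: Citation) -> str:
    """Return the text exactly as it appeared on the page."""
    return c.literal


cite = Citation(
    literal="Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464",
    casename="Ingle v. Landis Tool Co.",
    volume="272",
    reporter_abbr="F.",
    page="464",
    court="3rd Cir.",
    date="1921",
)
print(render_standard(cite))
print(render_literal(cite))
```

Either string could then serve as the hyperlink text, depending on whether fidelity to the page or a standardized form is wanted.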

Preserving the original text is certainly important. What I am
wondering is what the details in the html:a span should signify.

The example contains elements not present in the cite on the page
("3rd Cir.", "1921"), so they would need to be inferred from some
external source. The same problem arises in a more extreme form with
WL or LEXIS cites cast before a case appears in an official reporter.

To cope with implicit details, we have to assume that some resolver
mechanism exists, through which we can obtain full details of the
original document. If that's the case, resolution can happen at either
end: at markup time (to fill in details to be embedded directly in the
document); or at lookup time (to retrieve details based on a unique
key). Of the two, I think the latter is a better design choice.

In the technical group, Fabio presented on a three-stage resolution
process. For cases, it would look something like this:

(1) Some readily accessible features are fed to the resolver:

    {
        name: "Ingle v. Landis Tool Co.",
        volume: "272",
        reporter: "F.",
        page: "464"
    }

(2) The resolver attempts to match the features to some unique key
that identifies the work, say:

    us-272-fed-464

(3) The unique key is used to return a set of references to the case.
If there is only one available reference, or if resolution is limited
by constraints imposed by the original call, the resolver might return
something like this:

    {
        name: "Ingle v. Landis Tool Co.",
        volume: "272",
        reporter: "Federal Reporter",
        page: "464",
        decisionDate: "1921-03-08",
        publicationDate: "1921",
        courtID: "us;federal;court.appeals.3.circuit",
        publisher: "West",
    }

It's an elaborate series of steps, but necessary in order to obtain
the details implicit in the printed cite.
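
The three steps above could be sketched roughly as follows (Python). The in-memory tables and the function names are invented for illustration and stand in for whatever service a real resolver would use; only the sample values come from the example above.

```python
from typing import Optional

# Hypothetical stage 2 index: (volume, reporter abbreviation, page) -> unique key.
KEY_INDEX = {
    ("272", "F.", "464"): "us-272-fed-464",
}

# Hypothetical stage 3 store: unique key -> list of full records.
RECORDS = {
    "us-272-fed-464": [
        {
            "name": "Ingle v. Landis Tool Co.",
            "volume": "272",
            "reporter": "Federal Reporter",
            "page": "464",
            "decisionDate": "1921-03-08",
            "publicationDate": "1921",
            "courtID": "us;federal;court.appeals.3.circuit",
            "publisher": "West",
        }
    ],
}


def match_key(features: dict) -> Optional[str]:
    """Stage 2: map readily accessible features to a unique key, if known."""
    lookup = (features.get("volume"), features.get("reporter"), features.get("page"))
    return KEY_INDEX.get(lookup)


def fetch_records(key: str) -> list:
    """Stage 3: return every reference held for the key."""
    return RECORDS.get(key, [])


def resolve(features: dict) -> list:
    """Run stages 2 and 3 on the features gathered in stage 1."""
    key = match_key(features)
    return fetch_records(key) if key is not None else []


# Stage 1: features read directly off the printed cite.
features = {
    "name": "Ingle v. Landis Tool Co.",
    "volume": "272",
    "reporter": "F.",
    "page": "464",
}
records = resolve(features)
print(records[0]["courtID"], records[0]["decisionDate"])
```

An unmatched set of features simply yields an empty list here; a real resolver would presumably need a richer way to report ambiguity or failure.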

Since the return values will require curation over time (to correct
errors or to add supplementary details that become available), it
would be cumbersome to embed the full information in individual
documents. That would be the argument for requiring only a limited set
of details at document level, and relying on resolution to obtain a
copy of the document, or to request full citation details.
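
To make the curation argument concrete, here is a small Python sketch under the assumption that the document stores only the literal text plus a key, while a separately maintained registry holds the details. The registry, the lookup function and the dictionary layout are all hypothetical.

```python
# Hypothetical curated registry keyed by the unique identifier.
registry = {
    "us-272-fed-464": {"name": "Ingle v. Landis Tool Co.", "publicationDate": "1921"},
}

# The document carries only the literal text and the key.
document_cite = {
    "literal": "Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464",
    "key": "us-272-fed-464",
}


def lookup(cite: dict) -> dict:
    """Fetch the current curated details for a cite at lookup time."""
    return registry[cite["key"]]


# Snapshot of what a reader would have obtained before curation.
before = dict(lookup(document_cite))

# A curator later adds a detail; the document itself is not edited.
registry["us-272-fed-464"]["courtID"] = "us;federal;court.appeals.3.circuit"

after = lookup(document_cite)
print("courtID" in before, "courtID" in after)
```

The correction reaches every document that carries the key, which is the point of deferring resolution to lookup time.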

>
> It seems to me we ought to end up where something like this is possible,
> otherwise I think a lot of people are not going to want to use a markup
> standard that forces them to misrepresent the text they are reproducing.
>
> That is, I think our organizing call insisted that we would not touch
> the text on the page. Instead the markup in the background should be
> smart enough to tell us what was on the page AND how a modern person
> might cite that resource according to various standard approaches.
>
> Brian
>
> On 12/15/2014 02:45 PM, Frank Bennett wrote:
>> On Mon, Dec 15, 2014 at 11:29 PM, Hirsh, Kenneth (hirshkh)
>> <hirshkh@ucmail.uc.edu> wrote:
>>> I'd like to caution against going "too far into the weeds" in this
>>> effort to identify nearly all conceivable sources for court
>>> opinions. While I greatly appreciate Frank's excellent work on this
>>> so far and I think that his work can stand alone as an inventory of
>>> U.S. Court opinion sources, I question whether this detail is
>>> needed to construct a model citation format.
>>
>> To be clear, the data behind the U.S. portion of
>> http://fbennett.github.io/legal-resource-registry/ was not compiled
>> by me; it is derived from existing work by Mike Lissner, done for
>> the purpose of parsing citations out of legacy documents. The Web
>> view just provides a concrete view of the relations currently
>> expressed in printed citations. Treat it as a reference that
>> illustrates some of the issues that a specification for
>> machine-readable citations would need to address.
>>
>>> I also want to raise the fact that the technical subcommittee's
>>> statement, including its declaration that we adhere to FRBR, has
>>> not been adopted by the full TC. There are reasons that a
>>> bibliographic record could contain far more information than what
>>> is needed to find a court opinion, or for that matter, any other
>>> document within the scope of the TC.
>>
>> The FRBR terminology is still useful for thinking about what a
>> machine-readable citation expresses; and we need to know that in
>> order to construct a specification for it.
>>
>>> We are not building libraries, but rather are trying to provide a
>>> relatively simple and consistent way for citations to be expressed
>>> in both a machine-readable and human-readable format.
>>
>> Absolutely. One point to settle is whether the
>> citation-to-be-specified identifies the text of a judgment as issued
>> by a court, or the text available in a specific reporting service.
>> If we can nail that down, other things will follow from the choice.
>>
>> Frank
>>
>>> Ken
>>>
>>> Kenneth J. Hirsh
>>> Director of the Law Library and I.T.
>>> Professor of Practice
>>> University of Cincinnati College of Law
>>> ken.hirsh@uc.edu
>>> (513) 556-0159
>>>
>>> -----Original Message-----
>>> From: legalcitem-courts@lists.oasis-open.org
>>> [mailto:legalcitem-courts@lists.oasis-open.org] On Behalf Of Frank Bennett
>>> Sent: Friday, December 12, 2014 7:07 PM
>>> To: John Heywood
>>> Cc: legalcitem-courts@lists.oasis-open.org
>>> Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
>>>
>>> On Sat, Dec 13, 2014 at 6:12 AM, John Quentin Heywood
>>> <heywood@wcl.american.edu> wrote:
>>>> Frank responded to my Wednesday email with some really useful
>>>> thoughts. I'm interested in hearing what others on the SC think
>>>> about it all. For convenience, I'm replying to both emails from
>>>> Frank in one response, so I moved the content around a bit.
>>>>
>>>> On 12/11/2014 04:27 PM, Frank Bennett wrote:
>>>>
>>>> On Thu, Dec 11, 2014 at 7:31 AM, Frank Bennett
>>>> <biercenator@gmail.com> wrote:
>>>>
>>>> John,
>>>>
>>>> Looks like a good start!
>>>>
>>>> This raises some questions about scope, I think. (The questions
>>>> themselves are at the end.)
>>>>
>>>> Printed citation forms identify the resource, but involve a step
>>>> of interpretation for several of the elements. We would need to
>>>> know how those should be cast in the electronic representation of
>>>> the reference. Taking the first example ...
>>>>
>>>> (1) case name string
>>>> Is there a formula or a set of constraints
>>>> for deriving this from the header information in a judgment? If
>>>> it must be uniform across all citations to the case, it should
>>>> either be possible to derive it programmatically, or there should
>>>> be a canonical version of the case name somewhere that can be
>>>> acquired via a resolver, using other elements that uniquely
>>>> identify the case (I guess that's the middle layer in FRBR).
>>>>
>>>> In US practice, Bluebook rule 10.2 is what usually governs. The
>>>> AALL Universal Citation Guide, 3d, in its rule 101, says case
>>>> names should conform to it, or to ALWD Manual rule 12.2. The
>>>> Univ. of Chicago's Maroon Book's rule 4.2 is similar, only MUCH
>>>> simpler. In short, there is a formula. I like the idea of a
>>>> resolver. Sort of like an authority record in cataloging.
>>>>
>>>> (2) volume number
>>>> This would be an integer for this category of
>>>> citation. Is it safe to specify it as an integer, or are there
>>>> exceptions that would require more flexibility?
>>>>
>>>> Not always an integer. Sometimes reporter volumes are issued in
>>>> parts, e.g., 245A, 245B, 245C, etc. This usually happens in the
>>>> NRS when volumes are still in prep, and temporary paperback
>>>> volumes are released piecemeal.
>>>>
>>>> (3) reporter abbreviation
>>>> As the LRR shows, there is a lot of
>>>> variation in reporter abbreviations used in the wild (spacing,
>>>> punctuation, abbreviations). If it is used as an element in an
>>>> electronic representation of the reference, the abbreviation will
>>>> need to be consistent across all references. How is it to be
>>>> derived? The choice would seem to be between a canonical list of
>>>> reporters and corresponding abbreviations, or the full name of
>>>> the reporter. A secondary consideration would be whether the
>>>> elements embedded in a reporter abbreviation (journal + series)
>>>> should be broken out and represented separately.
>>>>
>>>> I think a canonical list of full reporter names and
>>>> abbreviations would be the way to go. I don't think it necessary
>>>> to break out the series....treat them as separate entries.
>>>>
>>>> (4) first page number
>>>> This seems to be an integer. Same question about
>>>> constraints as for volume number.
>>>>
>>>> In US practice, I have never seen a non-integer page number for a
>>>> case (roman numbers for intro parts of reporter, but not for
>>>> cases). I think we could type it as an integer.
>>>>
>>>> (5) pinpoint page numbers
>>>> Pinpoints can include references to
>>>> page numbers, note numbers, and possibly other document elements.
>>>> Should these elements be specified, or is a dumb string
>>>> sufficient?
>>>>
>>>> In neutral citations, the pinpoint is usually to a paragraph
>>>> number, not a page.
>>>>
>>>> (6) circuit justice if applicable
>>>> This raises a question of
>>>> whether the spec is aimed at full description of the resource, or
>>>> at pinning down the essential information needed to unambiguously
>>>> identify the resource. If the latter, this would not be needed.
>>>>
>>>> That is a really good question. I had been assuming full
>>>> description, but your comment is making me rethink that.
>>>>
>>>> (7) year of decision
>>>> In this citation form, are the year of
>>>> decision and the year of publication always aligned?
>>>>
>>>> The year of decision is the year of publication by the court by
>>>> definition. The "publication" date is not the date the reporter
>>>> was published. An interesting question is how to deal with
>>>> changes made by the court after the decision is published, but
>>>> before the print official reporter hits the streets. (SCOTUS is
>>>> infamous for this:
>>>> http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html
>>>> ) I suppose that type of info can go in the parenthetical string
>>>> at the end of the citation.
>>>>
>>>> (8) parenthetical information such as judge, type of document,
>>>> weight of authority
>>>> This raises the same question as (6).
>>>>
>>>> See my comments under 6 & 7 above.
>>>>
>>>> (9) the court
>>>> SCOTUS citations are to a dedicated reporter, so
>>>> the court is implicit. Should this be made explicit in an
>>>> electronic citation? Alternatively, should reporters be made a
>>>> separate domain in the specification, so that such information
>>>> can be attached to each?
>>>>
>>>> Again with the great question....I think it should be explicit.
>>>> It seems to me that all court citations should be similar, to
>>>> make parsing easier among other reasons.
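
On the typing questions in items (2) to (4) above, a minimal Python sketch under stated assumptions: a volume label is taken to be digits plus an optional single capital letter (as in 245A), the first page is a plain integer, and reporter variants are matched to a canonical list by ignoring spacing, punctuation and case. The three-entry list, the names and the matching rule are illustrative only and are not drawn from the LRR data.

```python
import re
from typing import NamedTuple

# Illustrative canonical list only; each series is a separate entry.
CANONICAL_REPORTERS = {
    "F.": "Federal Reporter",
    "F.2d": "Federal Reporter, Second Series",
    "U.S.": "United States Reports",
}


def squash(abbr: str) -> str:
    """Reduce an abbreviation to lowercase letters and digits for comparison."""
    return re.sub(r"[^0-9a-z]", "", abbr.lower())


# Index built once so that spacing and punctuation variants map to one entry.
_BY_SQUASHED = {squash(a): a for a in CANONICAL_REPORTERS}


def canonical_reporter(abbr: str) -> str:
    """Return the canonical abbreviation for a variant seen in the wild."""
    try:
        return _BY_SQUASHED[squash(abbr)]
    except KeyError:
        raise ValueError(f"unknown reporter abbreviation: {abbr!r}") from None


class Volume(NamedTuple):
    number: int
    part: str  # "" when the volume was not issued in parts


# Assumed pattern: digits plus an optional single capital letter.
_VOLUME_RE = re.compile(r"(\d+)([A-Z]?)")


def parse_volume(text: str) -> Volume:
    """Split a volume label into its number and optional letter suffix."""
    m = _VOLUME_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized volume label: {text!r}")
    return Volume(int(m.group(1)), m.group(2))


print(parse_volume("245B"), canonical_reporter("F. 2d"), int("464"))
```

Keeping the letter suffix separate from the number lets volumes still sort numerically while parts such as 245A and 245B remain distinguishable.
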
>>>>
>>>> ***
>>>>
>>>> I guess a threshold question is the scope of the spec:
>>>>
>>>> (a) Does it aim to express the elements of all existing printed
>>>> citations (this is also Brian's question, I think); or
>>>> (b) Does it aim to specify only the elements of all printed
>>>> citations needed to uniquely identify the resource; or
>>>> (c) Does it aim to specify only the minimum elements (or
>>>> combination of elements) needed to uniquely identify the resource?
>>>>
>>>> If the aim is the enrichment of document content with RDF-style
>>>> links to meaningful text elements, that suggests (a).
>>>>
>>>> If the aim is to support parsers capable of linking specifically
>>>> to cases, that suggests (b) -- this is the aim of the
>>>> CourtListener database from which the LRR is derived.
>>>>
>>>> If the aim is to provide guidance for the construction of
>>>> resolvers and data to feed to them, that suggests (c). My
>>>> understanding is that this is what we're aiming for, but I could
>>>> be wrong.
>>>>
>>>> Frank
>>>>
>>>> This is the crux. I had been assuming (a) or maybe (b). So I went
>>>> to the TechSC's latest draft and reread it:
>>>>
>>>> " It is NOT the purpose of this TC to establish a proposed syntax
>>>> for citations."
>>>>
>>>> " The relevant task of every subcommittee is therefore to
>>>> identify types and roles of FRBR entities in their document
>>>> classes, and classify features according to different levels of a
>>>> layered model of the document."
>>>>
>>>> "Also, subcommittees should also identify how the references to
>>>> documents of their classes are impacted by the layered view of
>>>> documents.... It will be a rare case indeed the citation (and
>>>> therefore the need for a reference) pointing to an FRBR Item
>>>> (i.e., to a specific file on a specific computer at a specific IP
>>>> address) or to an FRBR Manifestation (i.e., to a specific
>>>> characterization in a specific file format of a document). Most
>>>> frequently a citation points to a legal document existing on a
>>>> different conceptual layer and in a different level of reality
>>>> than the physical copies it is embodied by, or by the data
>>>> formats in which each copy is expressed. More frequently,
>>>> therefore the citation will identify a document at a more
>>>> abstract level, e.g., an FRBR Expression when the citation is to
>>>> a specific version or variant of the document, or an FRBR Work
>>>> when the citation is to all these versions or variants, or to the
>>>> one that is identified through a possibly complex
>>>> contextualization process. In these cases, therefore, the
>>>> citation MUST be converted to a reference to a Work or an
>>>> Expression, which is resolved into the physical Locator of the
>>>> Item only when needed, therefore separating the legal aspects of
>>>> the identification of the correct version and variant of a
>>>> document from the technical aspects of the dereferencing of a
>>>> resource on the World Wide Web."
>>>>
>>>> So what info does our part of the spec need to convey? Our
>>>> citations will be identifying cases/other court docs at the FRBR
>>>> Work level and at the FRBR Expression level, methinks. A "print"
>>>> citation would be on the Expression level, no?
>>>>
>>>> To follow up on the scope issue, if the aim (or one aim) is to
>>>> specify minimal data that can be derived from the text, for the
>>>> purpose of generating a key for submission to a resolver, would
>>>> this work for cases:
>>>>
>>>> type (decision)
>>>> court (id)
>>>> docket number (string)
>>>> decision date (date)
>>>>
>>>> For the "court" element, an ID would be preferable to the court
>>>> name, since the latter can change without any change to the
>>>> institution proper.
>>>>
>>>> Resolution would return further details (cites for each
>>>> reporting service carrying the case, with case name, etc.); the
>>>> suggestion above is only for the "handle" that uniquely
>>>> identifies the case.
>>>>
>>>> (Whether this makes any sense will depend on the scope of the
>>>> endeavor, of course.)
>>>>
>>>> Where does pinpoint (page or para numbers) fit into this? The
>>>> citation is to the work as a whole, but also (usually, or mostly)
>>>> to specific language in that work.
>>>
>>> The four features listed above would identify the Work. If all
>>> records on the resolver side contain those features, a specific
>>> Expression amenable to pinpointing could be obtained by adding a
>>> reporter key to the resolver call. The return from the resolver
>>> would include other details (volume number, page number or range,
>>> etc.), but the core details plus the reporter should be sufficient
>>> to identify the pinpoint-able record.
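
A minimal Python sketch of that idea, assuming a simple delimited key: the four features form the Work-level handle, and appending a reporter narrows it to one Expression. The WorkHandle class, the "|" separator and the docket number shown are hypothetical (the docket number is a placeholder, not the real one for this case).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkHandle:
    # The four features proposed above; the key layout is an assumption.
    doc_type: str
    court_id: str
    docket_number: str
    decision_date: date

    def key(self, reporter: Optional[str] = None) -> str:
        """Build a Work key, or an Expression key when a reporter is given."""
        parts = [
            self.doc_type,
            self.court_id,
            self.docket_number,
            self.decision_date.isoformat(),
        ]
        if reporter is not None:
            parts.append(reporter)
        return "|".join(parts)


handle = WorkHandle(
    doc_type="decision",
    court_id="us;federal;court.appeals.3.circuit",
    docket_number="EXAMPLE-0000",  # placeholder, not the actual docket number
    decision_date=date(1921, 3, 8),
)
print(handle.key())
print(handle.key(reporter="F."))
```

The resolver would then return volume and page details for the Expression key, against which a pinpoint could be checked.
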
>>>
>>>>
>>>> What do you all think?
>>>>
>>>> -- John Quentin Heywood heywood@american.edu
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that
>>> generates this mail. Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>>