legalcitem-courts message

Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
From: Brian Carver <brian.carver@documentengineeringservices.com>
To: legalcitem-courts@lists.oasis-open.org
Date: Mon, 15 Dec 2014 15:59:06 -0800
I still have the same question I last asked on the list, whether the
machine-readable markup syntax we will create will enable people to
indicate exactly what the citation looked like in some text they may be
trying to faithfully represent or whether it will only provide a way to
refer to some sort of standardized citation.

I think it would be unfortunate if the syntax we propose were so
impoverished that it could only do the latter.

Instead, we might have a syntax that embraced both. We could introduce
the notion of providing a "literal" markup of precisely what appeared in
the text being represented and also providing markup for the component
parts of a citation such that any standardized citation style, such as
the Bluebook, could be generated.

So, when I'm creating a machine-readable citation of a text that
contains this mess:

Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464

I could both represent what is literally there and what a modern
standardized citation would look like. This might take the form:

<span class="citation">
 <span class="literal">Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464</span>
 <a href="/opinion/061374/ingle-v-landis-tool-co">
 <span class="casename">Ingle v. Landis Tool Co.</span>
 <span class="volume">272</span>
 <span class="reporter_abbr">F.</span>
 <span class="page">464</span>
 <span class="court">3rd Cir.</span>
 <span class="date">1921</span>
 </a>
</span>

From this information, if I wanted to generate something standardized to
BB format, namely:

Ingle v. Landis Tool Co., 272 F. 464 (3rd Cir. 1921).

then I could do that, but if I wanted to faithfully represent the
original text, I could also do that by taking what's in "literal" and
making that my hyperlink text instead.
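
To make that concrete, here is a rough sketch (Python, standard library
only) of how a consumer of such markup might produce either rendering.
The class names are the ones in my example above; the CitationParser
class and the standardized() and literal() helpers are hypothetical
names invented for this illustration, not part of any proposal.

```python
from html.parser import HTMLParser

# The example markup from this message, reproduced verbatim.
MARKUP = """
<span class="citation">
 <span class="literal">Ingle v. Landis Tool Co. (C.C.A.) 272 F. 464</span>
 <a href="/opinion/061374/ingle-v-landis-tool-co">
 <span class="casename">Ingle v. Landis Tool Co.</span>
 <span class="volume">272</span>
 <span class="reporter_abbr">F.</span>
 <span class="page">464</span>
 <span class="court">3rd Cir.</span>
 <span class="date">1921</span>
 </a>
</span>
"""


class CitationParser(HTMLParser):
    """Collect the text of each class-tagged <span> into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # class name (or None) of each open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Ignore whitespace between tags; keep text of the innermost span.
        if self._stack and self._stack[-1] and data.strip():
            name = self._stack[-1]
            self.fields[name] = self.fields.get(name, "") + data.strip()


def standardized(f):
    """Bluebook-like layout: Name, Vol. Reporter Page (Court Year)."""
    return (f"{f['casename']}, {f['volume']} {f['reporter_abbr']} "
            f"{f['page']} ({f['court']} {f['date']}).")


def literal(f):
    """The citation exactly as it appeared in the source text."""
    return f["literal"]


parser = CitationParser()
parser.feed(MARKUP)
print(standardized(parser.fields))
print(literal(parser.fields))
```

The point of the sketch is only that both renderings come from the same
markup, so nothing on the page has to be altered to support either one.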

It seems to me we ought to end up where something like this is possible,
otherwise I think a lot of people are not going to want to use a markup
standard that forces them to misrepresent the text they are reproducing.

That is, I think our organizing call insisted that we would not touch
the text on the page. Instead the markup in the background should be
smart enough to tell us what was on the page AND how a modern person
might cite that resource according to various standard approaches.

Brian

On 12/15/2014 02:45 PM, Frank Bennett wrote:
> On Mon, Dec 15, 2014 at 11:29 PM, Hirsh, Kenneth (hirshkh) 
> <hirshkh@ucmail.uc.edu> wrote:
>> I'd like to caution against going "too far into the weeds" in this
>> effort to identify nearly all conceivable sources for court
>> opinions. While I greatly appreciate Frank's excellent work on this
>> so far and I think that his work can stand alone as an inventory of
>> U.S. Court opinion sources, I question whether this detail is
>> needed to construct a model citation format.
> 
> To be clear, the data behind the U.S. portion of 
> http://fbennett.github.io/legal-resource-registry/ was not compiled
> by me; it is derived from existing work by Mike Lissner, done for
> the purpose of parsing citations out of legacy documents. The Web
> view just provides a concrete view of the relations currently
> expressed in printed citations. Treat it as a reference that
> illustrates some of the issues  that a specification for
> machine-readable citations would need to address.
> 
>> I also want to raise the fact that the technical subcommittee's
>> statement, including its declaration that we adhere to FRBR, has
>> not been adopted by the full TC. There are reasons that a
>> bibliographic record could contain far more information than what
>> is needed to find a court opinion, or for that matter, any other
>> document within the scope of the TC.
> 
> The FRBR terminology is still useful for thinking about what a 
> machine-readable citation expresses; and we need to know that in
> order to construct a specification for it.
> 
>> We are not building libraries, but rather are trying to provide a
>> relatively simple and consistent way for citations to be expressed
>> in both a machine-readable and human-readable format.
> 
> Absolutely. One point to settle is whether the 
> citation-to-be-specified identifies the text of a judgment as issued 
> by a court, or the text available in a specific reporting service.
> If we can nail that down, other things will follow from the choice.
> 
> Frank
> 
>> Ken
>> 
>> Kenneth J. Hirsh
>> Director of the Law Library and I.T.
>> Professor of Practice
>> University of Cincinnati College of Law
>> ken.hirsh@uc.edu
>> (513) 556-0159
>> 
>> -----Original Message-----
>> From: legalcitem-courts@lists.oasis-open.org
>> [mailto:legalcitem-courts@lists.oasis-open.org] On Behalf Of Frank Bennett
>> Sent: Friday, December 12, 2014 7:07 PM
>> To: John Heywood
>> Cc: legalcitem-courts@lists.oasis-open.org
>> Subject: Re: [legalcitem-courts] Usecase--US Federal Courts draft
>> 
>> On Sat, Dec 13, 2014 at 6:12 AM, John Quentin Heywood
>> <heywood@wcl.american.edu> wrote:
>>> Frank responded to my Wednesday email with some really useful 
>>> thoughts. I'm interested in hearing what others on the SC think
>>> about it all. For convenience, I'm replying to both emails from
>>> Frank in one response, so I moved the content around a bit.
>>> 
>>> On 12/11/2014 04:27 PM, Frank Bennett wrote:
>>> 
>>> On Thu, Dec 11, 2014 at 7:31 AM, Frank Bennett
>>> <biercenator@gmail.com> wrote:
>>> 
>>> John,
>>> 
>>> Looks like a good start!
>>> 
>>> This raises some questions about scope, I think. (The questions 
>>> themselves are at the end.)
>>> 
>>> Printed citation forms identify the resource, but involve a step
>>> of interpretation for several of the elements. We would need to
>>> know how those should be cast in the electronic representation of
>>> the reference. Taking the first example ...
>>> 
>>> (1) case name string Is there a formula or a set of constraints
>>> for deriving this from the header information in a judgment? If
>>> it must be uniform across all citations to the case, it should
>>> either be possible to derive it programmatically, or there should
>>> be a canonical version of the case name somewhere that can be
>>> acquired via a resolver, using other elements that uniquely
>>> identify the case (I guess that's the middle layer in FRBR).
>>> 
>>> In US practice,  Bluebook rule 10.2 is what usually governs. The
>>> AALL Universal Citation Guide, 3d, in its rule 101, says case
>>> names should conform to it, or to ALWD Manual rule 12.2. The
>>> Univ. of Chicago's Maroon Book's rule 4.2 is similar, only MUCH
>>> simpler. In short, there is a formula. I like the idea of a
>>> resolver. Sort of like an authority record in cataloging.
>>> 
>>> (2) volume number This would be an integer for this category of
>>> citation. Is it safe to specify it as an integer, or are there
>>> exceptions that would require more flexibility?
>>> 
>>> Not always an integer. Sometimes reporter volumes are issued in
>>> parts, e.g., 245A, 245B, 245C, etc. This usually happens in the
>>> NRS when volumes are still in prep, and temporary paperback
>>> volumes are released piecemeal.
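
A minimal sketch of what the volume point above implies for typing,
assuming part suffixes are a single trailing letter as in the 245A,
245B examples (other forms may exist and are not handled here). The
Volume type and parse_volume() function are hypothetical names used
only for illustration.

```python
import re
from typing import NamedTuple


class Volume(NamedTuple):
    number: int  # numeric part of the volume designation
    part: str    # "" when the volume is not issued in parts


# Assumption: digits optionally followed by one letter, e.g. "245A".
_VOLUME_RE = re.compile(r"(\d+)([A-Za-z]?)")


def parse_volume(text: str) -> Volume:
    """Split a volume designation into its number and optional part letter."""
    m = _VOLUME_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized volume: {text!r}")
    return Volume(int(m.group(1)), m.group(2).upper())


print(parse_volume("272"))
print(parse_volume("245A"))
# Tuples sort by number first, then by part letter.
print(sorted(parse_volume(v) for v in ["245C", "245A", "244", "245B"]))
```

Keeping the number and the part separate preserves numeric ordering
while still allowing the lettered parts, which a plain integer type
would reject.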
>>> 
>>> (3) reporter abbreviation As the LRR shows, there is a lot of
>>> variation in reporter abbreviations used in the wild (spacing,
>>> punctuation, abbreviations). If it is used as an element in an
>>> electronic representation of the reference, the abbreviation will
>>> need to be consistent across all references. How is it to be
>>> derived? The choice would seem to be between a canonical list of
>>> reporters and corresponding abbreviations, or the full name of
>>> the reporter. A secondary consideration would be whether the
>>> elements embedded in a reporter abbreviation (journal + series)
>>> should be broken out and represented separately.
>>> 
>>> I think a canonical list of full reporter names and
>>> abbreviations would be the way to go. I don't think it necessary
>>> to break out the series....treat them as separate entries.
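
One way the canonical-list idea above might look in practice, sketched
under loud assumptions: the three-entry table is illustrative only and
is not taken from the LRR, and the normalization rule (ignore spaces,
periods, and case) is a guess at what would absorb the spacing and
punctuation variation seen in the wild. Each series is its own entry,
as suggested. canonical_abbr() is a hypothetical helper name.

```python
import re
from typing import Optional

# Illustrative canonical list: abbreviation -> full reporter name.
# Each series is a separate entry rather than reporter + series fields.
CANONICAL = {
    "F.": "Federal Reporter",
    "F.2d": "Federal Reporter, Second Series",
    "U.S.": "United States Reports",
}


def _key(abbr: str) -> str:
    """Normalize by dropping whitespace and periods and lowercasing."""
    return re.sub(r"[\s.]", "", abbr).lower()


# Lookup table from normalized form to canonical abbreviation.
_LOOKUP = {_key(abbr): abbr for abbr in CANONICAL}


def canonical_abbr(raw: str) -> Optional[str]:
    """Return the canonical abbreviation for a variant, or None if unknown."""
    return _LOOKUP.get(_key(raw))


for variant in ["F. 2d", "F2d", "U. S.", "Xyz Rep."]:
    print(repr(variant), "->", canonical_abbr(variant))
```

A real list would need a stricter rule wherever two reporters collapse
to the same normalized key; this sketch does not attempt to detect that.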
>>> 
>>> (4) first page number This seems an integer. Same question about
>>> constraints as for volume number.
>>> 
>>> In US practice, I have never seen a non-integer page number for a
>>> case (roman numerals for the introductory parts of a reporter, but
>>> not for cases). I think we could type it as an integer.
>>> 
>>> (5) pinpoint page numbers Pinpoints can include references to
>>> page numbers, note numbers, and possibly other document elements.
>>> Should these elements be specified, or is a dumb string
>>> sufficient?
>>> 
>>> In neutral citations, the pinpoint is usually to a paragraph
>>> number, not a page.
>>> 
>>> (6) circuit justice if applicable This raises a question of
>>> whether the spec is aimed at full description of the resource, or
>>> at pinning down the essential information needed to unambiguously
>>> identify the resource. If the latter, this would not be needed.
>>> 
>>> That is a really good question. I had been assuming full
>>> description, but your comment is making me rethink that.
>>> 
>>> (7) year of decision In this citation form, are the year of
>>> decision and the year of publication always aligned?
>>> 
>>> The year of decision is the year of publication by the court by
>>> definition. The "publication" date is not the date the reporter
>>> was published. An interesting question is how to deal with
>>> changes made by the court after the decision is published, but
>>> before the print official reporter hits the streets. (SCOTUS is
>>> infamous for this:
>>> http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html
>>> ) I suppose that type of info can go in the parenthetical string
>>> at the end of the citation.
>>> 
>>> (8) parenthetical information such as judge, type of document,
>>> weight of authority This raises the same question as (6).
>>> 
>>> See my comments under 6 & 7 above.
>>> 
>>> (9) the court SCOTUS citations are to a dedicated reporter, so
>>> the court is implicit. Should this be made explicit in an
>>> electronic citation? Alternatively, should reporters be made a
>>> separate domain in the specification, so that such information
>>> can be attached to each?
>>> 
>>> Again with the great question....I think it should be explicit.
>>> It seems to me that all court citations should be similar, to
>>> make parsing easier among other reasons.
>>> 
>>> ***
>>> 
>>> I guess a threshold question is the scope of the spec:
>>> 
>>> (a) Does it aim to express the elements of all existing printed
>>> citations (this is also Brian's question, I think); or
>>> (b) Does it aim to specify only the elements of all printed
>>> citations needed to uniquely identify the resource; or
>>> (c) Does it aim to specify only the minimum elements (or
>>> combination of elements) needed to uniquely identify the resource?
>>> 
>>> If the aim is the enrichment of document content with RDF-style
>>> links to meaningful text elements, that suggests (a).
>>> 
>>> If the aim is to support parsers capable of linking specifically
>>> to cases, that suggests (b) -- this is the aim of the
>>> CourtListener database from which the LRR is derived.
>>> 
>>> If the aim is to provide guidance for the construction of
>>> resolvers and data to feed to them, that suggests (c). My
>>> understanding is that this is what we're aiming for, but I could
>>> be wrong.
>>> 
>>> Frank
>>> 
>>> This is the crux. I had been assuming (a) or maybe (b). So I went
>>> to the TechSC's latest draft and reread it:
>>> 
>>> " It is NOT the purpose of this TC to establish a proposed syntax
>>> for citations."
>>> 
>>> " The relevant task of every subcommittee is therefore to
>>> identify types and roles of FRBR entities in their document
>>> classes, and classify features according to different levels of a
>>> layered model of the document."
>>> 
>>> "Also, subcommittees should also identify how the references to 
>>> documents of their classes are impacted by the layered view of 
>>> documents.... It will be a rare case indeed the citation (and 
>>> therefore the need for a reference) pointing to an FRBR Item
>>> (i.e., to a specific file on a specific computer at a specific IP
>>> address) or to an FRBR Manifestation (i.e., to a specific
>>> characterization in a specific file format of a document). Most
>>> frequently a citation points to a legal document existing on a
>>> different conceptual layer and in a different level of reality
>>> than the physical copies it is embodied by, or by the data
>>> formats in which each copy is expressed. More frequently,
>>> therefore the citation will identify a document at a more 
>>> abstract level, e.g., an FRBR Expression when the citation is to
>>> a specific version or variant of the document, or an FRBR Work
>>> when the citation is to all these versions or variants, or to the
>>> one that is identified through a possibly complex
>>> contextualization process. In these cases, therefore, the
>>> citation MUST be converted to a reference to a Work or an
>>> Expression, which is resolved into the physical Locator of the
>>> Item only when needed, therefore separating the legal aspects of
>>> the identification of the correct version and variant of a
>>> document from the technical aspects of the dereferencing of a
>>> resource on the World Wide Web."
>>> 
>>> So what info does our part of the spec need to convey?  Our
>>> citations will be identifying cases/other court docs at the FRBR
>>> Work level and at the FRBR Expression level, methinks. A "print"
>>> citation would be on the Expression level, no?
>>> 
>>> To follow up on the scope issue, if the aim (or one aim) is to
>>> specify minimal data that can be derived from the text, for the
>>> purpose of generating a key for submission to a resolver, would
>>> this work for cases:
>>> 
>>> type (decision)
>>> court (id)
>>> docket number (string)
>>> decision date (date)
>>> 
>>> For the "court" element, an ID would be preferable to the court
>>> name, since the latter can change without any change to the
>>> institution proper.
>>> 
>>> Resolution would return further details (cites for each
>>> reporting service carrying the case, with case name, etc.); the
>>> suggestion above is only for the "handle" that uniquely
>>> identifies the case.
>>> 
>>> (Whether this makes any sense will depend on the scope of the 
>>> endeavor, of course.)
>>> 
>>> Where does pinpoint (page or para numbers) fit into this? The
>>> citation is to the work as a whole, but also (usually, or mostly)
>>> to specific language in that work.
>> 
>> The four features listed above would identify the Work. If all
>> records on the resolver side contain those features, a specific
>> Expression amenable to pinpointing could be obtained by adding a
>> reporter key to the resolver call. The return from the resolver
>> would include other details (volume number, page number or range,
>> etc.), but the core details plus the reporter should be sufficient
>> to identify the pinpoint-able record.
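
The handle described above could be sketched as follows. This is only
an illustration under stated assumptions: the CaseKey class, the
slash-separated handle format, the court ID "ca3", and the docket
number and date in the example are all made up for the sketch and do
not describe any existing resolver or any real case record.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CaseKey:
    court: str                      # court ID rather than name (names change)
    docket_number: str              # kept as a string
    decision_date: date
    type: str = "decision"
    reporter: Optional[str] = None  # optional reporter key

    def level(self) -> str:
        """Four core features name the Work; adding a reporter, an Expression."""
        return "Expression" if self.reporter else "Work"

    def handle(self) -> str:
        """Build a key string for submission to a (hypothetical) resolver."""
        parts = [self.type, self.court, self.docket_number,
                 self.decision_date.isoformat()]
        if self.reporter:
            parts.append(self.reporter)
        return "/".join(parts)


# Made-up values for illustration only.
work = CaseKey(court="ca3", docket_number="1234",
               decision_date=date(1921, 1, 1))
expression = replace(work, reporter="F.")

print(work.level(), work.handle())
print(expression.level(), expression.handle())
```

Volume, page, and pinpoint are deliberately absent from the key: on
this model they would come back from the resolver once the reporter
selects a pinpoint-able record.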
>> 
>>> 
>>> What do you all think?
>>> 
>>> -- John Quentin Heywood heywood@american.edu
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe from this mail list, you must leave the OASIS TC that
>> generates this mail.  Follow this link to all your TCs in OASIS at:
>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
Follow-Ups:
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: Frank Bennett <biercenator@gmail.com>
References:
- Usecase--US Federal Courts draft
  - From: John Quentin Heywood <heywood@wcl.american.edu>
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: Frank Bennett <biercenator@gmail.com>
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: Frank Bennett <biercenator@gmail.com>
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: John Quentin Heywood <heywood@wcl.american.edu>
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: Frank Bennett <biercenator@gmail.com>
- RE: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: "Hirsh, Kenneth (hirshkh)" <hirshkh@ucmail.uc.edu>
- Re: [legalcitem-courts] Usecase--US Federal Courts draft
  - From: Frank Bennett <biercenator@gmail.com>