legalcitem-technical message

Subject: RE: [legalcitem-technical] ELI and Parseable URIs
From: "Tabone, Catherine" <Catherine.Tabone@nationalarchives.gsi.gov.uk>
To: Fabio Vitali <fvitali@gmail.com>
Date: Wed, 8 Jul 2015 13:23:05 +0100

Hi Fabio,

Comments on a couple of your points below before our meeting later:

Regarding 1 - Unfortunately the precise year associated with a regnal year is not well known (this is one of the reasons why citation by regnal year has persisted for pre-1963 Acts), and there is no one-to-one mapping between regnal year and calendar year in either direction: a regnal year can span several calendar years, and a calendar year can cover more than one regnal year. It's impossible to map automatically between the two in all cases without a look-up list of documents and their complete information. For the example Eliz2/4-5: chapters 1-22 are in 1955 and chapters 23-76 are in 1956, while Eliz2/5-6 contains chapters 1-5 in 1956. Looking at a citation example: given a reference to the famous "6 Ann. c. 41" (Succession to the Crown Act 1707), commonly cited in just that way, and given that the period 6 Ann. spans the years 1706 and 1707, how do I know that c. 41 is in 1707? The only way is to look it up manually. Helpfully, it was given a short title that contains the year by one of the Short Titles Acts in the 1890s. But not all pre-1900 legislation has a short title, and relying on the title (if present) to find a key element of the identifier is not very practical.

In the guide to best practice for ELI, we say that a good identifier scheme is one that can be derived directly from the common form(s) of citation, with no additional information. In other words, I can take the string of the common form of citation "6 Ann. c. 41", widely found in law reports, judgments, footnotes etc., and parse it directly to obtain a correct legislation.gov.uk URI (www.legislation.gov.uk/id/Ann/6/41), without needing any additional information. In particular I don't need to know that 6 Ann. c. 41 was in 1707, nor that 6 Ann. spans 1706 and 1707.

By contrast, there is no way to parse the string "6 Ann. c. 41" to arrive at "1707/ann.6.41" without first looking up 6 Ann. c. 41 somewhere to find that it was enacted in 1707. This is a major disadvantage that stems from demanding the Gregorian year even when it plays no part in the common citation. As the legislation publisher we hold regnal/calendar year information, so we could potentially create this kind of identifier (although there would be no advantage in doing so), but it wouldn't be usable by anyone else trying to mark up a citation according to the standard. It seems more sensible to create an identifier scheme that is flexible enough to express only the things that need to be included, rather than forcing irrelevant (for identification) and unknown information to be included. I know it's a horrible numbering/citation system but we can't change it now - it was 1963 before they saw sense and stopped!
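
To make the contrast concrete, here is a minimal sketch in Python. The regular expression, the function names and the small look-up table are hypothetical and only for illustration; the URI pattern follows the legislation.gov.uk example quoted above, and the chapter ranges are the Eliz2 figures given earlier in this message.

```python
import re

# Hypothetical pattern for citations shaped like "<regnal year> <reign>. c. <chapter>".
# Real-world citations have many more variants than this covers.
CITATION = re.compile(
    r"^\s*(?P<year>\d+)\s+(?P<reign>[A-Za-z]+)\.?\s+c\.?\s*(?P<chapter>\d+)\s*$"
)


def citation_to_uri(citation: str) -> str:
    """Derive a URI from the citation string alone, with no look-up."""
    m = CITATION.match(citation)
    if m is None:
        raise ValueError(f"unrecognised citation: {citation!r}")
    return f"www.legislation.gov.uk/id/{m['reign']}/{m['year']}/{int(m['chapter'])}"


# Calendar years cannot be computed; they have to be recorded somewhere.
# Each entry: (reign, regnal year, first chapter, last chapter, calendar year).
YEAR_LOOKUP = [
    ("Eliz2", "4-5", 1, 22, 1955),
    ("Eliz2", "4-5", 23, 76, 1956),
    ("Eliz2", "5-6", 1, 5, 1956),
]


def calendar_year(reign: str, regnal_year: str, chapter: int):
    """Return the calendar year from the table, or None if it is not listed."""
    for r, ry, first, last, year in YEAR_LOOKUP:
        if r == reign and ry == regnal_year and first <= chapter <= last:
            return year
    return None


print(citation_to_uri("6 Ann. c. 41"))
print(calendar_year("Eliz2", "4-5", 18), calendar_year("Ann", "6", 41))
```

The first function needs nothing but the citation string. The second can only answer for documents someone has already entered in the table, which is the position anyone other than the publisher would be in.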

I'm a bit confused as to the reason for parseable identifiers. Is it:
a) that common forms of written citation can easily/directly be parsed to an identifier in the form of an http URI that can be resolved to the actual resource?
b) that the identifier itself can be parsed to obtain some basic information about the resource?

I think I was assuming a), in which case I only need the information required to identify the document; I don't want to be tied to adding a lot of extra information that I may not have. From the ELI Task Force perspective a) is important, and b) is largely unnecessary or undesirable - metadata is provided in other, much more digestible forms than extracting it from the URI.
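
For comparison, direction b) might look something like the sketch below, which pulls fields back out of an identifier of the "year/reign.regnal-year.number" shape discussed in this thread (for example 1955/eliz2.4-5.18). The field names and the splitting rule are assumptions made purely for illustration.

```python
from typing import NamedTuple


class ParsedId(NamedTuple):
    year: str
    reign: str
    regnal_year: str
    number: str


def parse_identifier(identifier: str) -> ParsedId:
    """Split "year/reign.regnal-year.number" into its four fields."""
    year, _, rest = identifier.partition("/")
    # Raises ValueError if the part after "/" does not have exactly three fields.
    reign, regnal_year, number = rest.split(".")
    return ParsedId(year, reign, regnal_year, number)


print(parse_identifier("1955/eliz2.4-5.18"))
```

This only works if every identifier carries every field, which is exactly the extra information I may not have.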

Regarding 5 - In terms of the definitions of FRBR levels I would completely agree with your interpretation; indeed, this is exactly how it's defined on www.legislation.gov.uk, and it will be expressed in the same way using the ELI ontology when we fully implement this in the near future. However, the members of the ELI Task Force couldn't all agree on this interpretation. We spent many days arguing over this point, trying to find a common interpretation. In the end it became clear that we couldn't agree on one fixed model, so we settled for providing a flexible model that accommodates all interpretations and leaves it to the publisher to determine the appropriate way to define their own artefacts.

Kind regards,

Catherine



-----Original Message-----
From: legalcitem-technical@lists.oasis-open.org [mailto:legalcitem-technical@lists.oasis-open.org] On Behalf Of Fabio Vitali
Sent: 06 July 2015 17:02
To: Tabone, Catherine
Cc: legalcitem-technical@lists.oasis-open.org
Subject: Re: [legalcitem-technical] ELI and Parseable URIs

Hello.

As promised, here are a few reflections on Catherine's mail regarding the issues that led ELI to forgo uniformity in its URIs and therefore to prevent their parseability.

Let me summarize the main issues, as far as I understand them:

1 Non-Gregorian dates: some UK documents are numbered according to regnal years, e.g., Eliz2, rather than Gregorian years.
2 Unknown dates: some old UK documents have no easily identifiable date associated with them.
3 Document number: some documents are not numbered and are known by some identifying string.
4 Nature of the work date: the definition of the creation date of a work varies from country to country and cannot be coalesced into a single concept.
5 Mapping of documents onto FRBR levels: what is a work and what is an expression is a matter of dispute in ELI circles, and no homogeneity could be found.

Let me start with item 3, as it is the easiest. In Akoma Ntoso this situation is well known. Although documents are frequently numbered, this is by no means universal or automatic: some documents are not numbered. In this case, the AN Naming Convention asks for a "Number or title or other disambiguating feature of the Work (when appropriate, otherwise the string nn)". Any string works in this position; thus we should expect a number wherever a number exists, and otherwise any string is acceptable provided that it follows some URI-specific simplification (e.g., of spaces, punctuation etc.).

As for item 1, regnal years, I would like to posit the following: the actual Gregorian year associated with a specific regnal year is possibly well known to subjects of that specific country, but in an international setting the regnal year provides very little context. What year is Eliz2/4-5? Therefore, it seems to me that an understandable date needs to be provided for non-subjects, and that the regnal year becomes part of the disambiguating string, prepended to the number. Thus, for instance, the document with regnal year Eliz2/4-5 and number 18 could become 1955/eliz2.4-5.18.

If no date is known (item 2), then we need to introduce the concept of unknown date (e.g., the arbitrary value 9999 or YYYY), and maintain the same conceptual structure with an invented value for the date.
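
A minimal sketch of this proposal in Python. The function name and its signature are hypothetical; the output shape follows the 1955/eliz2.4-5.18 example above, and 9999 is the arbitrary placeholder just mentioned.

```python
from typing import Optional

UNKNOWN_YEAR = "9999"  # arbitrary placeholder value for an unknown date


def date_and_number(year: Optional[int], reign: str, regnal_year: str, number: int) -> str:
    """Build "<year>/<reign>.<regnal year>.<number>", using a placeholder year if unknown."""
    year_part = str(year) if year is not None else UNKNOWN_YEAR
    return f"{year_part}/{reign.lower()}.{regnal_year}.{number}"


print(date_and_number(1955, "Eliz2", "4-5", 18))
print(date_and_number(None, "Eliz2", "4-5", 18))
```

The structure stays the same in both cases; only the value in the date position changes.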

As for item 4, I would provide a generic, rather than specific, definition of work date: a work date is the date (as precise as possible) on which the document started to exist in that specific nature. This could very well differ depending on document type and document jurisdiction, as long as all actors involved agree for that specific document class. Thus, it could be associated with the publication date, the promulgation date or the approval date without losing its characteristic of being the date on which the document starts to exist.

As for item 5, I have a VERY STRONG opinion on what constitutes a work and what constitutes an expression. Namely, there is no need to invoke superworks, and there is no need to separate dated versions from language-based versions: a superwork is the only work, and the work is realized in expressions, which are related to each other according to time and language relations. The paths between expressions (A is the translation of B, C is the French version of D) are relations between Expressions, characterized by a difference in content that is mapped to the same idea of document anyway. The relation between a Work and its Expressions is different, therefore, and it is a pointless complication to introduce yet another layer.

Geometrically, if you want, the space of Expressions is two-dimensional, and all paths of editing and translation happen on that plane, while the relation of the Work to the Expressions lies outside the plane. (Too obscure?)
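
One way to picture this as a data structure, sketched in Python. The class and field names are hypothetical and are not taken from Akoma Ntoso, ELI or FRBR: a single Work owns all of its Expressions, and each Expression sits at two coordinates, a version date and a language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expression:
    version_date: str  # e.g. "2015-01-01"
    language: str      # e.g. "eng"


@dataclass
class Work:
    identifier: str
    # The "plane" of Expressions, keyed by (version_date, language).
    expressions: Dict[Tuple[str, str], Expression] = field(default_factory=dict)

    def add(self, version_date: str, language: str) -> Expression:
        expr = Expression(version_date, language)
        self.expressions[(version_date, language)] = expr
        return expr

    def translations_of(self, expr: Expression) -> List[Expression]:
        """Move along the language axis: same date, other languages."""
        return [e for e in self.expressions.values()
                if e.version_date == expr.version_date and e.language != expr.language]

    def versions_of(self, expr: Expression) -> List[Expression]:
        """Move along the time axis: same language, other dates."""
        return [e for e in self.expressions.values()
                if e.language == expr.language and e.version_date != expr.version_date]


work = Work("example-work")
eng_2014 = work.add("2014-01-01", "eng")
fra_2014 = work.add("2014-01-01", "fra")
eng_2015 = work.add("2015-01-01", "eng")
print(work.translations_of(eng_2014), work.versions_of(eng_2014))
```

Translations and amended versions are both found by moving between Expressions, while the Work itself never appears as a point on that plane.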

To summarize, it seems to me that the basic tenets of the Work-level features of the Akoma Ntoso Naming Convention can still be applied here:

- Country or subdivision (a two-letter code according to ISO 3166-1 or a four-letter code according to ISO 3166-2).
- Type of document.
- Any specification of document subtype, if appropriate.
- The emanating actor, unless implicitly deducible by the document type (e.g., acts and bills do not usually require actor, while ministerial decrees do).
- Original creation date (expressed in YYYY-MM-DD format or just YYYY if the year is enough for identification purposes).
- Number or title or other disambiguating feature of the Work (when appropriate, otherwise the string nn).
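
As a rough illustration, the components listed above could be assembled as in the Python sketch below. Joining them with "/" in the listed order, the function names, and the simplification rule for titles are all assumptions made for this example; the actual Naming Convention may prescribe a prefix and different rules.

```python
import re
from typing import Optional


def simplify(text: str) -> str:
    """Hypothetical URI simplification: lower-case, drop punctuation, join words with "_"."""
    return "_".join(re.findall(r"[a-z0-9]+", text.lower()))


def work_path(country: str, doc_type: str, date: str,
              number_or_title: Optional[str] = None,
              subtype: Optional[str] = None,
              actor: Optional[str] = None) -> str:
    """Join the Work-level components in the order listed above."""
    parts = [country, doc_type]
    if subtype:
        parts.append(subtype)
    if actor:
        parts.append(actor)
    parts.append(date)
    # Fall back to the string "nn" when there is no number or title.
    parts.append(simplify(number_or_title or "") or "nn")
    return "/" + "/".join(parts)


print(work_path("gb", "act", "1955", "18"))
print(work_path("gb", "act", "1707", "Succession to the Crown Act"))
print(work_path("it", "decree", "2015-07-06", actor="ministry"))
```

Optional components are simply left out when absent, so the number of path segments varies with the document type.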

Finally, I really like what you say here:

> The expectation is that the ELI URI will be supported directly by the legislation publisher's website. Any resolution will be carried out by them (i.e. the web server is carrying out any resolution).

Day after day I like more and more the idea that ELI is the best way to provide an Item-level addressing scheme for documents, and the one that maps most closely to a LegalCiteM URI reference. I would very much hope we could go forward from this distinction.

Ciao

Fabio

On 15 Jun 2015, at 23:25, "Tabone, Catherine" <Catherine.Tabone@nationalarchives.gsi.gov.uk> wrote:

> Hi,
>
> As promised I'm sharing some of the issues (not exhaustive) that influenced the development of ELI identifiers as flexible rather than fixed and parseable.
>
> There are several issues that are key here so I've created separate documents (attached). All information is on legislation as this is all that ELI is concerned with, although I doubt there is any less complexity for other legal document types. Just to be clear, I'm not necessarily suggesting that the eventual LegalCiteM standard should follow the ELI pattern exactly, but highlighting the issues to add to the information available for consideration in the creation of the citation standard.
>
> A couple of other things to note:
>
> ELI is not just an identifier nor is it intended as a citation/reference scheme. It's designed as a mechanism to make it easier for legislative publishers to exchange information. ELI consists of several things that are designed to be used together (although you don't have to implement everything at once) - a URI scheme, a model for organising legislation versions (based on FRBR) and a set of metadata to be added to legislation content (RDFa is the recommended encoding).
>
> When ELI was initially proposed the first draft of the identifier scheme was a lot more prescriptive. However, when investigated it became clear that following this route would make it unlikely that anyone would implement it. The identifying features and the way the FRBR model needed to be implemented were very different in the various countries. There is a significant lack of resources to make major changes to national websites, as well as practical barriers (other systems may depend on legislation websites) and political considerations (who gets to tell whom how to build their website!), so having something flexible and light-weight was essential in order to get anyone to consider implementation.
>
> Concerning the issue of parseability, this wasn't really considered when the ELI identifier scheme was created. ELI is a URI scheme, it's not a parseable reference or a URN. The expectation is that the ELI URI will be supported directly by the legislation publisher's website. Any resolution will be carried out by them (i.e. the web server is carrying out any resolution). This is a reflection of the fact that the way most people find legislation is via a search engine, like Google, which will have indexed the actual webpage URL. This URL will be the reference that most people will use when citing online legislative sources.
>
> It's also worth noting that there is already a type of URN for legislation in use which is supported by many EU countries. Unfortunately this isn't generally used in legislation citations or referred to directly, as the interaction is via the N-Lex connectors. Each country builds a connector (a SOAP service, which is based on URN:LEX) that the EU Publications Office interacts with (services may be open to other users too), which allows use of common search terms in a general portal to access national legislation websites. For details see http://eur-lex.europa.eu/n-lex/index_en.htm .
>
> Let me know if you need more explanation or if I've missed anything.
>
> Regards,
>
> Catherine
>
> Catherine Tabone
>
> Data Manager
> Legislation Services
> The National Archives, Kew, Richmond, Surrey TW9 4DU
>
> +44 (0)20 8876 3444 ext. 2233
> catherine.tabone@nationalarchives.gsi.gov.uk
> www.legislation.gov.uk | www.nationalarchives.gov.uk
>
>
>
> Please don't print this e-mail unless you really need to.
>
> ----------------------------------------------------------------------
> -------------
>
>
> National Archives Disclaimer
>
> This email and any files transmitted with it are intended solely for
> the use of the
> individual(s) to whom they are addressed. If you are not the intended
> recipient and have received this email in error, please notify the sender and delete the email.
> Opinions, conclusions and other information in this message and
> attachments that do not relate to the official business of The
> National Archives are neither given nor endorsed by it.
>
>
> ----------------------------------------------------------------------
> --------------
>
>
>
> <1-Identifying Features.docx><2-FRBR.docx>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


--

Fabio Vitali                                          The sage and the fool
Dept. of Informatics                                     go to their graves
Univ. of Bologna  ITALY                               alike in this respect:
phone:  +39 051 2094872                  both believe the sage to be a fool.
e-mail: fabio@cs.unibo.it                  Where, then, may wisdom be found?
http://vitali.web.cs.unibo.it/   Qi, "Neither Yes nor No", The codeless code

