

Subject: Re: [legalcitem] Questions/remarks on legal reference conceptual model


Dear Thomas, 

my comments here are NOT as chair of the TC, but as a normal member of the TC with opinions and proposals. Feel free to disagree, object, mock them as you see fit. 

> Il giorno 29 mar 2016, alle ore 09:40, Thomas Francart <thomas.francart@sparna.fr> ha scritto:
> 
> Hello LegalCiteM
> 
> Reviewing what we covered during the last calls, I was asking myself some questions/remarks that I am sending out to the group. Sorry if some are obvious or were already answered. I would be happy if we could spend a few minutes in the next call to cover these.
> 
> Cheers
> 
> Thomas
> 
> 
> 
> 	• We covered examples that showed that tokenizing the citation and assigning tokens to features in the source-frame requires advanced knowledge of the citation; for example knowing that "2014" in "Pensions Act 2014" cannot be assigned to the year of the document, but is part of the title. Or turning "215 Va. 338" into an "official number" feature with [Va.] first, then [215], then [338].

First of all, the precise distinction between source frame and interpretation frame is inevitably complex. Even the boundaries of the citation might be fuzzy and subject to interpretation: is this the title of the document, or the heading of the specific section, or not even part of the citation text? Is this a page number or an article number? And so on.

From my point of view, the discriminating characteristic of the source frame is not in the labels but in the values: if a value is in the source frame, then it is to be found somewhere in the citation *as it is reported*. Vice versa, if a value appears somewhere in the citation exactly as specified, then it can belong to the source frame. As to *what it is described as*, i.e., the label, well, that is perforce an interpretation.

The interpretation frame, on the other hand, contains values that are NOT explicitly mentioned in the citation, but that some actor (either automatic or human) has been able to derive with some degree of certainty because it/he/she knows something about the cited document.
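To make the distinction concrete, here is how I picture the two frames for the "215 Va. 338" example. This is only a sketch: the labels ("volume", "reporter", "page", ...) and the JSON-like shape are invented for illustration, not a proposal for the actual feature vocabulary.

import json

citation_text = "215 Va. 338"

# Source frame: every value below appears verbatim in the citation text;
# only the labels ("volume", "reporter", "page") are an interpretation.
source_frame = {
    "citation": citation_text,
    "volume": "215",
    "reporter": "Va.",
    "page": "338",
}

# Interpretation frame: values that are NOT in the citation text, derived
# by an actor (human or software) who knows something about the cited document.
interpretation_frame = {
    "country": "us",
    "jurisdiction": "us/virginia",   # derived from knowing what "Va." reports
    "documentType": "judgment",      # not stated anywhere in the citation
}

print(json.dumps({"source": source_frame,
                  "interpretation": interpretation_frame}, indent=2))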

> So, building the source frame requires already a lot of interpretation. And the same citation could lead to different source-frames depending on who/which software annotates it. LCMReferences on the wiki writes "the source frame represents in a machine-readable way the same information that is specified in the source" - it is actually not the same information, it adds quite a lot of information/structure.

It does not add data (= values). It certainly adds structure (i.e., labels on the data). Not allowing this would mean giving up any analysis of the citation: everything would end up in the interpretation frame. I believe we MUST handle separately those facts about a document that are specified in the citation from those facts that I know about the document because I am a local expert on it.
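One way to phrase this constraint operationally (again, only a sketch of the idea, not proposed normative text): a frame can qualify as a source frame only if every value in it occurs verbatim in the citation string. The Pensions Act example shows why this test is necessary but not sufficient: several labellings pass it, and choosing among them is still an interpretation.

def is_plausible_source_frame(citation_text, frame):
    # The "no added data" rule: every value of a candidate source frame
    # must occur verbatim in the citation text; the labels are the only
    # structure that is added on top.
    return all(value in citation_text for value in frame.values())

citation = "Pensions Act 2014"
print(is_plausible_source_frame(citation, {"title": "Pensions Act 2014"}))            # True
print(is_plausible_source_frame(citation, {"title": "Pensions Act", "year": "2014"})) # also True: the test cannot decide
print(is_plausible_source_frame(citation, {"title": "Pensions Act", "year": "2013"})) # False: "2013" is not in the text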

> 	• How will a reference data structure be associated/encoded within (XML) texts? For example, taking this XML excerpt from one of Catherine's examples:
> <Citation id="c00023" Class="EuropeanUnionDirective" URI="http://www.legislation.gov.uk/european/directive/2001/0083" Number="83" Year="2001">2001/83/<Acronym Expansion="European Community">EC</Acronym></Citation>

Goooooood question. My take: the LegalCiteM feature set that can be said about a textual citation is universal and standardized. The linearization of such a feature set (i.e., how it is rendered in an XML, PDF, HTML or plain text file, etc.) is not. There ARE a number of linearization formats that could and hopefully will claim to be able to render the LegalCiteM feature set fully or partially. I will do my best to make sure that the Akoma Ntoso Naming Convention will assert such compliance. My hope is that ELI will do so as well.
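Purely as an illustration of "one feature set, many linearizations" (the attribute names come from Catherine's excerpt above; everything else, including the Python shape of the feature set, is made up for the example):

import json
from xml.sax.saxutils import quoteattr

# A hypothetical, linearization-neutral feature set for the directive citation.
features = {
    "class": "EuropeanUnionDirective",
    "year": "2001",
    "number": "83",
    "uri": "http://www.legislation.gov.uk/european/directive/2001/0083",
}

# One linearization: inline XML markup in the style of Catherine's example.
xml = ('<Citation Class=%s Year=%s Number=%s URI=%s>2001/83/EC</Citation>'
       % tuple(quoteattr(features[k]) for k in ("class", "year", "number", "uri")))

# Another linearization of the very same feature set: a standalone JSON object.
print(xml)
print(json.dumps(features, indent=2))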

> Assuming the (JSON?) encoding of the citation was done (automatically or manually), where/how would it be associated with the corresponding piece of text? Is it out of scope for LegalCiteM?

No, no. The JSON format is but ONE linearization, by all means NOT the only one, and not even the main one. I believe that HTTP URIs will be the main and ubiquitous means to use LegalCiteM references. The JSON format should be the format into which a linearization such as an ELI URI is converted, so that a further conversion into, say, an ANNC, URN:LEX or Zotero representation is possible.
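A toy example of what I mean by "the format into which a linearization such as an ELI URI is converted". Both the URI pattern and the field names are simplified inventions for the sake of the example, not the real ELI template:

# Hypothetical, simplified ELI-like URI.
uri = "http://data.example.org/eli/dir/2001/83/oj"

def uri_to_features(uri):
    # Parse one linearization (an HTTP URI) into the intermediate feature set.
    _, _, path = uri.partition("/eli/")
    doc_type, year, number, version = path.split("/")
    return {"type": doc_type, "year": year, "number": number, "version": version}

def features_to_citation(features):
    # Re-linearize the same feature set as a human-readable citation string.
    return "Directive %s/%s/EC" % (features["year"], features["number"])

features = uri_to_features(uri)
print(features)                        # the intermediate, linearization-neutral form
print(features_to_citation(features))  # Directive 2001/83/EC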

> 	• What is the intended/expected usage of the JSON data structure? Who/which software will use it, and what are typical use cases for it? Maybe documenting use cases (if not done already) could be helpful.

Again, to me the JSON is one of many possible representations of an intermediate data structure to allow interoperability between different linearizations. I hoped this was clear from the text I wrote, but I'll try to be more explicit in a further version of the text. 

> 	• Is it in the scope of LegalCiteM to write specifications of a JSON-reference resolver? (e.g., a service that will take as an input a JSON reference, and return as an output a (list of?) URLs to actual documents)

No. I don't think we should bless any specific linearization. We COULD say that the JSON representation is complete, but it should be neither the only nor the main linearization of the feature set. So any specification we write will refer to "feature sets", not to "representations of feature sets", of which JSON is one example.

> 	• It was said that LegalCiteM should find an agreement on, and specify, the (work-level) features of a citation. How will this specification be written? In an ontology?
> How will the specifications of features be related to the specifications of descriptive metadata of legislation provided by ontologies like ELI (http://publications.europa.eu/mdr/resource/eli/eli-20141209-0/eli_ontology.xlsx) or by the Akoma Ntoso schema? Are they related/overlapping or completely different?

As before, I must compliment you on the depth and sharpness of the question. I'll try to provide my view on this.

First of all, WHAT are we agreeing on? Not an ontology; I do not think we are ready for an ontology yet. I would content myself with agreeing on a vocabulary of feature labels and on a common set of values for the simplest of them (say: countries, dates, languages, data formats). I would also be content if we agreed on some basic and fundamental ideas, such as the distinction between Work-, Expression- and Manifestation-layer features, and that the grouping of such features happens in a controlled way (for instance, that country is but the first level of a hierarchy that identifies the jurisdiction, or that work creator and expression creator should be considered separately even in the case of documents that have only ONE expression, so that their authors coincide).

So a list of feature labels, how they behave with each other and (in some cases only) what values they must contain is more than enough for me as the output of the TC. Anything more than this is desirable but not strictly necessary.

Second: Work-level features only? No. I believe we should also deal with Expression-level and Manifestation-level features as much as possible.
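To make the kind of (small) deliverable I have in mind more tangible, here is a sketch with entirely invented labels: a flat vocabulary of feature labels, each assigned to one FRBR layer, with a controlled value set declared only where it is easy to do so. None of these names is a proposal; they just show the shape of the thing.

# Hypothetical vocabulary sketch: label, the FRBR layer it belongs to, and
# (only where it is easy) the controlled value set its values must come from.
FEATURE_LABELS = {
    # Work-level features
    "country":           {"layer": "work", "values": "ISO 3166-1 alpha-2 codes"},
    "jurisdiction":      {"layer": "work", "values": "hierarchy rooted at country"},
    "docType":           {"layer": "work", "values": None},
    "workDate":          {"layer": "work", "values": "ISO 8601 dates"},
    "workCreator":       {"layer": "work", "values": None},
    # Expression-level features (kept separate even for single-expression works)
    "language":          {"layer": "expression", "values": "ISO 639 codes"},
    "expressionDate":    {"layer": "expression", "values": "ISO 8601 dates"},
    "expressionCreator": {"layer": "expression", "values": None},
    # Manifestation-level features
    "format":            {"layer": "manifestation", "values": "IANA media types"},
}

def features_of_layer(layer):
    return [name for name, spec in FEATURE_LABELS.items() if spec["layer"] == layer]

print(features_of_layer("expression"))   # ['language', 'expressionDate', 'expressionCreator']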

Third: express them as an ontology? If by ontology you mean a formal representation of orthogonal concepts with a clear differentiation between them, then yes, I think this is something we should pursue. If you mean a formal representation of the features in OWL, then I would say no. I do not think we should provide yet another formal representation of things that have been formally represented time and again in other ways. The ELI ontology and the Akoma Ntoso non-ontology are good examples of things we should harmonize, rather than antagonize, through the LegalCiteM feature set characterization.

So, fundamentally, I believe that the ideal outcome of this work should be a standard representation/labelling of the features of a reference that allows for the following (a tiny round-trip sketch follows the list):

a) easy linearization as JSON structures, XML fragments, URI references, RDF graphs, plain text strings (e.g., ECLI); 
b) easy conversion to and from any linearization without loss of information (or with known and well-understood loss of information);
c) easy representation of the information contained in the reference according to the concepts and classes expressed in any ontology without loss of information (or with known and well-understood loss of information).
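A minimal sketch of point (b), under invented linearizations: two toy serializations of the same feature set, converted through the common intermediate form with no loss of information. The feature labels are, again, purely illustrative.

from urllib.parse import urlencode, parse_qsl

# The linearization-neutral feature set (invented labels).
features = {"country": "uk", "docType": "act", "title": "Pensions Act 2014"}

def to_query_string(f):
    # Linearization 1: a URI query string.
    return urlencode(sorted(f.items()))

def from_query_string(s):
    # Back from linearization 1 to the feature set.
    return dict(parse_qsl(s))

def to_plain_text(f):
    # Linearization 2: a human-readable citation string.
    return "%s (%s, %s)" % (f["title"], f["country"], f["docType"])

round_tripped = from_query_string(to_query_string(features))
assert round_tripped == features          # requirement (b): lossless round trip
print(to_plain_text(round_tripped))       # Pensions Act 2014 (uk, act)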

I hope I was clear and convincing. 

I am awaiting objections, counterpoints, dissenting opinions and proposals, and rotten vegetables thrown at me. 

Ciao

Fabio

> 
> -- 
> 
> Thomas Francart - SPARNA
> Web de données | Architecture de l'information | Accès aux connaissances
> blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
> tel :  +33 (0)6.71.11.25.97, skype : francartthomas


--

Fabio Vitali                                          The sage and the fool
Dept. of Informatics                                     go to their graves
Univ. of Bologna  ITALY                               alike in this respect:
phone:  +39 051 2094872                  both believe the sage to be a fool.
e-mail: fabio@cs.unibo.it                  Where, then, may wisdom be found?
http://vitali.web.cs.unibo.it/   Qi, "Neither Yes nor No", The codeless code


