Re: Comments on Reference/Feature modeling

Dear All

I am forwarding to the group this email exchange we had with Fabio regarding the modeling of the LegalCiteM references. It records the conclusions reached on several discussion points.

I am also attaching the diagram we used to discuss the modeling. It is outdated, but necessary for the discussion below to be understood precisely.

All the best

Thomas

2016-12-12 8:27 GMT+01:00 Fabio Vitali <fvitali@gmail.com>:

Dear Thomas,

here are some reflections on what you wrote.

> On 8 Dec 2016, at 18:01, Thomas Francart <thomas.francart@sparna.fr> wrote:
>
> Fabio
>
> As promised, here are my comments on the LCM modeling. I actually tried to come up with a new version of the model diagram, attached in the PowerPoint. I wrote down below the details of what I did, and we can discuss this further during our next phone call.
>
> All the best
> Thomas
>
> • Use concepts/instances to represent the FRBR levels instead of the classes frbr:Work, frbr:Expression and frbr:Manifestation;
> • Using the classes as individuals makes the ontology not OWL-DL compliant. In OWL-DL a class cannot be used as an individual (see https://www.w3.org/TR/owl-ref/#OWLDL). I think it would be safer to declare these levels as skos:Concepts, or as individuals of a custom class;
> • Besides, semantically speaking this is strange because "frbr:Work" is really an identifier for "the set of all Works"; so saying "feature X has a level that is the set of all Works" is not exactly what we want to say here.

Since OWL 2, a new feature called "punning" has been introduced, which allows classes to be used as individuals:

"OWL 2 DL relaxes this separation somewhat to allow different uses of the same term, e.g., Eagle, to be used for both a class, the class of all Eagles, and an individual, the individual representing the species Eagle belonging to the (meta)class of all plant and animal species. [...] The OWL 2 Direct Semantics treats the different uses of the same name as completely separate, as is required in DL reasoners.", from https://www.w3.org/TR/owl2-new-features/#F12:_Punning.

However, the current specification of frbr:Work in http://purl.org/vocab/frbr/core is compatible with making it an instance of skos:Concept, precisely thanks to punning. Although I generally dislike over-specification in ontologies, I am not opposed to this addition if it is necessary for your acceptance.

So my proposal: keep frbr:Work as the value for hasFRBRlevel, and add the specification that frbr:Work, frbr:Expression, frbr:Manifestation and frbr:Item are instances of skos:Concept.

OK, we stick with using references to FRBR classes and use punning.
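
To make the agreed approach concrete, here is a minimal sketch in Python. It uses plain tuples as triples rather than an RDF library, the "lcm" namespace and the feature identifier are hypothetical, and typing the FRBR terms as owl:Class is an assumption made only to show the two views side by side.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
FRBR = "http://purl.org/vocab/frbr/core#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LCM = "http://example.org/lcm#"  # hypothetical namespace, for illustration only

LEVELS = ["Work", "Expression", "Manifestation", "Item"]

triples = set()
for level in LEVELS:
    # Class view of the term (assumed typing, for illustration).
    triples.add((FRBR + level, RDF_TYPE, OWL + "Class"))
    # Individual view of the same term: this is the punning.
    triples.add((FRBR + level, RDF_TYPE, SKOS + "Concept"))

# A hypothetical feature pointing at the FRBR level it relates to.
triples.add((LCM + "feature1", LCM + "hasFRBRlevel", FRBR + "Work"))


def types_of(term):
    """Return every rdf:type asserted for the given term."""
    return {o for s, p, o in triples if s == term and p == RDF_TYPE}


def is_punned(term):
    """True when the term is used both as a class and as a skos:Concept."""
    found = types_of(term)
    return OWL + "Class" in found and SKOS + "Concept" in found
```

With this data, is_punned(FRBR + "Work") holds, while the feature itself has no type at all and is therefore not punned.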

> • Introduce the notion of "FeatureSpecification".
> • Definition: "A FeatureSpecification declares characteristics shared by multiple Features in References: a code/id for such features, meant to identify the feature type unambiguously, a set of potentially multilingual labels, and the FRBR level to which such features are related, in a citation context."
> • A FeatureSpecification can be declared "on the fly" for each feature, or created a priori and shared across many features;
> • The FeatureSpecification declares to which FRBR level such features are linked. Potentially, we can make this declaration optional and let individual features also declare the level to which they are linked;

Uhm. In general, I am in favor of all mechanisms that reduce and simplify the corresponding JSON. This proposal seems to increase the amount of stuff and parentheses, while the reason behind it seems hard to explain.

I played a little with the idea of subclassing Feature to allow for the definition of "well-known" features, such as "document number", which is better in my mind than introducing another property linking the feature to a specification that indirectly provides the additional properties.

Yet the only property that could possibly be subsumed by this specification would be hasFRBRLevel, just ONE property, which makes all this infrastructure seem like overkill. Moreover, while most properties are appropriate for a specific FRBR level, some others do not really belong to one level, but can appear at multiple levels. For instance, "document creator" is appropriate (with different values and nuances) at the Work, Expression, and Manifestation levels at the same time.

A feature will have an object property "hasFeatureName" (or "hasName", etc.; exact property to be decided) pointing to a FeatureName object. The FeatureName object can have multilingual labels, a URI, and optionally a FRBR level when the feature always refers to the same level.

It will be possible to declare well-known FeatureNames in a context and reference them directly as String values in the JSON-LD.
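
As a rough illustration of this conclusion, the sketch below models a FeatureName and the lookup of well-known names by plain string. The keys, URIs, labels and the level given to "docNumber" are all hypothetical; only the shape (URI, multilingual labels, optional FRBR level) comes from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class FeatureName:
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)  # language tag -> label
    frbr_level: Optional[str] = None  # None when the name is not tied to one level


# Hypothetical registry of well-known names, as a shared context might declare them.
WELL_KNOWN: Dict[str, FeatureName] = {
    "docNumber": FeatureName(
        uri="http://example.org/lcm/names#docNumber",  # illustrative URI
        labels={"en": "document number", "fr": "numéro de document"},
        frbr_level="frbr:Work",  # illustrative choice of level
    ),
    "creator": FeatureName(
        uri="http://example.org/lcm/names#creator",  # illustrative URI
        labels={"en": "document creator"},
        frbr_level=None,  # can apply at several FRBR levels
    ),
}


def resolve_name(value: Union[str, FeatureName]) -> FeatureName:
    """Accept either a full FeatureName or the string key of a well-known one."""
    if isinstance(value, FeatureName):
        return value
    return WELL_KNOWN[value]  # raises KeyError for an unknown name
```

A feature could then carry just the string "creator", and a consumer would call resolve_name("creator") to recover the labels and see that no single FRBR level is imposed.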

As such, I don't know, but I do not see the real necessity for this.

> • Suggest renaming "hasFrame" to "inFrame";

No problem here.

> • Introduce the class "FeatureValue".
> • Definition: "A FeatureValue represents the value of a Feature in a Reference. It is a complex object composed of: a set of equivalent string representations for the value, an optional reference to a more generic value, and an optional reference to a URI that identifies this value".

> • Avoids relying on http://purl.org/co
> • Is it OK to rely on external ontologies like http://purl.org/co, since they are not standards? It would mean that LegalCiteM and OASIS implicitly endorse it;
> • Define a more precise semantic on what a feature value is;

I am not against this class. But there is more complexity to deal with. We need to deal with sequences of progressively specific strings defining a hierarchy, and whose strings can have equivalent values:

us -> us-ca -> LAC
United States of America -> California -> Los Angeles County

So we need to understand what this feature value actually is: is it the individual string "us", or the set of equivalent values ["us", "United States of America"], or the sequence of hierarchical values [["us", "United States of America"], ["us-ca", "California"], ["LAC", "Los Angeles County"]]?
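
To make the three candidate readings easier to compare, here they are written out as Python data, with two small helper functions. The helper names are hypothetical and only illustrate what the third, most general reading makes possible.

```python
from typing import List

# Reading 1: the value is a single string.
single_string: str = "us"

# Reading 2: the value is a set of equivalent strings.
equivalents: List[str] = ["us", "United States of America"]

# Reading 3: the value is an ordered sequence, from generic to specific,
# where each step is itself a group of equivalent strings.
hierarchy: List[List[str]] = [
    ["us", "United States of America"],
    ["us-ca", "California"],
    ["LAC", "Los Angeles County"],
]


def most_specific(steps: List[List[str]]) -> List[str]:
    """Return the equivalent strings of the last, most specific step."""
    return steps[-1]


def path(steps: List[List[str]], index: int = 0) -> List[str]:
    """Pick one representation per step, e.g. index 0 for codes, 1 for names."""
    return [step[index] for step in steps]
```

Note that reading 3 only works if the outer sequence keeps its order, which is the reason for insisting on real lists.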

The justification for using the Collections Ontology ( http://purl.org/co ) is to make sure that we have ordered lists while remaining within OWL-DL. I believe having lists is important, as it is the only way to allow for sequences of progressively specific strings defining a hierarchy, but I am not married to CO, of course (although I know one of the authors fairly well). There is a stylistic aspect to this, of course: should our model reinvent the wheel or adopt aspects of well-known ontologies?

In my opinion, the alternative (define a new wheel and assert equivalence to existing wheels) has traditionally been a losing proposition. I am fine with adopting other approaches. I would, on the other hand, maintain potential compatibility with CO, not least because many legal ontologies use it and/or use bibliographic ontologies that use it.

One idea is to use a first, mandatory context without any reference to an external ontology (including the Collections Ontology); additional optional contexts can then be added to declare equivalences with external ontologies. There will be no references to CO from the mandatory context.

Discussion item to be refined with this idea of multiple contexts.
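
Pending that refinement, a rough sketch of the multiple-context idea could look like the following. The term names, the "lcm" namespace and the "OrderedValues" alias are hypothetical; the only points illustrated are that @context may be an array of contexts and that the mandatory one can be checked for the absence of CO references.

```python
CO = "http://purl.org/co/"

# Mandatory context: only our own (hypothetical) namespace appears here.
MANDATORY_CONTEXT = {
    "lcm": "http://example.org/lcm#",
    "Feature": "lcm:Feature",
    "values": {"@id": "lcm:values", "@container": "@list"},
}

# Optional context: the only place where the Collections Ontology is mentioned.
OPTIONAL_CO_CONTEXT = {
    "co": CO,
    "OrderedValues": "co:List",  # hypothetical alias for a CO class
}


def iter_strings(node):
    """Yield every string value found anywhere inside a context structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def references(context, namespace):
    """True when any string in the context starts with the given namespace IRI."""
    return any(s.startswith(namespace) for s in iter_strings(context))


# A document that opts in to the CO equivalences lists both contexts.
document = {
    "@context": [MANDATORY_CONTEXT, OPTIONAL_CO_CONTEXT],
    "@type": "Feature",
    "values": ["us", "us-ca", "LAC"],
}
```

A document that does not care about CO would simply list the mandatory context alone.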

> • Open Issues/questions :
> • We need to be careful about the array structure in JSON-LD, which does not correspond to RDF lists, but simply to multiple values for the same property (see JSON-LD specification, chapter 6.11):
> "A JSON-LD author can express multiple values in a compact way by using arrays. Since graphs do not describe ordering for links between nodes, arrays in JSON-LD do not provide an ordering of the contained elements by default. This is exactly the opposite from regular JSON arrays, which are ordered by default."

My fault. For LCM syntax 5 I kept using the same @context as for LCM syntax 6, forgetting that I had not defined the @container as @list.

I fixed the @context now for syntax 5. Have a look at http://www.fabiovitali.it/legalcitem/lcm-context5.json, you will see that the list is indeed a list now, as per the specification of http://json-ld.org/spec/latest/json-ld/#sets-and-lists

OK.
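
The difference can be illustrated with a small sketch. This is not a JSON-LD processor: it only mimics how the "@container": "@list" declaration changes whether two arrays count as the same value. The term name and IRI are hypothetical and are not taken from the actual lcm-context5.json.

```python
# Without @container, an array is just multiple values of the same property.
context_without_list = {
    "values": {"@id": "http://example.org/lcm#values"},
}

# With "@container": "@list", the array is an ordered list.
context_with_list = {
    "values": {"@id": "http://example.org/lcm#values", "@container": "@list"},
}


def is_ordered(context, term):
    """True when the term definition declares a @list container."""
    definition = context.get(term)
    return isinstance(definition, dict) and definition.get("@container") == "@list"


def same_values(context, term, first, second):
    """Compare two arrays the way the term definition says they should be read."""
    if is_ordered(context, term):
        return list(first) == list(second)  # order matters
    return set(first) == set(second)  # order is irrelevant


a = ["us", "us-ca", "LAC"]
b = ["LAC", "us", "us-ca"]
```

Under context_without_list, a and b describe the same data; under context_with_list they do not, which is what the hierarchy of progressively specific values needs.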

> • I think we need to distinguish between the actual text value written in the reference, from other alternative values. Much like annotations in the "Open Annotation" ontology.

Agreed. Indeed, VERY MUCH AGREED! Totally. Will make a proposal soon.

If it is a source feature, most probably we only have one value.

If I want to say that "Stat." (written in the text) has an equivalent value of "Statuses" (not written), then I need to create a feature in the Interpretation frame with the same name as the one in the source frame, and express the equivalent value there. See also the example on the "complex reference 1", where the order of the subdivisions is different in the source and interpretation frames.

> • This triggers another potential distinction: distinguish between SourceFeature and InterpretationFeature, as 2 subclasses of Feature, instead of using a "hasFrame" property which can have only 2 values.
> • "SourceFeature" would be defined as "Features that have a FeatureValue that has a value for the property "text"" (or whatever we want to call the property indicating the actual text)

Ehm. Yes and no. Even many InterpretationFeatures have a value for the property "text", therefore this is not the distinguishing characteristic. The only distinguishing characteristic is that for SourceFeatures the value of the "text" property is contained somewhere in the citation, while for InterpretationFeatures it has been provided by the author of the Reference.

Therefore there is nothing in the other properties of Feature that allows you to specify whether a feature is a SourceFeature or an InterpretationFeature, which means that we need a specific property, e.g., inFrame. Once we have the inFrame property, I agree that it is possible to define the subclasses according to its values.
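
A minimal sketch of that reasoning, in Python: both features below carry a "text" value, so only the inFrame property can tell them apart, and the subclass is derived from it. The feature name "collection" and the field names are hypothetical; the two values reuse the "Stat." example.

```python
from dataclasses import dataclass
from enum import Enum


class Frame(Enum):
    SOURCE = "source"
    INTERPRETATION = "interpretation"


@dataclass
class Feature:
    name: str
    text: str  # present in both frames, so it cannot distinguish them
    in_frame: Frame


def feature_class(feature: Feature) -> str:
    """Derive the subclass name purely from the value of inFrame."""
    if feature.in_frame is Frame.SOURCE:
        return "SourceFeature"
    return "InterpretationFeature"


written = Feature(name="collection", text="Stat.", in_frame=Frame.SOURCE)
supplied = Feature(name="collection", text="Statuses", in_frame=Frame.INTERPRETATION)
```

Here feature_class(written) gives "SourceFeature" and feature_class(supplied) gives "InterpretationFeature", although the two objects are otherwise shaped identically.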

> • In this point of view, a "SourceFeature" would be much like "highlighting a piece of text in the reference", close to a text annotation from the OpenAnnotation ontology;

Well, I don't know. We can certainly explore this model.

> • I would like to explore further how our Features relate to Annotations, since what we are doing by expressing Feature on References is essentially annotating the reference (either by annotating the actual text of the reference - source frame - or adding other annotations not written in the text - interpretation frame)

Correct.

Ciao

Fabio

--

> --
>
> Thomas Francart - SPARNA
> Web de données | Architecture de l'information | Accès aux connaissances
> blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
> tel : +33 (0)6.71.11.25.97, skype : francartthomas
> <feature-proposal.pptx>

--

Fabio Vitali
Dept. of Informatics
Univ. of Bologna ITALY
phone: +39 051 2094872
e-mail: fabio@cs.unibo.it
http://vitali.web.cs.unibo.it/

The sage and the fool
go to their graves
alike in this respect:
both believe the sage to be a fool.
Where, then, may wisdom be found?
Qi, "Neither Yes nor No", The codeless code

Attachment: feature-proposal.pptx
Description: application/vnd.openxmlformats-officedocument.presentationml.presentation
