RE:[legaldocml] [legaldocml] URGENT: Summary of Naming Convention

Hello Fabio,

My comments in green

Kind regards

Véronique

Véronique Parisse
AUBAY Luxembourg
Orco House
38, Parc d’activités - L-8308 Capellen
Standard : +352 2992501
Fax : +352 299251
www.aubay.com

________________________________________
From: Fabio Vitali [fvitali@gmail.com]
Sent: Sunday, 22 November 2015 22:53
To: PARISSE, Véronique
Cc: monica.palmirani; legaldocml@lists.oasis-open.org
Subject: Re: [legaldocml] [legaldocml] URGENT: Summary of Naming Convention - D3

On 20/nov/2015, at 15:21, PARISSE, Véronique <V.PARISSE@aubay.lu> wrote:

> Hello Fabio,
>
> My comment in blue hereafter.
>
> Please note that the current use of the separator is fine with me, so I have no further comments on it.
>
>
> Kind regards
>
> Véronique
>
>
> ________________________________________
> From: Fabio Vitali [fvitali@gmail.com]
> Sent: Friday, 20 November 2015 10:32
> To: PARISSE, Véronique
> Cc: monica.palmirani; legaldocml@lists.oasis-open.org
> Subject: Re: [legaldocml] [legaldocml] URGENT: Summary of Naming Convention - D3
>
> Dear Veronique, all:
>
> On 12/nov/2015, at 10:47, "PARISSE, Véronique" <V.PARISSE@aubay.lu> wrote:
>
> > Hello Monica and everybody
> >
> > see my comment hereafter
> >
> > But first, there is something I would like to add to the specification (we did not have time to discuss it yesterday):
> >
> > Akoma Ntoso documents are divided into containers : <preface>, <preamble>, the main content of the document, <conclusions> and <attachments>.
> >
> > The <preface>, <preamble> <attachments> and <conclusions> elements are available in all types of documents.
> > But the "main content of a document" is marked by different kinds of elements, depending on the type of the Akoma Ntoso document (the content is structured differently). For a single document (that is, not a document collection), we have: <body>, <mainBody>, <amendmentBody>, <debateBody>, <judgmentBody>.
> >
> > I propose that, for all these elements (<body>, <mainBody>, <amendmentBody>, <debateBody>, <judgmentBody>), the AKN naming convention provides a common prefix (for example, "body"), so that we can use it to select the whole body of an act
> > ([http://www.authority.org]/akn/sl/act/2004-02-13/2~body)
>
> Yes, I agree, but I believe that these elements already can be identified using their id, there is no need to specify it separately. Just use their id.
>
> Yes, I agree, and I propose to use the prefix "body" to compose the eId of <body>, <mainBody>, <amendmentBody>, <debateBody>, <judgmentBody>.

Yes I think it is a good idea.

>> >> /akn/eu/act/2015/123/fr@2015/eupub!main/annex_1/annex_5~art_12 - complex
>> >> situation with components and optional metadata of the expression (eupub)
>> > also
>> > /akn/eu/bill/2015/123/fr@ver_final!annex_1/annex_5~art_12 (version "final" of the bill for example when a proposal for an act is adopted by the Commission)
>>
>> Please do NOT use ver_ before the name of the version. It is NOT necessary.
>>
> I think it is necessary because the stage, which is also a string, can appear at the same location, so it is impossible to tell the version from the stage. The "ver_" prefix also makes the reference more readable, so I would like to keep it.

Stage and version are things specific to certain document types. If you have both version names (as opposed to version dates) and stage names, and there is no simple way to distinguish between stage and version names, then by all means distinguish them in any way you feel appropriate, including adding a "ver_" in front. I do NOT believe this should be specified in the standard, that's all.

I think it is important that, if the version is used in the IRI, it is marked by a prefix and a slash suffix: a version can have a very complex structure and is not only text. For example, for the European Parliament the version looks like "02-00", while for the Commission it can be "final/2" (replace "/" with "-" for the reference... ;-) )
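
As a rough illustration of the version marking discussed here, a minimal Python sketch follows. The function name `version_segment` is hypothetical. The "ver_" prefix is the proposal made in this thread, not something the standard specifies, and the replacement of "/" with "-" follows the remark above.

```python
def version_segment(label: str, prefix: str = "ver_") -> str:
    """Turn a version label into a single IRI segment (illustrative only).

    Assumptions: "/" inside the label is replaced with "-" so that it cannot
    be confused with the IRI path separator, and the "ver_" prefix proposed
    in this thread is prepended.
    """
    return prefix + label.replace("/", "-")


print(version_segment("02-00"))    # European Parliament style label
print(version_segment("final/2"))  # Commission style label
```

Passing `prefix=""` gives the variant without any marker, which is the option left open if the standard does not mandate a prefix.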

>> > Another point
>> > For the fragment/portion, we explain the mapping between the structure and the name used in the reference ("art" for article, "chp" for chapter, name of the element in general, how the numbering is done, ...)
>>
>> The fragment part MUST be the id of the element. Exactly the id.
>>
> The eId depends on XML constraints (for example, on whether or not you include the content of the subdocument; that is an implementation matter, and therefore Manifestation information). The author of the reference does not necessarily know this information.
> He builds the reference on the logical structure.
>
> For example,
> <documentCollection>
> <collectionBody>
>     <act>
>     </act>
>     <doc>
>     </doc>
> </collectionBody>
> </documentCollection>
> The eId inside the act and the doc must be unique inside the documentCollection;
>
> that is not the case with the following manifestation :
> <documentCollection>
> <collectionBody>
>     <act>
>     </act>
>     <doc>
>     </doc>
> </collectionBody>
> </documentCollection>
> <component>
>    <act>
>    </act>
> </component>
> <component>
>    <doc>
>    </doc>
> </component>
>
> When you make a reference to a work or an expression, it is the job of the resolver to find the correct manifestation fragment. This may include a mapping of the fragment/portion, because the author of the legal reference has nothing to do with the XML details.

I totally disagree. Ids must be completely independent of the internal organization of the manifestation. In the situations that you mention, the ids MUST be the same. If you decide to place the components at the end of the collectionBody, rather than in-flow, then the ids must match anyway and must be the ones that you would have used if the components were in-flow.

There are other situations in which you may have this problem. For instance, suppose you have a document collection X that contains documents A and B, and in document C you have a reference to art_12 of document A. The author may write it as either a reference to X#doc_a__art_12 (or X~doc_a__art_12), or to A#art_12 (or A~art_12). They are NOT the same reference. The first one is a reference pointing to a specific location inside a composite document. The second is a reference to a specific location inside an individual document, which the RESOLVER (and ONLY the resolver) can map to a component within a composite document.
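
The distinction between the two references can be sketched in Python as follows. This is only a toy model: the `ALIASES` table and the `resolve` function are hypothetical, and they stand in for knowledge that, as argued above, only the resolver has.

```python
# Hypothetical table known only to the resolver: which individual documents
# also exist as components of a collection, and under which id prefix.
ALIASES = {
    "A": ("X", "doc_a"),
    "B": ("X", "doc_b"),
}


def resolve(reference: str) -> str:
    """Map a reference to an individual document onto its place in a collection.

    References that already point inside the composite document are returned
    unchanged: they are a different reference, not a different spelling.
    """
    document, separator, fragment = reference.partition("~")
    if separator and document in ALIASES:
        collection, prefix = ALIASES[document]
        return f"{collection}~{prefix}__{fragment}"
    return reference


print(resolve("A~art_12"))         # individual document, mapped by the resolver
print(resolve("X~doc_a__art_12"))  # already a reference into the collection
```

The author of the reference writes either form without knowing how the manifestations are organised; only the lookup table on the resolver side changes if that organisation changes.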

In both cases the author of the reference does NOT know the physical organization of the manifestations, and it is the job of the resolver to take care of these situations.

Logical ids are completely independent of XML. The implementation of an id in XML depends on the XML constraints. I agree that, for references, it is the logical ids that are used. And it is the work of the resolver to map them to the implementation.

>> > For the component, we also need to specify where the information is found (which part of the metadata corresponds to the name of the component: annex, appendix, schedule, ...; is it the name attribute of the AKN document element? is it the metadata subType, or something else?),
>>
>> We already have this information: it is the content of the FRBRthis element
>>
>> <xsd:element name="FRBRthis" type="valueType">
>> <xsd:annotation><xsd:documentation>
>> <type>Element</type>
>> <name>FRBRthis</name>
>> <comment>The element FRBRthis is the metadata property containing the IRI of the specific component of the document in the respective level of the FRBR hierarchy</comment>
>> </xsd:documentation></xsd:annotation>
>> </xsd:element>
>>
> Yes, but again, there is a need to specify how the IRI of a work that is annexed to another work is specified : for the main document, the specification is "the element name, a subtype, the author, ..."
> And for the uri of a document that is annexed ?

I do not understand. The IRI of an annex is given by the IRI of the containing document plus a string identifying the annex in the component fragment (after the !), and this string can be determined (with some leeway) from the annex name. This is the IRI that you specify in the FRBRthis element, which is different from the IRI that you specify in the FRBRuri, which is "the IRI of the whole document in the respective level of the FRBR hierarchy". What is not clear here?
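
To make the composition concrete, here is a small Python sketch of how the IRI of a component could be assembled from the IRI of the containing document. The function name `component_iri` and the sample IRI are assumptions chosen for illustration. The "!" separator and the "/" between nested components follow the examples used in this thread.

```python
def component_iri(document_iri: str, *component_path: str) -> str:
    """Append a chain of component names to a document IRI.

    The document IRI plays the role of the FRBRuri value and the result plays
    the role of the FRBRthis value. Component names such as "annex_1" are
    supplied by the caller, since how they are chosen depends on local tradition.
    """
    if not component_path:
        return document_iri
    return document_iri + "!" + "/".join(component_path)


print(component_iri("/akn/eu/act/2015/123", "annex_1"))
print(component_iri("/akn/eu/act/2015/123", "annex_1", "annex_5"))
```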

I think it is cleaner if we specify where to find the "name" used, just as, for the IRI of the act, we specify a subtype in the metadata even if the type also appears somewhere in the title.

> If we apply the same algorithm to the IRI of an annexed document, it will never be taken into account for the reference, as we reference it relative to the main document, as something that is annexed.
> (<iri-main-document>![fragment-name]_1)

Again: if you make a reference to a document as an annex of another document, it is a DIFFERENT reference than one to the document alone, EVEN IF they end up being the same document. The RESOLVER, and ONLY the resolver, can know this.

So, the FRBRWork/identification/FRBRthis for the annex can be specified as if it were a standalone document?

> - regarding the ![fragment-name]_1 part of the reference, where is the corresponding information in the metadata of the annexed document: how do we know whether [fragment-name] is "fragment" or "appendix" or ... ?
> (because the reference can be <iri-main-document>!annex_1 or <iri-main-document>!appendix_1 or <iri-main-document>!schedule_1)
>
> for me, "annex"-"appendix"-"schedule" can be
> - the attribute "name" of the document element or
> - the attribute type (or subtype) in the identification/work part of the metadata.

I do not think that we should standardize this. The name of the attachment is heavily dependent on the local tradition. It may be the <docTitle> of the contained document, it may be the <heading> of the <attachment> element of the containing document, it may be the name attribute of the document type, etc. It depends. There is NO RIGHT WAY.

As the annex is an Akoma Ntoso document, it has its own metadata, so we can use these metadata (for example subtype or name) to store the choice made, exactly as we do for the act.

Local tradition shall determine which one to use.

>> > We also need to explain how to designate a component that has no component name (for example, documents that are attached and not included, like the agreement attached to a decision act).
>>
>> Whether a component is physically attached or not to its main, is a Manifestation-level aspect of the document, not a Work-level or expression-level characteristic: at the Work or expression level, the secondary component IS ALWAYS attached to the primary.
>>
>> Therefore this information is only relevant for Manifestation-level references (i.e., src attributes) and not for Work-level and expression-level references (i.e., href attributes).
>>
> So, we reference an agreement that is attached to the act that adopts it, as "<iri-adoptionAct>!annex_1" and the annex 1 of the agreement as "<iri-adoptionAct>!annex_1/annex_1" ?

Possibly. Or even better: the agreement that is attached to the act that adopts it is referenced as "<iri-adoptionAct>!agreement" and the annex 1 of the agreement as "<iri-adoptionAct>!agreement/annex_1", because "agreement" ends up being the traditional name for that annex. It is a locally dependent decision.

And how do we reference a consolidated amendment, that is, an amendment that contains a complete act as part of its body?

>> > Finally, we also need to provide some examples for documentCollection (where you have two types of component: one inside the main body and one inside the attachment).
>>
>> Similarly, this problem is only relevant at the Manifestation level. This means that it is the business of the URI resolver to identify the physical entity where the requested component is placed if a Work-level or expression-level reference is used.
>>
> Then I don't understand the meaning of the identification/FRBRwork and identification/FRBRexpression parts of the metadata, and why the spec says that the IRI of the manifestation is built on the IRI of the expression, which is built on the IRI of the work

Because a document that is contained in a composite work is NOT the same work that it would be as an individual document. It may have an ALIAS, but if you refer to a composite document, you are making a reference to the composite document, not to its individual components.

For me, it is the same work; otherwise, you constrain the logical structure by implementation details that are not necessarily known to the author of the reference. It is like a book by Shakespeare: it is one work, regardless of whether or not it is in a composite book of all the works of Shakespeare.

And, of course, ALL expressions and manifestations exist of specific works.

>> > PENDING ISSUES sub-fragments:
>> >
>> > To clarify my position
>> >
>> > I think it is a pity that the "/" separator is used for two types of information in the structure of the IRI
>> > - to separate different metadata (for example, in "eu/act", the slash separates different types of information: the country and the type of the act).
>> > - to mark the hierarchy (for example, annex_1/appendix_3/attachment_2 ...)
>>
>> To clarify MY position: we need separators. The main separator is the slash "/", but in order to allow optional parameters to appear in the middle of the URI without ambiguity, we need to place different separators strategically here and there. Currently we use FOUR other separators: "@" (or ":") for the version/variant/view, "." for the format, and recently "~" for the fragment and (unnecessarily, but alas...) "!" for the component. We do not need more than these, I think.
>>
>> > On another point, we have another syntax for marking a hierarchy : "__" So why can we not do this :
>> > /akn/eu/act/2015/123/fr@2015/eupub!annex_1__annex_5~art_12__para_1.xml
>>
>> This is a different aspect: the content of the fragment part of the URI (the thing after the "~") MUST BE EXACTLY LIKE the id of the element specified. Not an interpretation, not a conversion: exactly the id. This is important.
>>
> I think this is impossible when you make a reference relative to a work or an expression. The exact content of the id is purely a matter of XML constraints. You cannot build a legal reference that takes XML uniqueness into account, because a reference to a work or an expression can in fact point to different manifestations, depending on when it is activated.
> So it is the business of the resolver to map the fragment specification of the reference to the eId value in some manifestation(s)

Veronique, I think we discussed this issue again in the past and again we came to the same conclusions: this is the reason we built the ids on the apparent structure of the document, rather than some arbitrary string of the markup: because this is the way you create ids that do NOT depend on the Manifestation, but have some basis on the conceptual structure of the document rather than the physical structure created by the XML. The XML has an explicit structure which makes manifest (hence 'Manifestation') the implicit structure of the expression. Whenever the explicit and implicit structure match (hopefully, always), the Manifestation-level ids match the expression-level ids.

>> So, separately, we decided to use "__" to mark the separation between context and element name of ids because all other characters were possible in element numbers. Please note that technically, "__" is a separation between context and element name, not a hierarchical structure: the hierarchicality of the context is a consequence of the fact that some ids are built by juxtaposing the ids of their containing elements to create a full context. "art_12" is a full and complete id without its context, since "tome_3__book_5__chp_1__art_12" is clearly NOT necessary.
>>
>> > Alternatively, if you consider that "__" is not aesthetic enough, then we can continue to use the "/" melting pot, but EVERYWHERE, so
>> > /akn/eu/act/2015/123/fr@2015/eupub!annex_1/annex_5~art_12/para_23/list_2/point_9.xml#art_12/para_23/list_2/point_9
>>
>> As I said, I believe that the id of the element MUST BE TAKEN AS IT IS, without considerations as its hierarchical level or anything. The id is a single STRING, not a sequence of parts separated by "__" separators.
>>
> I agree with the current syntax

Good. Monica, I think we can move on from this aspect.
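
For reference, the separator roles agreed above can be summarised in a short Python sketch. It is an illustration under stated assumptions, not a normative parser: the names `AknReference` and `split_reference` are hypothetical, only the "!" (component) and "~" (fragment) separators are handled, and a format suffix such as ".xml" is not treated. In line with the position above, the fragment is kept as one opaque string and is never split on "__".

```python
from typing import NamedTuple, Optional


class AknReference(NamedTuple):
    document: str             # everything before "!" (work/expression part)
    component: Optional[str]  # text between "!" and "~", if any
    fragment: Optional[str]   # text after "~": the element id, taken as it is


def split_reference(iri: str) -> AknReference:
    """Split a reference on "~" first, then on "!" (illustrative only)."""
    rest, _, fragment = iri.partition("~")
    document, _, component = rest.partition("!")
    return AknReference(document, component or None, fragment or None)


ref = split_reference("/akn/eu/act/2015/123/fr@2015/eupub!main/annex_1/annex_5~art_12")
print(ref.document)
print(ref.component)
print(ref.fragment)
```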

>> > Consider that this is the syntax of referencing, but it can never be considered the "exact value" of the eId attribute because, as was clearly explained yesterday, the "eId" value depends entirely on the organisation of the manifestation and on whether or not you physically group the component manifestations in the same XML document (in fact, eId is constrained by the XML syntax and the rules on the uniqueness of the attribute). Of course, the author of a citation does not have to take this into account.
>> > So, the naming convention of the reference is more generic and cannot be built on the actual organisation of the manifestations.
>> >
>> > Moreover, the naming convention of the reference must, from my point of view, be considered a generic syntax that allows information to be found in documents independently of their format: I can use the AKN naming convention to reference a fragment/portion in a PDF or Word document (in fact, I do so to reference a portion/fragment in a Formex document, which has a completely different syntax for naming the text structure. So, I establish a mapping between these two syntaxes).
>>
>> I disagree. As soon as we enter into the inner-document referencing, then necessarily the organization and structure of the manifestation must come into play. We can (and might) assume that, if you are using an Akoma Ntoso naming convention syntax for the document, then you must assume to use the same syntax for the fragments, which are based on the assumption that you are using the XML vocabulary for the content. You may NOT use the XML syntax, but the ids you are using implicitly assume that the inner document references are based on what the XML syntax of the document would be.
>>
> It is impossible to specify a work or an expression reference that takes the Manifestation structure into account: a legal reference does not specify a particular manifestation and so does not know how the manifestation is organised.
> A reference must also remain valid even if the structure of the manifestations is redefined.
> This is the reason to transmit the fragment/portion to the server. It is the job of the resolver to map the specification of the fragment/portion in a reference to the eId of the corresponding element, taking into account the structure of the Manifestation selected.

I agree... partially: the existence of the wId and of the <mapping> element in the metadata is meant exactly to allow fragment ids to update themselves if renumbering has taken place in some previous versions. But this is not relevant for the discussion, since the fragment identifier CAN be transmitted to the server, so let's go on.

It is also important for the translation.

Veronique, please let us come to an agreement soon.

Ciao

Fabio

--

Fabio Vitali                                          The sage and the fool
Dept. of Informatics                                     go to their graves
Univ. of Bologna ITALY                               alike in this respect:
phone: +39 051 2094872                  both believe the sage to be a fool.
e-mail: fabio@cs.unibo.it                  Where, then, may wisdom be found?
http://vitali.web.cs.unibo.it/   Qi, "Neither Yes nor No", The codeless code
