Subject: Re: [legaldocml] Ids proposal - sometime they come back... again!
From: Fabio Vitali <fabio@cs.unibo.it>
To: monica.palmirani <monica.palmirani@unibo.it>
Date: Tue, 25 Mar 2014 14:40:09 +0100
Dear Monica, 

I agree with you on most things. 

> Proposal: 
> 
> 1) to use originalId with a semantic naming convention and to point to a conventional language id (e.g. French, English)

I agree. I'll also rephrase and extend: 

1) originalIds definitely are understood as Work-level ids (they could even be called workIds or wids). 
2) currentIds definitely are understood as Expression-level ids (they could even be called expressionIds, exprIds, eids or even simply ids). 

3) All documents need to have expression-level ids. Period. 
4) Whether an XML document has work-level ids or not is NOT a decision of the marker, but a consequence of the nature of the document. In fact, if an XML document does NOT have work-level ids, then it is assumed that a) this is the Master Expression (the one whose expression-level ids are used as a map for the work-level ids of all the other expressions) and b) its work-level ids are the same as its expression-level ids. If this is NOT the Master Expression, then the work-level ids NEED to be present. A Master Expression is necessarily the FIRST (or the ONLY) time-related version of a document that either is intrinsically MONOLINGUAL or is expressed in the MASTER LANGUAGE, which is country- and jurisdiction-dependent and may not even exist (as in the EU). A marker must know whether the document he/she is marking up is the Master Expression for a Work or not.

5) Expression-level ids use a semantic naming convention based on the structures of their expression.
6) Work-level ids use a semantic naming convention based on the structures of their Master Expression, if one exists, or of a conceptual Ur-Expression, if none exists. 
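
To make points 3 and 4 concrete, here is a minimal sketch in Python of how a tool might derive the work-level id of an element. The names Element, eid, wid and work_level_id are hypothetical and used only for illustration; they are not part of any schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical in-memory view of a marked-up element."""
    eid: str                   # expression-level id, always present (point 3)
    wid: Optional[str] = None  # work-level id, may be absent only in the Master Expression


def work_level_id(element: Element, is_master_expression: bool) -> str:
    """Return the work-level id of an element following point 4."""
    if element.wid is not None:
        return element.wid
    if is_master_expression:
        # In the Master Expression the work-level id equals the expression-level id.
        return element.eid
    raise ValueError(
        f"{element.eid}: work-level id is required outside the Master Expression"
    )


# A Greek expression carries its own eid plus the wid taken from the master.
greek_point = Element(
    eid="art_6__par_2__list_1__pnt_στ",
    wid="art_6__par_2__list_1__pnt_f",
)
# The master expression needs no explicit wid.
master_point = Element(eid="art_6__par_2__list_1__pnt_f")
```

With these definitions both greek_point (not master) and master_point (master) resolve to the same work-level id, which is what makes the synchronization between linguistic versions possible.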

> 2) to use the FRBRTranslation attribute "pivot" to express in which language we have the master copy

7) A new element in the <FRBRExpression> section is added, <masterExpression> or something like this.
8) The <masterExpression> element is optional and is used to record the URI of the Master Expression and the human language in which the work-level ids are expressed.
9) If no <masterExpression> element is specified inside <FRBRExpression>, then it is assumed that THIS expression is the Master Expression. 
10) If a <masterExpression> element exists but has no href attribute, then it is assumed that the Master Expression does not actually exist: it is an Ur-Expression, and only the human language used for the work-level ids is specified here.
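
Points 7 to 10 amount to a three-way decision, which could be sketched as follows. This is only an illustration under assumptions: the element name <masterExpression> is itself still tentative, the "language" attribute name is my guess, namespaces are ignored, and MasterInfo and resolve_master are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class MasterInfo(NamedTuple):
    kind: str                # "self", "external" or "ur-expression"
    href: Optional[str]      # URI of the Master Expression, if it exists
    language: Optional[str]  # human language of the work-level ids, if given


def resolve_master(frbr_expression_xml: str) -> MasterInfo:
    """Classify an <FRBRExpression> block according to points 9 and 10."""
    expression = ET.fromstring(frbr_expression_xml)
    master = expression.find("masterExpression")
    if master is None:
        # Point 9: no element, so THIS expression is the Master Expression.
        return MasterInfo("self", None, None)
    href = master.get("href")
    language = master.get("language")  # assumed attribute name
    if href is None:
        # Point 10: no href, so the master is only a conceptual Ur-Expression.
        return MasterInfo("ur-expression", None, language)
    # Point 8: the Master Expression exists and is identified by its URI.
    return MasterInfo("external", href, language)
```

For instance, an <FRBRExpression> containing <masterExpression language="eng"/> and no href would be classified as an Ur-Expression whose work-level ids follow English naming.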

> 4) to use renumbering metadata in the <textMod> block for tracking renumbering sequences over time, instead of inside <temporalData>, in order to have a single synchronized block in which to write and read information about the renumbering.

Totally disagree. In fact, textual modification elements exist within the following hierarchies: 

meta -> analysis -> activeModifications -> [textual modification]
meta -> analysis -> passiveModifications -> [textual modification]

These elements are only used when modifications happen, and as such they are basically only relevant for legislative texts such as acts and bills. Yet the issues to be decided today also affect documents in which no modification happens, only synchronization between multilingual versions, such as debate reports and the like.

I therefore believe that textual modification elements are not the right place to record the full history of mappings between work-level ids and expression-level ids of other versions and variants.

I strongly believe that the right place is EITHER in a sibling structure to <activeModifications> and <passiveModifications>, OR (which is my preferred solution) where they are now, in the temporalData block, because this is what they really are: annotations about the effect of temporal evolution of the document. 

Ciao

Fabio

--



On 25 Mar 2014, at 03:50, monica.palmirani <monica.palmirani@unibo.it> wrote:

> Dear colleagues,
> 
> a new proposal concerning the ids, coming from Veronique, was presented and discussed in the last unofficial TC on March 21.
> See the attached proposal.
> 
> The proposal aims to address the problem of synchronization among different linguistic versions of the same work.
> 
> The proposed idea is to use originalId in several different ways, depending on the situation:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> In case there is a master language version (in this case English), we use originalId to record the id of the master language version.
> 
> 2) <point currentId="art_6__par_2__list_1__pnt_στ" originalId="2013-619191">
> In case there isn't a master language version, we use originalId for recording a meaningless and opaque id. 
> 
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> This is the case of renumbering in a monolingual document. We use originalId in a third way, to record the original position before the renumbering.
> 
> 4) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="2013-619191">
> This is the case of renumbering in the Greek document. We use originalId in a fourth way, to record the original position before the renumbering using an opaque id.
> 
> We have several other cases where originalId takes on yet other semantics.
> 
> Four considerations and one proposal:
> 
> a) it is not good to have originalId serve different purposes with different semantics, otherwise we need some more elements in the metadata block to tell us which semantics applies (e.g. FRBRTranslation or FRBRlanguage);
> b) it is not good to introduce opaque ids in originalId, mixed with a different semantic naming convention, because it breaks interoperability among institutions that adopt different methodologies. Moreover, an originalId in opaque format produces a new problem: the naming convention for newly inserted provisions (e.g. if I need to insert a new art_6__par_2__list_1__pnt_e1 between 2013-619191 and 2013-619192, what do I call the new originalId? 2013-619191-1?)
> c) the work-level id is a nice idea, but we also have the scenario where I need to mark up the third version from scratch, knowing nothing about the first original XML version because I have only a PDF in my hands (e.g. a very old act);
> d) marking up a renumbering modification of a provision (e.g. a third unnumbered paragraph) and mapping linguistic versions (e.g. with different numbering order and non-corresponding structures) are both legal intellectual activities, so both are subjective interpretation and not objective mark-up.
> 
> Proposal: 
> 
> 1) to use originalId with a semantic naming convention and to point to a conventional language id (e.g. French, English)
> 2) to use the FRBRTranslation attribute "pivot" to express in which language we have the master copy
> 3) to add a new attribute "mapper" to FRBRTranslation to express in which language we have assumed the mapping (in any case we need some new metadata for tracking the method used in originalId).
> 4) to use renumbering metadata in the <textMod> block for tracking renumbering sequences over time, instead of inside <temporalData>, in order to have a single synchronized block in which to write and read information about the renumbering.
> 
> With this proposal the cases appear as follows:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with pivot="eng" in FRBRTranslation
> 
> 2) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with mapper="eng" in FRBRTranslation
> 
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in a monolingual version, with renumbering metadata in <textMod>
> 
> 4) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in multiple language versions, with renumbering metadata in <textMod> and with FRBRTranslation mapper="eng" or FRBRTranslation pivot="eng", depending on the case.
> 
> I hope this finds most of you well. See you at the TC meeting on 28 March at 1.30 EDT.
> 
> Good night!
> Monica
> -- 
> ===================================
> Associate professor of Legal Informatics 
> School of Law
> Alma Mater Studiorum Università di Bologna 
> C.I.R.S.F.I.D. 
> http://www.cirsfid.unibo.it/
>  
> Palazzo Dal Monte Gaudenzi - Via Galliera, 3 
> I - 40121 BOLOGNA (ITALY) 
> Tel +39 051 277217 
> Fax +39 051 260782 
> E-mail  
> monica.palmirani@unibo.it
>  
> ====================================
> 
> 
> <currentId-originalId-v1.pdf><currentId-originalId-v1.odt>



--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?"
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/