Re: [legaldocml] Ids proposal - sometime they come back... again!

Dear Monica and Fabio,

The purpose of my short document was to validate, with examples, the use of the attribute (originalId) for renumbering and translation, as proposed in a TC meeting.
So my sole intention was to see how to set this attribute, under the following constraints:

when set, it will never change
it is used as an identifier for renumbering and for language versions, so I can use it to navigate to an old version or to another language
there is not always an official master language

I also believe that this attribute is in fact a Work id, and currentId an Expression id.

One last small comment: this discussion concerns the structural elements and assumes that these elements exist in all language versions. But:

How do we handle the case where some version (maybe the master version itself) does not have the corresponding element?
How do we manage the Work id for semantic inline elements such as "<ref>" or "<person>" in multilingual documents?

All my other comments are inline below, in blue.

Kind regards

Véronique

Véronique Parisse
AUBAY Luxembourg
Orco House
38, Parc d’activités - L-8308 Capellen
Switchboard: +352 2992501
Fax: +352 299251
www.aubay.com

________________________________________
From: legaldocml@lists.oasis-open.org [legaldocml@lists.oasis-open.org] on behalf of Fabio Vitali [fabio@cs.unibo.it]
Sent: Tuesday, 25 March 2014 14:40
To: monica.palmirani
Cc: legaldocml@lists.oasis-open.org
Subject: Re: [legaldocml] Ids proposal - sometime they come back... again!

Dear Monica,

I agree with you on most things.

> Proposal:
>
> 1) to use originalId with a semantic naming convention and to point to a conventional language id (e.g. French, English)

I agree. I'll also rephrase and extend:

1) originalIds definitely are understood as Work-level ids (they could even be called workIds or wids).
2) currentIds definitely are understood as Expression-level ids (they could even be called expressionIds, exprIds, eids or even simply ids).

3) All documents need to have Expression-level ids. Period.
4) Whether an XML document has work-level ids or not is NOT a decision of the marker, but a characteristic of the nature of the document. In fact, if an XML document does NOT have work-level ids, then it is assumed that a) this is the Master Expression (the one whose Expression-level ids will be used as a map for the work-level ids of all the other expressions) and b) its work-level ids are the same as its Expression-level ids. If this is NOT the Master Expression, then the work-level ids NEED to be present. Master Expressions are necessarily the FIRST (or the ONLY) time-related versions of a document that either is intrinsically MONOLINGUAL or is expressed in the MASTER LANGUAGE, which is country- and jurisdiction-dependent and may not even exist (as in the EU). A marker must know whether or not the document he/she is marking up is the Master Expression of a Work.

5) Expression-level ids use a semantic naming convention based on the structures of their Expression.
6) Work-level ids use a semantic naming convention based on the structures of their Master Expression, if one exists, or of a conceptual Ur-Expression, if none exists.
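Rules 3 to 6 can be illustrated with a small Python sketch. It is not part of any proposal: the attribute names eId/wId follow the renaming floated above, and the helpers is_master, work_id, index_by_work_id and counterpart are hypothetical. It shows how a document without work-level ids is treated as the Master Expression (whose wIds default to its eIds), how a non-master expression is required to carry wIds, and how the wId lets a reader jump from an element in one language to the matching element in another.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names, following the suggested renaming:
# eId = Expression-level id (formerly currentId), wId = Work-level id (formerly originalId).
EID, WID = "eId", "wId"


def is_master(root):
    """Rule 4: a document carrying no work-level ids is taken to be the Master Expression."""
    return not any(el.get(WID) for el in root.iter())


def work_id(el, master):
    """Rule 4b: in the Master Expression the work-level id equals the Expression-level id."""
    if master:
        return el.get(EID)
    wid = el.get(WID)
    if wid is None:
        # Rule 4: a non-master expression must carry explicit work-level ids.
        raise ValueError(f"element {el.get(EID)!r} has no {WID} in a non-master expression")
    return wid


def index_by_work_id(root):
    """Map the work-level id of every identified element of one expression to that element."""
    master = is_master(root)
    return {work_id(el, master): el for el in root.iter() if el.get(EID)}


def counterpart(src_root, dst_root, eid):
    """Return the eId, in dst, of the element identified by eid in src (None if absent)."""
    src_master = is_master(src_root)
    for el in src_root.iter():
        if el.get(EID) == eid:
            target = index_by_work_id(dst_root).get(work_id(el, src_master))
            return None if target is None else target.get(EID)
    return None


# English Master Expression (no wIds) and a Greek expression mapped onto it.
eng = ET.fromstring('<list><point eId="art_6__par_2__list_1__pnt_f"/></list>')
ell = ET.fromstring('<list><point eId="art_6__par_2__list_1__pnt_στ" '
                    'wId="art_6__par_2__list_1__pnt_f"/></list>')
print(counterpart(ell, eng, "art_6__par_2__list_1__pnt_στ"))  # → art_6__par_2__list_1__pnt_f
```

The lookup is symmetric: asking for the counterpart of the English pnt_f in the Greek expression returns pnt_στ, because both resolve to the same work-level id.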

- I totally agree with changing the attribute names.

- I totally agree with the algorithm from a theoretical point of view. However, building a concrete definition when there is no official master language will be a big challenge for an administration like the EU (defining a common convention for 24 languages).

- This works well for structural markup but not for semantic inline markup (such as ref or term). For these elements, establishing the correspondence between expressions is not trivial ... if a correspondence exists at all.

- The Master Expression is not necessarily the first time-related version, since we said that this information can be set even when the master-language document is unavailable.

- This rule is time-based. If a monolingual version is renumbered and later translated into another language, the "wId" will contain, in all language versions, the old number of the renumbered structure.
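The time-based effect described in the last comment can be sketched with plain dictionaries. This is a hypothetical in-memory model (the functions work_id, renumber and translate are illustrative, not from any schema): the work-level id is frozen the first time the element is renumbered, and a later translation inherits it, so the Greek expression ends up carrying the old English number.

```python
# Hypothetical model: each structural element is a dict with an Expression-level
# "eId" and, once it has been assigned, a Work-level "wId".


def work_id(el):
    # An element that never carried a wId still uses its eId as its work-level id.
    return el.get("wId", el["eId"])


def renumber(el, new_eid):
    # Freeze the work-level id before the Expression-level id changes;
    # on later renumberings work_id() already returns the frozen value.
    el["wId"] = work_id(el)
    el["eId"] = new_eid


def translate(el, local_eid):
    # A later translation inherits the work-level id as it stands at that time.
    return {"eId": local_eid, "wId": work_id(el)}


point = {"eId": "art_6__par_2__list_1__pnt_f"}  # monolingual original, 6th point
renumber(point, "art_6__par_2__list_1__pnt_e")   # renumbered as the 5th point
greek = translate(point, "art_6__par_2__list_1__pnt_ε")
print(greek)  # wId is still ..._pnt_f, the pre-renumbering number
```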

> 2) to use the FRBRTranslation attribute "pivot" to express in which language we have the master copy

We need another attribute, as "pivot" is already used for translations made through a pivot language (for example, from Maltese to Spanish via an intermediate English translation).

7) A new element in the <FRBRExpression> section is added, <masterExpression> or something like this.
8) The <masterExpression> element is optional and is used to record the URI of the Master Expression and the human language in which the work-level ids are expressed.
9) If no <masterExpression> element is specified inside <FRBRExpression>, then it is assumed that THIS Expression is the Master Expression.
10) If a <masterExpression> element exists but has no href attribute, then it is assumed that the Master Expression does not actually exist: it is an Ur-Expression, and only the human language used for the work-level ids is specified here.
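Rules 8 to 10 amount to a three-way classification of an expression. The following Python sketch is illustrative only: the element name <masterExpression> is itself tentative in the proposal, the "language" attribute name and the example URI are hypothetical placeholders, and real Akoma Ntoso metadata would also carry an XML namespace, which is omitted here.

```python
import xml.etree.ElementTree as ET


def master_info(frbr_expression):
    """Classify an expression per rules 8-10; returns (kind, master_uri, language)."""
    master = frbr_expression.find("masterExpression")
    if master is None:
        # Rule 9: no <masterExpression> means THIS expression is the Master.
        return ("self", None, None)
    # "language" is a hypothetical attribute name for the language of the work-level ids.
    language = master.get("language")
    href = master.get("href")
    if href is None:
        # Rule 10: no href, so the master is a conceptual Ur-Expression.
        return ("ur-expression", None, language)
    # Rule 8: an actual Master Expression exists at this (placeholder) URI.
    return ("external", href, language)


examples = [
    "<FRBRExpression/>",
    '<FRBRExpression><masterExpression href="/example/act/eng@" language="eng"/></FRBRExpression>',
    '<FRBRExpression><masterExpression language="fra"/></FRBRExpression>',
]
for snippet in examples:
    print(master_info(ET.fromstring(snippet)))
```

Keeping the absence of the element meaningful (rule 9) means monolingual documents need no extra metadata at all.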

> 4) to use renumbering metadata in the <textMod> block for tracking renumbering sequences over time, instead of inside <temporalData>, in order to have a single synchronized block in which to write and read information about renumbering.

Totally disagree. In fact, textual modification elements exist within the following hierarchies:

meta -> analysis -> activeModifications -> [textual modification]
meta -> analysis -> passiveModifications -> [textual modification]

These elements are only used when modifications happen, and as such they are basically only relevant for legislative texts such as acts and bills. Yet the issues to be decided today also affect documents where no modification happens, but only synchronization between multilingual versions, such as debate reports.

I therefore believe that textual modification elements are not the right place to create the full history of mappings between work-level ids and the Expression-level ids of other versions and variants.

I strongly believe that the right place is EITHER in a sibling structure to <activeModifications> and <passiveModifications>, OR (which is my preferred solution) where they are now, in the temporalData block, because this is what they really are: annotations about the effect of temporal evolution of the document.
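To make the notion of a "full history of mappings" concrete, here is a hypothetical Python model of a per-wId history of Expression-level ids, resolved by date. The data layout, the dates and the eid_at helper are invented for illustration and do not correspond to any existing temporalData schema; the point is only that such a record is keyed by time, not by a specific modification.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: for each work-level id, the dates from which each
# Expression-level id applied, in chronological order (dates are invented).
HISTORY = {
    "art_6__par_2__list_1__pnt_f": [
        (date(2013, 1, 1), "art_6__par_2__list_1__pnt_f"),
        (date(2014, 1, 1), "art_6__par_2__list_1__pnt_e"),  # after renumbering
    ],
}


def eid_at(wid, when):
    """Return the Expression-level id in force for wid on a given date, or None."""
    entries = HISTORY[wid]
    i = bisect_right([start for start, _ in entries], when)
    if i == 0:
        return None  # the element did not exist yet at that date
    return entries[i - 1][1]


print(eid_at("art_6__par_2__list_1__pnt_f", date(2013, 6, 1)))
print(eid_at("art_6__par_2__list_1__pnt_f", date(2014, 3, 1)))
```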

Ciao

Fabio

--

On 25 Mar 2014, at 03:50, monica.palmirani <monica.palmirani@unibo.it> wrote:

> Dear colleagues,
>
> a new proposal concerning the ids, coming from Veronique, was presented and discussed in the last unofficial TC on March 21.
> See the proposal in attachment.
>
> The proposal aims to cope with the problem of the synchronization among different linguistic versions of the same work document.
>
> The idea proposed is to use originalId in several different ways, depending on the situation:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> When there is a master language version (in this case English), we use originalId to record the id of the master language version.
>
> 2) <point currentId="art_6__par_2__list_1__pnt_στ" originalId="2013-619191">
> When there is no master language version, we use originalId to record a meaningless, opaque id.
>
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> This is the case of renumbering in a monolingual document. We use originalId in a third way, to record the original position before renumbering.
>
> 4) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="2013-619191">
> This is the case of renumbering in the Greek document. We use originalId in a fourth way, to record the original position before renumbering with an opaque id.
>
> There are several other cases where originalId takes on different semantics.
>
> Three considerations and one proposal:
>
> a) it is not good to have originalId with different purposes and different semantics; otherwise we need additional elements in the metadata block telling us which semantics apply (e.g. FRBRTranslation or FRBRlanguage);
> b) it is not good to introduce opaque ids in originalId, mixed with different semantic naming conventions, because this breaks interoperability among institutions that adopt different methodologies. Moreover, an opaque originalId creates a new problem: the naming convention for newly inserted provisions (e.g. between 2013-619191 and 2013-619192 I need to insert a new art_6__par_2__list_1__pnt_e1; what do I call the new originalId? 2013-619191-1?)
> c) the work-level id is a nice idea, but we also have the scenario where I need to mark up the third version from scratch, without knowing anything about the first original XML version, because I only have a PDF in hand (e.g. a very old act);
> d) marking up a renumbering modification of a provision (e.g. a third unnumbered paragraph) and mapping linguistic versions (e.g. with a different numbering order and odd, non-corresponding structures) are both intellectual legal activities, so both are subjective interpretation rather than objective mark-up.
>
> Proposal:
>
> 1) to use originalId with a semantic naming convention and to point to a conventional language id (e.g. French, English)
> 2) to use the FRBRTranslation attribute "pivot" to express in which language we have the master copy
> 3) to add a new attribute "mapper" to FRBRTranslation to express in which language we have assumed the mapping (in any case we need a new metadata element of some kind for tracking the different methods used in originalId).
> 4) to use renumbering metadata in the <textMod> block for tracking renumbering sequences over time, instead of inside <temporalData>, in order to have a single synchronized block in which to write and read information about renumbering.
>
> With this proposal the cases appear as follow:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with pivot="eng" in FRBRTranslation
>
> 2) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with mapper="eng" in FRBRTranslation
>
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in a monolingual version, with renumbering metadata in <textMod>
>
> 4)<point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in multiple language versions, with renumbering metadata in <textMod> and with FRBRTranslation mapper="eng" or FRBRTranslation pivot="eng", depending on the case.
>
> I hope this works for most of you. See you at the TC meeting on 28 March at 1.30 EDT.
>
> Good night!
> Monica
> --
> ===================================
> Associate professor of Legal Informatics
> School of Law
> Alma Mater Studiorum Università di Bologna
> C.I.R.S.F.I.D.
> http://www.cirsfid.unibo.it/
>
> Palazzo Dal Monte Gaudenzi - Via Galliera, 3
> I - 40121 BOLOGNA (ITALY)
> Tel +39 051 277217
> Fax +39 051 260782
> E-mail
> monica.palmirani@unibo.it
>
> ====================================
>
>
> <currentId-originalId-v1.pdf><currentId-originalId-v1.odt>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail. Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?'
Univ. of Bologna ITALY               Tiger got to sleep, bird got to land,
phone: +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/

