Dear Monica and Fabio,
The purpose of my small document was to validate, with
examples, the use of the originalId attribute for
renumbering and translation, as proposed in a TC
meeting.
So my sole intention was to see how to set this attribute,
under the following constraints:
- once set, it never changes
- it is used as the identifier for both renumbering and
language versions, so I can use it to navigate to an old
version or to another language
- there is not always an official master language
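To make the second constraint concrete, here is a minimal Python sketch of how a stable originalId could be used to jump from an element in one language version to its counterpart in another. It is only an illustration, not part of the proposal: the function names and the two tiny sample documents are assumptions, and the ids are borrowed from the cases discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Two hypothetical language versions of the same provision.
GREEK = (
    '<list><point currentId="art_6__par_2__list_1__pnt_στ" '
    'originalId="art_6__par_2__list_1__pnt_f"/></list>'
)
ENGLISH = (
    '<list><point currentId="art_6__par_2__list_1__pnt_f" '
    'originalId="art_6__par_2__list_1__pnt_f"/></list>'
)


def index_by_original_id(xml_text):
    """Map each originalId to the currentId used in this version."""
    root = ET.fromstring(xml_text)
    return {
        el.get("originalId"): el.get("currentId")
        for el in root.iter()
        if el.get("originalId") is not None
    }


def navigate(current_id, source_xml, target_xml):
    """Return the currentId in target_xml of the element that is
    known as current_id in source_xml, or None if there is none."""
    source = index_by_original_id(source_xml)
    # Invert the source index: currentId -> originalId.
    original_by_current = {cur: orig for orig, cur in source.items()}
    original_id = original_by_current.get(current_id)
    return index_by_original_id(target_xml).get(original_id)
```

Because originalId never changes once set, the same lookup would also work between an old and a renumbered version of a single language.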
I also believe that this attribute is in fact a Work id and
the currentId an Expression id.
One last small comment: this discussion concerns the
structural elements and assumes that these elements exist
in all language versions. But:
- how do we handle the case where some version (maybe the
master version itself) does not have the corresponding
element?
- how do we manage the Work id for semantic inline elements
such as "<ref>" or "<person>" in multilingual documents?
All my other comments are inline below, in blue.
Kind regards
Véronique
Véronique Parisse
AUBAY Luxembourg
Orco House
38, Parc d’activités - L-8308 Capellen
Standard : +352 2992501
Fax : +352 299251
www.aubay.com
________________________________________
From: legaldocml@lists.oasis-open.org
[legaldocml@lists.oasis-open.org] on behalf of Fabio Vitali
[fabio@cs.unibo.it]
Sent: Tuesday, 25 March 2014 14:40
To: monica.palmirani
Cc: legaldocml@lists.oasis-open.org
Subject: Re: [legaldocml] Ids proposal - sometime they come
back... again!
Dear Monica,
I agree with you on most things.
> Proposal:
>
> 1) to use originalId with semantic naming convention and
> to point out to a conventional language id (e.g. French,
> English)
I agree. I'll also rephrase and extend:
1) originalIds definitely are understood as Work-level ids
(they could even be called workIds or wids).
2) currentIds definitely are understood as Expression-level
ids (they could even be called expressionIds, exprIds, eids or
even simply ids).
3) All documents need to have Expression-level ids. Period.
4) Whether an XML document has work-level ids or not is NOT
a decision of the marker, but a characteristic of the nature
of the document. In fact, if an XML document does NOT have
work-level ids, then it is assumed that a) this is the Master
Expression (the one whose Expression-level ids will be used as
a map for the work-level ids of all the other Expressions) and
b) its work-level ids are the same as its Expression-level ids.
If this is NOT the Master Expression, then the work-level ids
NEED to be present. Master Expressions are necessarily the
FIRST (or the ONLY) time-related versions of a document that
either is intrinsically MONOLINGUAL or is expressed in the
MASTER LANGUAGE, which is country- and jurisdiction-dependent
and may not even exist (as in the EU). A marker must know
whether the document he/she is marking up is the Master
Expression of a Work or not.
5) Expression-level ids use a semantic naming convention based
on the structures of their Expression.
6) Work-level ids use a semantic naming convention based on
the structures of their Master Expression, if one exists, or
of a conceptual Ur-Expression, if none exists.
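Rule 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names "wId" and "eId" are hypothetical stand-ins for the renamed originalId and currentId, and the function name is invented for the example.

```python
import xml.etree.ElementTree as ET


def work_level_ids(xml_text, is_master):
    """Return {eId: work-level id} for every identified element.

    Assumed attribute names: "eId" (Expression-level id) and
    "wId" (Work-level id). In a Master Expression a missing wId
    falls back to the eId; elsewhere a missing wId is an error.
    """
    mapping = {}
    for el in ET.fromstring(xml_text).iter():
        eid = el.get("eId")
        if eid is None:
            continue  # element carries no Expression-level id
        wid = el.get("wId")
        if wid is None:
            if not is_master:
                raise ValueError(
                    f"{eid}: non-master Expression lacks a work-level id"
                )
            wid = eid  # rule 4b: work-level id equals Expression-level id
        mapping[eid] = wid
    return mapping
```

For a Master Expression the result is simply the identity map over its Expression-level ids; for any other Expression the map comes from the explicit work-level ids.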
- I totally agree with the change of the attribute names.
- I totally agree with the algorithm from a theoretical point
of view. However, building a concrete definition when there is
no official master language will be a big challenge for an
administration like the EU (defining a common convention for
24 languages).
- This works well for structural markup but not for semantic
inline markup (such as ref or term, ...). For these elements,
the correspondence between Expressions is not trivial to
establish ... if it exists at all.
- The Master Expression is not necessarily the first
time-related version, since we said that this information can
be set even when the master-language document is unavailable.
- This rule is time-based. If a monolingual version is
renumbered and, at a later step, translated into another
language, the "wId" will contain, in all language versions,
the old number of the renumbered structure.
> 2) to use FRBRTranslation attribute "pivot" for
> expressing in which language we have the master copy
We need another attribute, as "pivot" is already used for
translation through a pivot language (for example, from
Maltese to Spanish with an intermediate translation in
English).
7) A new element in the <FRBRExpression> section is
added, <masterExpression> or something like this.
8) The <masterExpression> element is optional and used
to record the URI of the Master Expression and the human
language in which the Work-level ids are expressed.
9) If no <masterExpression> element is specified inside
<FRBRExpression>, then it is assumed that THIS
Expression is the Master Expression.
10) If a <masterExpression> element exists, but has no
href attribute, then it is assumed that the Master Expression
does not really exist, it is an Ur-Expression, and
only the human language used for the work-level ids is
specified here.
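Rules 7 to 10 amount to a three-way decision, which the following Python sketch spells out. It is an illustration only: <masterExpression> is just the element proposed above, the "language" attribute name and the sample URI are assumptions, and XML namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def classify_master(frbr_expression_xml):
    """Return (kind, href, language) for an <FRBRExpression> block.

    kind is "self" when no <masterExpression> is present (rule 9),
    "ur-expression" when it is present without href (rule 10),
    and "external" when href points to the Master Expression.
    The "language" attribute name is an assumption.
    """
    expr = ET.fromstring(frbr_expression_xml)
    master = expr.find("masterExpression")
    if master is None:
        return ("self", None, None)
    href = master.get("href")
    language = master.get("language")
    if href is None:
        return ("ur-expression", None, language)
    return ("external", href, language)


# Hypothetical metadata of a translated Expression.
sample = (
    '<FRBRExpression>'
    '<masterExpression href="/example/act/2013/1/eng" language="eng"/>'
    '</FRBRExpression>'
)
kind, href, language = classify_master(sample)
```

A real Akoma Ntoso document would carry a namespace on these elements, so the lookup would need to be adapted accordingly.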
> 4) to use renumbering metadata in <textMod> block
> for tracking the renumbering sequences over time, instead of
> inside of <temporalData> in order to have a unique
> synchronized block where to write and to read information
> about the renumbering.
Totally disagree. In fact, textual modification elements exist
within the following hierarchies:
meta -> analysis -> activeModifications -> [textual modification]
meta -> analysis -> passiveModifications -> [textual modification]
These elements are only used when modifications happen, and as
such they are basically only relevant for legislative texts
such as acts and bills. Yet the issues to be decided today
also affect documents where no modification is happening, but
merely synchronization between multilingual versions, such as
debate reports.
I believe therefore that textual modification elements are not
the right place to record the full history of mappings
between work-level ids and Expression-level ids of other
versions and variants.
I strongly believe that the right place is EITHER in a sibling
structure to <activeModifications> and
<passiveModifications>, OR (which is my preferred
solution) where they are now, in the temporalData block,
because this is what they really are: annotations about the
effect of the temporal evolution of the document.
Ciao
Fabio
--
On 25 Mar 2014, at 03:50, monica.palmirani
<monica.palmirani@unibo.it> wrote:
> Dear colleagues,
>
> a new proposal concerning the ids, coming from Veronique,
> was presented and discussed in the last unofficial TC on March
> 21.
> See the proposal in attachment.
>
> The proposal aims to cope with the problem of the
> synchronization among different linguistic versions of the
> same work document.
>
> The idea proposed is to use originalId in several
> different manners according to the different situations:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> In case there is a master language version (in this case
> English), we use originalId for recording the id of the
> master language version.
>
> 2) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="2013-619191">
> In case there isn't a master language version, we use
> originalId for recording a meaningless and opaque id.
>
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> This is the case of renumbering in a monolingual document.
> We use originalId in a third manner for recording the original
> position before the renumbering.
>
> 4) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="2013-619191">
> This is the case of renumbering in a Greek document. We use
> originalId in a fourth manner for recording the original
> position before the renumbering, using an opaque id.
>
> We have several other cases where originalId assumes
> different semantics with different meanings.
>
> Four considerations and one proposal:
>
> a) it is not good to have originalId with different
> purposes and different semantics, otherwise we need in the
> metadata block some more elements that tell us which
> semantics we are in (e.g. FRBRTranslation or FRBRlanguage);
> b) it is not good to introduce opaque ids in originalId,
> mixed with a different semantic naming convention, because it
> breaks interoperability among different institutions that are
> adopting different methodologies. Moreover, an originalId in
> opaque format produces a new problem: the naming convention for
> newly inserted provisions (e.g. between 2013-619191 and
> 2013-619192 I need to insert a new
> art_6__par_2__list_1__pnt_e1: what do I call the new originalId?
> 2013-619191-1?)
> c) the work-level id is a nice idea, but we also have the
> scenario where I need to mark up the third version from
> scratch, without knowing anything about the first original XML
> version, because I only have a PDF in my hands (e.g. a very old
> act);
> d) marking up a renumbering modification of a provision
> (e.g. third unnumbered paragraph) and mapping linguistic
> versions (e.g. with different numbering order and odd,
> non-corresponding structure) are both legal intellectual
> activities, so both of them are subjective interpretation and
> not objective mark-up.
>
> Proposal:
>
> 1) to use originalId with semantic naming convention and
> to point out to a conventional language id (e.g. French,
> English)
> 2) to use FRBRTranslation attribute "pivot" for
> expressing in which language we have the master copy
> 3) to add a new attribute "mapper" to FRBRTranslation for
> expressing in which language we have assumed the mapping (in
> any case we need a new meta something for tracking the
> different method used in originalId).
> 4) to use renumbering metadata in <textMod> block
> for tracking the renumbering sequences over time, instead of
> inside of <temporalData> in order to have a unique
> synchronized block where to write and to read information
> about the renumbering.
>
> With this proposal the cases appear as follows:
> 1) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with pivot="eng" in FRBRTranslation
>
> 2) <point currentId="art_6__par_2__list_1__pnt_στ"
> originalId="art_6__par_2__list_1__pnt_f">
> the same, but with mapper="eng" in FRBRTranslation
>
> 3) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in a monolingual version, with renumbering
> metadata in <textMod>
>
> 4) <point currentId="art_6__par_2__list_1__pnt_ε"
> originalId="art_6__par_2__list_1__pnt_f">
> renumbering in multiple language versions, with
> renumbering metadata in <textMod> and with
> FRBRTranslation mapper="eng" or FRBRTranslation pivot="eng",
> depending on the case.
>
> I hope this finds most of you well. See you in the TC
> meeting on 28 March at 1.30 EDT.
>
> Good night!
> Monica
> --
> ===================================
> Associate professor of Legal Informatics
> School of Law
> Alma Mater Studiorum Università di Bologna
> C.I.R.S.F.I.D.
> http://www.cirsfid.unibo.it/
>
> Palazzo Dal Monte Gaudenzi - Via Galliera, 3
> I - 40121 BOLOGNA (ITALY)
> Tel +39 051 277217
> Fax +39 051 260782
> E-mail
> monica.palmirani@unibo.it
>
> ====================================
>
>
>
> <currentId-originalId-v1.pdf><currentId-originalId-v1.odt>
>
---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the
OASIS TC that
> generates this mail. Follow this link to all your TCs in
OASIS at:
>
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
--
Fabio Vitali
Dept. of Computer Science
Univ. of Bologna ITALY
phone: +39 051 2094872
e-mail: fabio@cs.unibo.it
http://vitali.web.cs.unibo.it/

Tiger got to hunt, bird got to fly,
Man got to sit and wonder "Why, why, why?"
Tiger got to sleep, bird got to land,
Man got to tell himself he understand.
Kurt Vonnegut (1922-2007), "Cat's cradle"