
Subject: Again on ids and numbering

From: Fabio Vitali <fabio@cs.unibo.it>
To: "legaldocml@lists.oasis-open.org" <legaldocml@lists.oasis-open.org>
Date: Sun, 9 Feb 2014 15:07:10 +0100

Dear all,

after this week's informal discussion, I would like to make an amended proposal for the management of ids in Akoma Ntoso. I hope I got all the suggestions in the appropriate place. Let me know if I forgot anything.

The generic syntax for an id is the following:

[prefix "__"] element_ref ["_" num]

* prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears.
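
A minimal sketch of the generic syntax in Python, to make the three parts concrete. The names AknId, compose_id and parse_id are hypothetical, and the sketch assumes that element_ref and num never contain an underscore, which the proposal does not state explicitly.

```python
from typing import NamedTuple


class AknId(NamedTuple):
    prefix: str       # id of the context element, "" when there is none
    element_ref: str  # element name or its abbreviation
    num: str          # numbering within the context, "" when there is none


def compose_id(element_ref: str, num: str = "", prefix: str = "") -> str:
    """Build [prefix "__"] element_ref ["_" num]."""
    local = f"{element_ref}_{num}" if num else element_ref
    return f"{prefix}__{local}" if prefix else local


def parse_id(identifier: str) -> AknId:
    """Split an id into its parts.

    The prefix is itself a complete id and may contain "__",
    so the split is made on the LAST "__".
    """
    prefix, _, local = identifier.rpartition("__")
    # Assumption: element_ref has no "_", so the first "_" starts the num part.
    element_ref, _, num = local.partition("_")
    return AknId(prefix, element_ref, num)


print(compose_id("chp", "2", prefix="title_I"))
print(parse_id("art_12__content"))
```

Splitting on the last "__" keeps nested prefixes intact, since the prefix of an element is the whole id of its context.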

Prefix
------
The context of an element is the element that suggests, implies or forces a restart of the numbering of all internal or subsequent elements of the same name. Different contexts imply that elements with the same name may end up having the same element_ref and the same num, and must therefore be disambiguated through the use of a prefix. Such a prefix is the id of the context element. For instance, in many traditions the numbering of chapters restarts within every title, so "chp_2" for Chapter 2 could be ambiguous. In these cases the id for Chapter 2 of Title I will be "title_I__chp_2" (assuming that "title_I" is the whole id for Title I). Elements that are globally unique or globally numbered within a document require no prefix (in the hypothesis of a single-document XML file).

* All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering).
* Elements <quotedStructure> and <embeddedStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves.
* Plain inline elements are NEVER contexts. Exception: element <mod> is ALWAYS a context.
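
The three rules above can be sketched as a small predicate. This is only an illustration: the function name is_context is hypothetical, the set of document classes is the partial list given in the message ("act, bill, doc, etc."), and the set of plain inline elements is a sample chosen for the example. For every other element, whether it is a context is left to the author of the manifestation, modelled here as a boolean argument.

```python
# Partial list taken from the message; the real list of document classes is longer.
DOCUMENT_CLASSES = {"act", "bill", "doc"}

# Always contexts, even when they do not force a restart of the numbering.
# <mod> is the stated exception among inline elements.
ALWAYS_CONTEXT = {"quotedStructure", "embeddedStructure", "mod"}

# Illustrative sample of plain inline elements (an assumption for this sketch).
PLAIN_INLINE = {"b", "i", "ref", "span"}


def is_context(name: str, restarts_numbering: bool = False) -> bool:
    """Return True if an element of this name acts as a numbering context.

    restarts_numbering is the author's judgment for elements that the
    rules do not settle one way or the other (e.g. titles or chapters).
    """
    if name in DOCUMENT_CLASSES or name in ALWAYS_CONTEXT:
        return True
    if name in PLAIN_INLINE:
        return False
    return restarts_numbering


print(is_context("mod"))                              # True
print(is_context("b", restarts_numbering=True))       # False
print(is_context("title", restarts_numbering=True))   # True
```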

Element_ref
-----------
element_ref is a reference to the identified element; this is always the name of the element, except for a brief list of well-known abbreviations as in the following table:

FOR ELEMENT x      USE ABBREVIATION y
*** TBD ***        *** TBD ***

num
---

num is a (possibly empty) representation of the numbering of the element within its context.

Globally and locally unique elements: if the element is necessarily unique within its context, no numbering is used. This means that the id of elements that are necessarily unique within a given context will have no num part. For instance, since there is exactly ONE <body> in acts and bills, its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, since there is at most ONE <content> element inside articles or sections, the id of the <content> element of article 12 will be simply "art_12__content".

Explicitly numbered elements: an explicitly numbered element has its number determined in the expression itself in the form of a <num> subelement. The num part of the ids of such elements is obtained by stripping all punctuation, separators and redundant characters from the content of the <num> element. The representation is case-sensitive. For instance, if article 12 contains <num>Art. 12 bis</num> then the num part of the id will be "12bis".
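
A rough sketch of this stripping in Python. The proposal does not say which characters count as "redundant"; the sketch assumes this means a leading label such as "Art." and uses a hypothetical, incomplete list of such labels. The function name num_part is also hypothetical.

```python
import re

# Hypothetical list of labels treated as redundant; not defined by the proposal.
REDUNDANT_LABELS = ("article", "art", "chapter", "chp", "section", "sect", "title")

# One leading label, an optional dot, and any following whitespace.
_LABEL_RE = re.compile(
    r"^(?:%s)\b\.?\s*" % "|".join(REDUNDANT_LABELS),
    flags=re.IGNORECASE,
)


def num_part(num_text: str) -> str:
    """Derive the num part of an id from the text content of <num>."""
    text = _LABEL_RE.sub("", num_text.strip())
    # Remove punctuation, whitespace and underscores; letters and digits
    # are kept exactly as written, so the result stays case-sensitive.
    return re.sub(r"[\W_]+", "", text)


print(num_part("Art. 12 bis"))  # 12bis
print(num_part("Title I"))      # I
```

Only the label is matched case-insensitively; the remaining characters are never re-cased, so "I" and "i" stay distinct.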

It is the job of the author of the manifestation to determine whether the numbering expressed in the <num> element is global (i.e., it starts at 1 at the beginning of the document class) or local (i.e., it restarts at 1 inside or after every instance of an intermediate element). This is usually made clear within every legal tradition and usually can be established by briefly examining a few or even just one document in its original form.

Implicitly numbered elements: an implicitly numbered element has no <num> sub-element, and its numbering is established by counting the occurrences of similar elements within the same context, necessarily using Arabic numerals.

It is the job of the author of the manifestation to determine whether the best way to count these elements is globally (i.e., starting at 1 at the beginning of the document class) or locally (i.e., restarting at 1 inside or after every instance of an intermediate element). This naming convention provides no rules on this choice, but there are a few common-sense approaches. For instance, it is very natural that <eop> elements are counted globally, and <eol> elements are counted locally after their preceding <eop> element. As such, the third <eop> element (separating the third page from the fourth) has id "eop_3" (note no prefix), while the fifteenth end of line after this <eop> will have id "eop_3__eol_15". On the other hand, <p> elements within a given structure are reasonably counted locally (as in "third p of section 12"). The context is not necessarily the immediately containing element (which in this case would be the <content> element), but any containing or preceding element that, in the opinion of the author of the manifestation, provides context for the counting. Thus the third p of section 12 would have "sect_12__p_3" as its id.
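
The <eop>/<eol> example can be sketched as a simple counter over the sequence of break elements in document order. The function name number_breaks is hypothetical, and the treatment of lines that come before the first <eop> (left without a prefix) is an assumption, since the message does not cover that case.

```python
from typing import Iterable, List


def number_breaks(names: Iterable[str]) -> List[str]:
    """Assign ids to <eop> and <eol> elements given in document order.

    <eop> is counted globally; <eol> is counted locally and restarts
    after every <eop>, whose id becomes the prefix.
    """
    ids: List[str] = []
    eop_count = 0
    eol_count = 0
    for name in names:
        if name == "eop":
            eop_count += 1
            eol_count = 0  # the line count restarts after each page break
            ids.append(f"eop_{eop_count}")
        elif name == "eol":
            eol_count += 1
            local = f"eol_{eol_count}"
            # Assumption: lines before the first <eop> get no prefix.
            ids.append(f"eop_{eop_count}__{local}" if eop_count else local)
    return ids


print(number_breaks(["eol", "eop", "eol", "eol", "eop", "eol"]))
# ['eol_1', 'eop_1', 'eop_1__eol_1', 'eop_1__eol_2', 'eop_2', 'eop_2__eol_1']
```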

Abundant or incomplete references
---------------------------------
An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete ***ids*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference may not know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.

In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.
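
A possible sketch of both cases in Python, under stated assumptions. The function names are hypothetical. "Removing prefixes" is read here as dropping the outermost prefix segment, one at a time, which is only one interpretation of the proposal. For incomplete references the sketch merely lists the candidate ids that the interactive session would have to choose among.

```python
from typing import Iterable, List, Optional


def resolve_abundant(fragment: str, known_ids: Iterable[str]) -> Optional[str]:
    """Drop leading prefix segments until the fragment matches a known id.

    Returns the matching id, or None when no suffix of the fragment
    is an id of the document.
    """
    ids = set(known_ids)
    candidate = fragment
    while True:
        if candidate in ids:
            return candidate
        _, sep, rest = candidate.partition("__")
        if not sep:
            return None  # no prefix left to remove
        candidate = rest


def candidates_for_incomplete(fragment: str, known_ids: Iterable[str]) -> List[str]:
    """List ids that the fragment could denote once a prefix is supplied."""
    suffix = "__" + fragment
    return sorted(i for i in known_ids if i == fragment or i.endswith(suffix))


ids = {"title_I", "title_I__chp_2", "title_II", "title_II__chp_2", "art_5"}
print(resolve_abundant("title_I__chp_2__art_5", ids))  # art_5
print(candidates_for_incomplete("chp_2", ids))
# ['title_II__chp_2', 'title_I__chp_2']
```

When candidates_for_incomplete returns more than one id, the reference is ambiguous and the choice has to come from the user.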

Let me know what you think.

Ciao

Fabio

Fabio Vitali
Dept. of Computer Science
Univ. of Bologna ITALY
phone: +39 051 2094872
e-mail: fabio@cs.unibo.it
http://vitali.web.cs.unibo.it/

Tiger got to hunt, bird got to fly,
Man got to sit and wonder "Why, why, why?"
Tiger got to sleep, bird got to land,
Man got to tell himself he understand.
Kurt Vonnegut (1922-2007), "Cat's cradle"