RE: [legaldocml] On ids and numbering

https://docs.google.com/document/d/1rGvVnvg4PaWD_ql2QjK1nqfq7WtPUJlCnOV0dUoDU9E/pub

First, I am very excited about the potential for using the id within legal documents. This will make the electronic citation possibilities stronger and easier to integrate with other document types. For example, having opinion papers, journal articles and other documents that refer to multiple fields can be better structured and useful.

However, there are some issues that seem apparent to me with the current recommendation. I would start out with what I believe are the requirements for a well constructed protocol for assigning id attribute values.

I realize that id attribute values in use are often globally unique values within a greater set of documents or, are semantic in quality and/or endeavor to match an XPath like value. I see value in any of these methods of determining the id values. For legal documents, going with the XPATH like value has great utility and logic. Also the recommendation includes a semantic like system in that it mirrors the semantically named elements.

My suggestion for dealing with the element name and hcontainer attribute value potential collisions:

Generating an XSL auto generator or end user parser of the id values will be more difficult or possible without dealing with the two issues I listed.

I would find acceptable any id value system that could be autogenerated using XSL for most jurisdictions. That might allow for end user tools as well. I do not believe that the current recommendation will allow for this. Perhaps each jurisdiction will need their own XSL generator, but that would be a poor outcome.

I would prefer a system that avoided the Y2K abbreviation even with safeguards. Either implement a rigorous XPATH simple level/nest system or fully implement the semantic equivalent name.

-----Original Message-----
From: "Fabio Vitali" <fabio@cs.unibo.it>
Sent: Tuesday, December 3, 2013 5:13am
To: "legaldocml@lists.oasis-open.org" <legaldocml@lists.oasis-open.org>
Cc: "akomantoso-xml@googlegroups.com" <akomantoso-xml@googlegroups.com>
Subject: [legaldocml] On ids and numbering

Dear all,

let me make a proposal on numbering and ids from the discussions we had on the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.

The generic syntax for an id is the following:

[prefix "__"] abbr ["_" num]

* prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears.
* abbr is an abbreviation describing the element, and it is drawn from the list of abbreviations that Veronique has first created and Grant improved.
* num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.

This is how to use this syntax:

* An explicitly numbered element is an element that is numbered by the author of the _expression_, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the _expression_, and therefore must be numbered in some ways by the author of the manifestation, and in particular has no <num> element relating to itself anywhere.

* The context of the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a re-start of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).

* All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".

* Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".

* What constitute a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to make sure whether the numberings starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".

* For non-eplicilty numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
- none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
- the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
- the closest containing element (ignoring containing inlines): block, container, container.

Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> has two more refs). Thus we have the following options:

- Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
- Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
- Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.

Of course additional complexities may arise from mixing up policies depending on the elements (e.g. making akoma-specific elements become hcontainer-driven and html-specific elements become container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer" but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.

* One last discussion item regards abundant or incomplete references. An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destinations. BTW, we must never deal with abundant or incomplete ***id*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference could not know everything about the document being mentioned in the text of the reference., and therefore he/she might create an incorrect reference that has too much or too little information.

In case of abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of missing reference, on the other hand, the resolver must establish an interactive session with the user similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of an unique element.

let me know what you think.

Ciao

Fabio

--

Fabio Vitali Tiger got to hunt, bird got to fly,
Dept. of Computer Science Man got to sit and wonder "Why, why, why?'
Univ. of Bologna ITALY Tiger got to sleep, bird got to land,
phone: +39 051 2094872 Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/

---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail. Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

legaldocml message