Re: [akomantoso-xml] On ids and numbering

My comments are below at GCV:

On Tue, Dec 3, 2013 at 2:13 AM, Fabio Vitali <fabio@cs.unibo.it> wrote:

Dear all,

let me make a proposal on numbering and ids from the discussions we had on the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.

The generic syntax for an id is the following:

[prefix "__"] abbr ["_" num]

* prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears.
* abbr is an abbreviation describing the element, and it is drawn from the list of abbreviations that Veronique has first created and Grant improved.
* num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.

GCV: I agree. I would add though the the "num" part must be case-sensitive. I have cases, here in California, where sec_1__p_iv and sec_1__p_IV will both exist and will be identifiying different paragraphs. It is the practice here to use numbering a...z, A..Z, lower roman numerals, and upper lower roman numberals in the same sequence. (And yes, there is an unfortunate overlapping problem - but that can't be changed).

This is how to use this syntax:

* An explicitly numbered element is an element that is numbered by the author of the _expression_, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the _expression_, and therefore must be numbered in some ways by the author of the manifestation, and in particular has no <num> element relating to itself anywhere.

GCV: I agree

* The context of the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a re-start of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).

GCV: (assuming you mean "ttl_1__chp_2") I agree

* All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".

GCV: I agree

* Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".

GCV: I agree

* What constitute a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to make sure whether the numberings starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".

GCV: I think that some elements (notes, refs, and figures, in particular), may use a numbering context that is different from the surrounding context. Also, English tradition generally causes sections to be globally numbered rather than articles.

* For non-eplicilty numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
- none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
- the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
- the closest containing element (ignoring containing inlines): block, container, container.

Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> has two more refs). Thus we have the following options:

- Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
- Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
- Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.

Of course additional complexities may arise from mixing up policies depending on the elements (e.g. making akoma-specific elements become hcontainer-driven and html-specific elements become container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer" but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.

GCV: I think that how unnumbered elements are assigned an id will be a jurisdiction specific decision - and it will vary based on the case. I would prefer that unnumbered paragraphs (non-designated to use LRC jargon) be numbered within their immediate surrounding context, but the refs and notes be numbered based on a higher context.

* One last discussion item regards abundant or incomplete references. An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destinations. BTW, we must never deal with abundant or incomplete ***id*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference could not know everything about the document being mentioned in the text of the reference., and therefore he/she might create an incorrect reference that has too much or too little information.

In case of abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of missing reference, on the other hand, the resolver must establish an interactive session with the user similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of an unique element.

GCV: I feel quite strongly that ids should contain a minimal (rather than maximal) set of information required to identify an ite. That way, if the reference is abundant, you can always resolve it - by throwing away information that you don't need. If the id was maximal, and the refernece was incomplete, it would not be possible to resolve it.

let me know what you think.

Ciao

Fabio

--

Fabio Vitali Tiger got to hunt, bird got to fly,
Dept. of Computer Science Man got to sit and wonder "Why, why, why?'
Univ. of Bologna ITALY Tiger got to sleep, bird got to land,
phone: +39 051 2094872 Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/

--
You received this message because you are subscribed to the Google Groups "akomantoso-xml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akomantoso-xml+unsubscribe@googlegroups.com.
To post to this group, send email to akomantoso-xml@googlegroups.com.
Visit this group at http://groups.google.com/group/akomantoso-xml.
For more options, visit https://groups.google.com/groups/opt_out.

--
____________________________________________________________________
Grant Vergottini
Xcential Group, LLC.
email: grant.vergottini@xcential.com
phone: 858.361.6738

legaldocml message