legaldocml message

Subject: Re: [legaldocml] Re: [akomantoso-xml] Again on ids and numbering

From: Daniel Bennett <daniel@citizencontact.com>
To: Fabio Vitali <fabio@cs.unibo.it>, akomantoso-xml@googlegroups.com
Date: Mon, 10 Feb 2014 12:43:42 -0500

Dear Greg,

I agree with what Fabio has said. I would say that your question about HTML is telling. The <a name="x"> has been deprecated, which is a great thing, and the use of the id attribute on any HTML tag as the replacement is a huge improvement. First, <a name> pointed to a section of a document generally, but did not truly indicate what section. Plus, it had issues with the <a href> which were problematic.

Going to the id="sectionx" has the commonly understood feature of getting the browser to scroll to the section when the URL includes the #sectionx fragment reference. As you know, the id attribute also allows JavaScript and CSS references.

Where LegalDocML is an XML language whose users constantly refer to a particular section, having an id attribute becomes a requirement. Unfortunately, many HTML content management systems do not have an automatic id system, which hampers the utility (even worse, many HTML CMSs do not even enforce well-formed HTML). In addition, a W3C standard that might help to backstop this problem, XPointer, has not been adopted by the browsers (there were some browser plugins/extensions, but even they are gone, I think).

HTML does not have a way for the id attribute to be accessed or seen by the browser user. Whereas some browsers sort of indicated the <a name>, none of the browsers allow the id attribute to be used within the browser window. Using JavaScript and CSS, it is possible for a web publisher to give viewers the utility of the id values. Some allow a section of a web document to be "shared" (essentially a URL with a fragment that can be used for cut/paste, citing, social media references, et cetera).

Perhaps it may be the HTML representation of legal documents that helps highlight this capability and show its utility for citing/accessing portions of non-legal web documents.


Good question Greg.
Daniel

Daniel Bennett
daniel@citizencontact.com


On 2/10/2014 12:15 PM, Fabio Vitali wrote:

Dear Greg,

the use of ids is, as you hypothesize, slightly more complex in Akoma Ntoso than in HTML.

Let me bring a few aspects that affect ids.

First of all, in HTML the important link type is anchor-to-document, while the anchor-to-anchor link is a minor addition for peculiar cases (and highly criticized by usability experts, btw); this is the reason ids are never required, and authors are expected to provide ids only for those structures that are likely destinations of anchor-to-anchor links, e.g., basically, a few section headings. Contrast this with legislation, wherein ALL references are to a precise substructure of a highly hierarchical document flow, and any substructure may become a destination. This is the reason we require ids for most elements, so that you do not have to curse some unknown markup author because he forgot to place an id where you most needed it.

A second issue is that of the purpose of the reference. In HTML, the reference is almost always meant for navigation by human users, so that it is only important to come close enough to the intended destination that a human eye can scan the surroundings and find the exact destination somewhere around there. In legislation, we have an additional type of reference, that of *modifications*, which requires that a specific substructure is precisely identified and modified by a modification instruction. In this case, one cannot be satisfied with the fact that the intended destination is somewhat near the reached destination: they must coincide.

A third issue is the fact that by using the FRBR layering we are strongly differentiating between the legislative context and the markup used to represent it. References are legislative concepts, and exist regardless of the markup and markup author that express them in practice. The same content, for instance, could be represented in a number of different XML files created by different authors. They would all be different manifestations of the same expression, each of which may have the same body, but different markup choices, metadata, commentary, etc. References would need to work regardless of the specific manifestation chosen as the destination, and indeed it is important that all manifestations use the same ids for the same structures, because they HAVE BEEN STANDARDIZED by the TC. This is impacted by the fact that I might not even have the XML of the destination, or even that it may not exist yet (time-based alchemies are frequent in legislation, or I might need to create links to documents that haven't been converted into Akoma Ntoso yet, etc.). Thus, by providing a forced and precise syntax for ids, we can do our best to guarantee that all different manifestations of the same content have the same ids, and that I do not need to read into an XML file to divine the values of its ids.

A fourth and final issue is connected to that, and it is the issue of dynamic references. We all know that legal references have peculiar traits regarding time. For instance, in the case of an evolving document (e.g. a piece of legislation receiving references and being actively modified by the legislator), the actual destination of the reference is not the original version, nor the current version, but in many cases the version of the document that was valid at the moment in time when your case took place. References are dynamic, rather than static, because the destination moves in time and jurisdiction according to your needs, rather than being fixed to a specific sentence or fragment. This means that point-in-time consolidation is an important affair, and that determining the destination of a dynamic link requires at the very least that structures existing in multiple versions are named consistently. It must be clear that, if section 35 of the initial version of a title of a US code had some id Y, then ALL subsequent versions of that same section 35 (even after a renumbering action) have the same id Y, so that once you determine the version you need, bringing you to the right structure is easy and straightforward.

To summarize, the whole point of the id discussion, and the reason I am suggesting a semantically aware syntax for ids, is to make sure that ids can be used regardless of versions of the same document, regardless of the author of individual XML markups, regardless of usage as navigational or modificative reference, and knowing full well that point-to-point references are the norm rather than the exception in our case.

I hope this is convincing and that it answers your questions.

Ciao

Fabio

On 10 Feb 2014, at 07:51, Greg Kempe <gregkempe@gmail.com> wrote:

Hi Fabio,

I’ve watched the discussion around IDs with interest; it seems that getting them “right” is pretty challenging.

I’ve been wondering if we cannot simplify our use of IDs, but realise that I might not have the full context. So, why are IDs necessary and what is their intended purpose?

In HTML, IDs are used to reference an element inside the document (eg. to apply styling or manipulate an element) and as an anchor for moving inside a document. As such, they are useful within the internal context of the document, and externally useful only if you already have internal knowledge of the document (ie. you can’t guess an ID without reading the document). They are completely freeform and don’t necessarily describe the structural location of the element within the document. It’s entirely up to the author to decide on them and they are optional on all elements (AFAIK).

So, putting aside CSS styling, does AN need to have different semantics for its IDs than HTML, or can we borrow from how HTML defines and uses them? Do they need to encode the structural location of an element and, if so, would an XPath or XQuery location be better suited for that?

Having a formal format for IDs seems to imply that it will allow us to calculate the ID of an element without ever reading the document. In other words, that IDs are useful external to the document without having any internal knowledge of it. Is that actually a use-case and would the format being discussed support that?

Apologies if all of this has been discussed before.

Thanks,
Greg

On 09 February 2014 at 4:07:13 PM, Fabio Vitali (fabio@cs.unibo.it) wrote:

Dear all,

after this week's informal discussion, I would like to make an amended proposal for the management of ids in Akoma Ntoso. I hope I got all the suggestions in the appropriate place. Let me know if I forgot anything.

The generic syntax for an id is the following:

[prefix "__"] element_ref ["_" num]

* prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears.

Prefix
------
The context of an element is the element that suggests, implies or forces a re-start of the numbering of all internal or subsequent elements of the same name. Different contexts imply that elements with the same name may end up having the same element_ref and the same num, and must therefore be disambiguated through the use of a prefix. Such a prefix is the id of the context element. For instance, in many traditions chapters' numbering restarts within every title, so "chp_2" for Chapter 2 could be ambiguous. In these cases the id for Chapter 2 of Title I will be "title_I__chp_2" (assuming that "title_I" is the whole id for Title I). Elements that are globally unique or globally numbered within a document require no prefix (in the hypothesis of a single document XML file).

* All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering).
* Elements <quotedStructure> and <embeddedStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves.
* Plain inline elements are NEVER contexts. Exception: element <mod> is ALWAYS a context.
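
As an illustration only, the generic syntax [prefix "__"] element_ref ["_" num] could be assembled as in the following Python sketch. The function name make_id and its parameters are hypothetical and not part of the proposal; the sketch assumes the caller has already decided which element is the context and passes that element's id as the prefix.

```python
from typing import Optional


def make_id(element_ref: str, num: Optional[str] = None, prefix: str = "") -> str:
    """Assemble an id as [prefix "__"] element_ref ["_" num].

    Hypothetical helper: `prefix` is assumed to be the full id of the
    context element, or an empty string when no context is needed.
    """
    # The num part is optional: unique elements carry no number.
    local = f"{element_ref}_{num}" if num else element_ref
    # A double underscore separates the context id from the local part.
    return f"{prefix}__{local}" if prefix else local


# Title I is globally numbered, so it needs no prefix.
title_id = make_id("title", "I")
# Chapter numbering restarts in every title, so the title is the context.
chapter_id = make_id("chp", "2", prefix=title_id)
```

With these inputs, title_id is "title_I" and chapter_id is "title_I__chp_2", matching the example given for Chapter 2 of Title I.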

Element_ref
-----------
element_ref is a reference to the identified element; this is always the name of the element, except for a brief list of well-known abbreviations as in the following table:

FOR ELEMENT x        USE ABBREVIATION y
*** TBD ***          *** TBD ***

num
---

num is a (possibly empty) representation of the numbering of the element within its context.

Globally and locally unique elements: if the element is necessarily unique within its context, no numbering is used. This means that the id of elements that are necessarily unique within a given context will have no num part. For instance, since there is exactly ONE <body> in acts and bills, its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, since there is at most ONE <content> element inside articles or sections, the id of the <content> element of article 12 will be simply "art_12__content".

Explicitly numbered elements: an explicitly numbered element has its number determined in the expression itself in the form of a <num> subelement. The num part of the ids of such elements is obtained by stripping all punctuation, separators and redundant characters from the content of the <num> element. The representation is case-sensitive. For instance, if article 12 contains <num>Art. 12 bis</num> then the num part of the id will be "12bis".
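
A minimal Python sketch of this stripping rule follows. The proposal does not enumerate which characters count as "redundant", so the list of label words below (REDUNDANT_LABELS) is purely an assumption for illustration, as is the function name normalize_num.

```python
import re

# Assumption: label words such as "Art." are the "redundant characters".
# This list is hypothetical; the proposal does not define it.
REDUNDANT_LABELS = ("art", "article", "sec", "section", "chapter", "title")


def normalize_num(num_text: str, labels=REDUNDANT_LABELS) -> str:
    """Derive the num part of an id from the text of a <num> element."""
    # Split into alphanumeric runs, which drops punctuation and separators.
    tokens = re.findall(r"[^\W_]+", num_text)
    # Drop label words (compared case-insensitively), keep everything else.
    kept = [t for t in tokens if t.lower() not in labels]
    # Case of the kept tokens is preserved: the representation is case-sensitive.
    return "".join(kept)


num_part = normalize_num("Art. 12 bis")
```

Here num_part is "12bis", as in the article 12 example. A real implementation would need a label list agreed for each legal tradition and language.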

It is the job of the author of the manifestation to determine whether the numbering expressed in the <num> element is global (i.e., it starts at 1 at the beginning of the document class) or local (i.e., it restarts at 1 inside or after every instance of an intermediate element). This is usually made clear within every legal tradition and usually can be established by briefly examining a few or even just one document in its original form.

Implicitly numbered elements: an implicitly numbered element has no <num> sub-element, and its numbering is established by counting the occurrences of similar elements within the same context, necessarily using arabic numbers.

It is the job of the author of the manifestation to determine whether the best way to count these elements is globally (i.e., starting at 1 at the beginning of the document class) or locally (i.e., restarting at 1 inside or after every instance of an intermediate element). This naming convention provides no rules on this choice, but there are a few common sense approaches. For instance, it is very natural that <eop> elements are globally counted, and <eol> are locally counted by their preceding <eop> element, and as such, the third <eop> element (separating the third page from the fourth) has id "eop_3" (note no prefix), while the fifteenth end of line after this <eop> will have as id "eop_3__eol_15". On the other hand, <p> elements within a given structure are reasonably counted locally (as in "third p of section 12"). This is not necessarily the immediately containing element (which in this case would be the <content> element), but any containing or preceding element that in the opinion of the author of the manifestation provides context for the counting. Thus the third p of section 12 would have "sect_12__p_3" as its id.
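
The <eop>/<eol> convention described above can be sketched in Python as follows. This is only one possible reading: the function name number_breaks and the flat list of element names are hypothetical, and the sketch assumes that lines occurring before the first <eop> get no prefix, a case the proposal does not address.

```python
from typing import Iterable, List


def number_breaks(markers: Iterable[str]) -> List[str]:
    """Assign ids to a document-order sequence of "eop" and "eol" markers.

    <eop> is counted globally; <eol> is counted locally, restarting after
    each <eop>, whose id becomes the prefix.
    """
    ids: List[str] = []
    eop_count = 0
    eol_count = 0
    context = ""  # id of the most recent <eop>; empty before the first one
    for name in markers:
        if name == "eop":
            eop_count += 1
            eol_count = 0  # line counting restarts after every page break
            context = f"eop_{eop_count}"
            ids.append(context)
        elif name == "eol":
            eol_count += 1
            local = f"eol_{eol_count}"
            # Assumption: no prefix for lines before the first <eop>.
            ids.append(f"{context}__{local}" if context else local)
        else:
            raise ValueError(f"unexpected marker: {name!r}")
    return ids


break_ids = number_breaks(["eop", "eol", "eol", "eop", "eol"])
```

For this input, break_ids is ["eop_1", "eop_1__eol_1", "eop_1__eol_2", "eop_2", "eop_2__eol_1"].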

Abundant or incomplete references
---------------------------------
An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete ***ids*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference could not know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.

In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.
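
The handling of abundant references could look like the following Python sketch. The name resolve_fragment is hypothetical, and the sketch assumes the resolver already holds the set of ids present in the chosen manifestation. Returning None stands in for the case where no match exists and the resolver would have to fall back on the interactive session.

```python
from typing import Optional, Set


def resolve_fragment(fragment: str, known_ids: Set[str]) -> Optional[str]:
    """Resolve a possibly abundant fragment by removing leading prefixes.

    Tries the full fragment first, then drops one "__"-separated prefix
    at a time until a candidate matches an id in `known_ids`.
    """
    parts = fragment.split("__")
    for start in range(len(parts)):
        candidate = "__".join(parts[start:])
        if candidate in known_ids:
            return candidate
    # No perfect match: the reference is not resolvable automatically.
    return None


# Hypothetical set of ids in a document where articles are globally numbered.
known = {"title_I", "art_12", "art_12__content"}
resolved = resolve_fragment("title_I__art_12", known)
```

In this example the reference names Title I although articles are globally numbered, so the prefix is removed and resolved is "art_12".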

Let me know what you think.

Ciao

Fabio

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?"
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/

--
You received this message because you are subscribed to the Google Groups "akomantoso-xml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akomantoso-xml+unsubscribe@googlegroups.com.
To post to this group, send email to akomantoso-xml@googlegroups.com.
Visit this group at http://groups.google.com/group/akomantoso-xml.
For more options, visit https://groups.google.com/groups/opt_out.

---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

Follow-Ups:
- Re: [legaldocml] [akomantoso-xml] Again on ids and numbering
  - From: Fabio Vitali <fabio@cs.unibo.it>

References:
- Again on ids and numbering
  - From: Fabio Vitali <fabio@cs.unibo.it>
- Re: [akomantoso-xml] Again on ids and numbering
  - From: Fabio Vitali <fabio@cs.unibo.it>