legaldocml message

Subject: Re: [akomantoso-xml] Again on ids and numbering
From: Fabio Vitali <fabio@cs.unibo.it>
To: akomantoso-xml@googlegroups.com
Date: Fri, 14 Feb 2014 11:55:26 +0100

Dear Greg,

the Akoma Ntoso naming convention does NOT assume that the document is stored in Akoma Ntoso XML, but only that there is a mapping between the FRBR URI and the URL of a file stored somewhere on the net, into which our URI can be resolved.

As for identifiers, the Akoma Ntoso naming convention does not assume, yet again, that the document is stored in Akoma Ntoso XML, but only that identifiers work in whatever format has been used. This means that any XML-based format, including Akoma Ntoso, HTML, TEI, DocBook, ePub, KF8 or Mobi, is ok; PDF or MS Word not so much.

The AN Naming convention also assumes that it is the job of the author of the linked-to document to use ids that are consistent with the naming convention. This is necessary because in HTTP the fragment identifier is never sent with the request, and is only known and handled by the user agent, so we must assume that identifiers are present in the response and have the correct form. There is very little we can do otherwise. In particular, it would make no sense to convert all fragment identifiers in references using the syntax of the destination documents, as these syntaxes can very well be innumerable. 
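
As a rough illustration of the HTTP point above, the following Python sketch shows how a user agent separates a reference into the part that goes into the request and the fragment that stays on the client. The host name example.org is purely hypothetical; only the path and the fragment are taken from this thread, and the only library call used is the standard urllib.parse.urldefrag.

```python
from urllib.parse import urldefrag

# Hypothetical resolved URL: the host is an assumption made for illustration,
# the path and fragment are the ones discussed in this thread.
reference = "http://example.org/us/act/constitution#sect_3__clause_4"

# The user agent splits the reference before issuing the request.
# The server only ever sees `url`; `fragment` never leaves the client.
url, fragment = urldefrag(reference)

print(url)       # the part sent in the HTTP request
print(fragment)  # kept client-side and matched against ids in the response
```

Since the server never receives the fragment, it cannot translate it into some other syntax on the fly, which is why the ids in the returned document have to follow the convention already.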

In your example, we MUST assume that, regardless of the format the Constitution has been written in, a format was used that allows fragment identifiers, AND that these identifiers are consistent with the Akoma Ntoso naming convention.

In particular, for your example, "/us/act/constitution#sect_3__clause_4" should do the trick. 
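
A minimal sketch of what the client side might then do with such a reference, assuming the response is well-formed XML and that identifiers live in a plain `id` attribute. The sample document and the helper name `find_by_fragment` are hypothetical and are not part of any Akoma Ntoso tooling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical response body: any XML-based format would do, as long as
# its ids follow the naming convention.
SAMPLE = """<html><body>
  <div id="sect_3">
    <p id="sect_3__clause_3">Clause three.</p>
    <p id="sect_3__clause_4">Clause four.</p>
  </div>
</body></html>"""


def find_by_fragment(document_text, reference):
    """Return the element whose id equals the fragment of `reference`, or None."""
    _, fragment = urldefrag(reference)
    root = ET.fromstring(document_text)
    # Walk every element and compare its id attribute with the fragment.
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


target = find_by_fragment(SAMPLE, "/us/act/constitution#sect_3__clause_4")
print(target.text if target is not None else "no element with that id")
```

The lookup is an exact string match, so it only works if the author of the destination document used ids in the agreed form, which is the assumption stated above.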

Ciao

Fabio

--

On 14 Feb 2014, at 10:17, Greg Kempe <gregkempe@gmail.com> wrote:

> Hi Fabio,
> 
> Thanks for your detailed reply, it makes the challenges and requirements much clearer. It’s a tricky problem!
> 
> To paraphrase your summary: AN requires a mechanism for unambiguously identifying document elements across revisions, authors etc., for navigational and other purposes. It must be possible to construct a reference without ever seeing the document in which the referenced element occurs. Once we have such a mechanism presumably we can combine the canonical name of an AN document (an FRBR work URI) with this mechanism to unambiguously identify the element within the document.
> 
> Your proposal is to use an ID attribute to associate a standardised canonical name for an element with the element itself inside an AN document. If we have a formalised way of constructing such an ID, then everyone can name it consistently within all AN documents.
> 
> However, does this solve the problem of referencing an element from outside of the AN document?
> 
> An ID attribute is a way of interpreting a reference within the context of an AN document. But how do we encode that initial reference? What is the equivalent of an FRBR work URI for an element of a piece of legislation? My concern is that if we focus on how to structure an ID attribute based on the local context of the document, we’ll have a solution that works when you know you’re working with an AN document and you know what it is likely to look like, but will not work for other scenarios.
> 
> For example: I have an AN document that is the Animal By-law for my city. It references Section 4(3) of our Constitution that is not in AN form. I need to be able to do the following:
> 
> 1. unambiguously encode the reference to Section 4(3) of the Constitution in the Animal by-law XML.
> 2. when displaying the Animal by-law as a webpage, interpret that reference and transform it into a link on the Government's Constitution website which has its own (hopefully consistent) rules for specifying the page for Section 4(3).
> 
> Once we know how to do these steps, then we can determine a way of unambiguously interpreting the reference encoded in step (1) within the context of an AN document that represents the Constitution (eg. perhaps by calculating an ID string based on a standard format and then finding an element with that ID).
> 
> Perhaps at the end of the day we do encode an element’s canonical name as an ID attribute. That might be one of many ways of interpreting a reference in the context of an AN document. But do we not first need to formalise what we’re trying to encode?
> 
> The AN FRBR URI format is great for identifying a piece of legislation no matter what its expression. What is the equivalent for referring to "Section 4(3) of our Constitution"?
> 
> I realise I’m jumping in here without much background, please take this all with a pinch of salt.
> 
> Thanks,
> Greg
> On 10 February 2014 at 7:15:08 PM, Fabio Vitali (fabio@cs.unibo.it) wrote:
> 
>> Dear Greg, 
>> 
>> the use of ids is, as you hypothesize, slightly more complex in Akoma Ntoso than in HTML. 
>> 
>> Let me bring a few aspects that affect ids. 
>> 
>> First of all, in HTML the important link type is anchor-to-document, while the anchor-to-anchor link is a minor addition for peculiar cases (and highly criticized by usability experts, btw); this is the reason ids are never required, and authors are expected to provide ids only for those structures that are likely destinations of anchor-to-anchor links, e.g., basically, a few section headings. Contrast this with legislation, wherein ALL references are to a precise substructure of a highly hierarchical document flow, and any substructure may become a destination. This is the reason we require ids for most elements, so that you do not have to curse some unknown markup author because he forgot to place an id where you most needed it. 
>> 
>> A second issue is that of the purpose of the reference. In HTML, the reference is almost always meant for navigation by human users, so that it is only important to come close enough to the intended destination that a human eye can scan the surroundings and find the exact destination somewhere around there. In legislation, we have an additional type of reference, that of *modifications*, which requires that a specific substructure is precisely identified and modified by a modification instruction. In this case, one cannot be satisfied with the fact that the intended destination is somewhat near the reached destination: they must coincide. 
>> 
>> A third issue is the fact that by using the FRBR layering we are strongly differentiating between the legislative context and the markup used to represent it. References are legislative concepts, and exist regardless of the markup and markup author that express it in practice. The same content, for instance, could be represented in a number of different XML files created by different authors. They would all be different manifestations of the same expression, each of which may have the same body, but different markup choices, metadata, commentary, etc. References would need to work regardless of the specific manifestation chosen as the destination, and indeed it is important that all manifestations use the same ids for the same structures, because they HAVE BEEN STANDARDIZED by the TC. This is impacted by the fact that I might not even have the XML of the destination, or that it may not even exist yet (time-based alchemies are frequent in legislation, or I might need to create links to documents that haven't been converted into Akoma Ntoso yet, etc.). Thus, by providing a forced and precise syntax for ids, we can do our best to guarantee that all different manifestations of the same content have the same ids, and that I do not need to read into an XML file to divine the values of its ids. 
>> 
>> A fourth and final issue is connected to that, and it is the issue of dynamic references. We all know that legal references have peculiar traits regarding time. For instance, in case of an evolving document (e.g. a piece of legislation receiving references and being actively modified by the legislator), the actual destination of the reference is not the original version, nor the current version, but in many cases the version of the document that was valid at the moment in time when your case took place. References are dynamic, rather than static, because the destination moves in time and jurisdiction according to your needs, rather than being fixed to a specific sentence or fragment. This means that point-in-time consolidation is an important affair, and that determining the destination of a dynamic link requires at the very least that structures existing in multiple versions are named consistently. It must be clear that, if section 35 of the initial version of a title of a US code had some id Y, then ALL subsequent versions of that same section 35 (even after a renumbering action) have the same id Y, so that once you determine the version you need, bringing you to the right structure is easy and straightforward. 
>> 
>> To summarize, the whole point of the id discussion, and the reason I am suggesting a semantically aware syntax for ids, is to make sure that ids can be used regardless of versions of the same document, regardless of the author of individual XML markups, regardless of usage as navigational or modificative reference, and knowing full well that point-to-point references are the norm rather than the exception in our case. 
>> 
>> I hope this is convincing and that it answers your questions. 
>> 
>> Ciao 
>> 
>> Fabio 
>> 
>> -- 
>> 
>> 
>> 
>> On 10 Feb 2014, at 07:51, Greg Kempe <gregkempe@gmail.com> wrote: 
>> 
>> > Hi Fabio, 
>> > 
>> > I've watched the discussion around IDs with interest, it seems that getting them "right" is pretty challenging. 
>> > 
>> > I've been wondering if we cannot simplify our use of IDs, but realise that I might not have the full context. So, why are IDs necessary and what is their intended purpose? 
>> > 
>> > In HTML, IDs are used to reference an element inside the document (eg. to apply styling or manipulate an element) and as an anchor for moving inside a document. As such, they are useful within the internal context of the document, and externally useful only if you already have internal knowledge of the document (ie. you can't guess an ID without reading the document). They are completely freeform and don't necessarily describe the structural location of the element within the document. It's entirely up to the author to decide on them and they are optional on all elements (AFAIK). 
>> > 
>> > So, putting aside CSS styling, does AN need to have different semantics for its IDs than HTML, or can we borrow from how HTML defines and uses them? Do they need to encode the structural location of an element and, if so, would an XPath or XQuery location be better suited for that? 
>> > 
>> > Having a formal format for IDs seems to imply that it will allow us to calculate the ID of an element without ever reading the document. In other words, that IDs are useful external to the document without having any internal knowledge of it. Is that actually a use-case and would the format being discussed support that? 
>> > 
>> > Apologies if all of this has been discussed before. 
>> > 
>> > Thanks, 
>> > Greg 
>> > 
>> > 
>> > On 09 February 2014 at 4:07:13 PM, Fabio Vitali (fabio@cs.unibo.it) wrote: 
>> > 
>> >> Dear all, 
>> >> 
>> >> after this week's informal discussion, I would like to make an amended proposal for the management of ids in Akoma Ntoso. I hope I got all the suggestions in the appropriate place. Let me know if I forgot anything. 
>> >> 
>> >> The generic syntax for an id is the following: 
>> >> 
>> >> [prefix "__"] element_ref ["_" num] 
>> >> 
>> >> * prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears. 
>> >> 
>> >> Prefix 
>> >> ------ 
>> >> The context of an element is the element that suggests, implies or forces a re-start of the numbering of all internal or subsequent elements of the same name. Different contexts imply that elements with the same name may end up having the same element_ref and the same num, and must therefore be disambiguated through the use of a prefix. Such prefix is the id of the context element. For instance, in many traditions chapters' numbering restarts within every title, so "chp_2" for Chapter 2 could be ambiguous. In these cases the id for Chapter 2 of Title I will be "title_I__chp_2" (assuming that "title_I" is the whole id for Title I). Elements that are globally unique or globally numbered within a document require no prefix (in the hypothesis of a single document XML file). 
>> >> 
>> >> * All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). 
>> >> * Elements <quotedStructure> and <embeddedStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. 
>> >> * Plain inline elements are NEVER contexts. Exception: element <mod> is ALWAYS a context. 
>> >> 
>> >> Element_ref 
>> >> ----------- 
>> >> element_ref is a reference to the identified element; this is always the name of the element, except for a brief list of well-known abbreviations as in the following table: 
>> >> 
>> >> FOR ELEMENT x	 USE ABBREVIATION y 
>> >> *** TBD ***	 *** TBD *** 
>> >> 
>> >> num 
>> >> --- 
>> >> 
>> >> num is a (possibly empty) representation of the numbering of the element within its context. 
>> >> 
>> >> Globally and locally unique elements: if the element is necessarily unique within its context, no numbering is used. This means that the id of elements that are necessarily unique within a given context will have no num part. For instance, since there is exactly ONE <body> in acts and bills, its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, since there is at most ONE <content> element inside articles or sections, the id of the <content> element of article 12 will be simply "art_12__content". 
>> >> 
>> >> Explicitly numbered elements: an explicitly numbered element has its number determined in the expression itself in the form of a <num> subelement. The num part of the ids of such elements is obtained by stripping all punctuation, separators and redundant characters from the content of the <num> element. The representation is case-sensitive. For instance, if article 12 contains <num>Art. 12 bis</num> then the num part of the id will be "12bis". 
>> >> 
>> >> It is the job of the author of the manifestation to determine whether the numbering expressed in the <num> element is global (i.e., it starts at 1 at the beginning of the document class) or local (i.e., it restarts at 1 inside or after every instance of an intermediate element). This is usually made clear within every legal tradition and usually can be established by briefly examining a few or even just one document in its original form. 
>> >> 
>> >> Implicitly numbered elements: an implicitly numbered element has no <num> sub-element, and its numbering is established by counting the occurrences of similar elements within the same context, necessarily using Arabic numerals. 
>> >> 
>> >> It is the job of the author of the manifestation to determine whether the best way to count these elements is globally (i.e., starting at 1 at the beginning of the document class) or locally (i.e., restarting at 1 inside or after every instance of an intermediate element). This naming convention provides no rules on this choice, but there are a few common sense approaches. For instance, it is very natural that <eop> elements are globally counted, and <eol> are locally counted by their preceding <eop> element, and as such, the third <eop> element (separating the third page from the fourth) has id "eop_3" (note no prefix), while the fifteenth end of line after this <eop> will have as id "eop_3__eol_15". On the other hand, <p> elements within a given structure are reasonably counted locally (as in "third p of section 12"). This is not necessarily the immediately containing element (which in this case would be the <content> element), but any containing or preceding element that in the opinion of the author of the manifestation provides context for the counting. Thus the third p of section 12 would have "sect_12__p_3" as its id. 
>> >> 
>> >> Abundant or incomplete references 
>> >> --------------------------------- 
>> >> An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete ***id*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference could not know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information. 
>> >> 
>> >> In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element. 
>> >> 
>> >> Let me know what you think. 
>> >> 
>> >> Ciao 
>> >> 
>> >> Fabio 
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> 
>> >> Fabio Vitali Tiger got to hunt, bird got to fly, 
>> >> Dept. of Computer Science Man got to sit and wonder "Why, why, why?' 
>> >> Univ. of Bologna ITALY Tiger got to sleep, bird got to land, 
>> >> phone: +39 051 2094872 Man got to tell himself he understand. 
>> >> e-mail: fabio@cs.unibo.it Kurt Vonnegut (1922-2007), "Cat's cradle" 
>> >> http://vitali.web.cs.unibo.it/ 
>> >> 
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> You received this message because you are subscribed to the Google Groups "akomantoso-xml" group. 
>> >> To unsubscribe from this group and stop receiving emails from it, send an email to akomantoso-xml+unsubscribe@googlegroups.com. 
>> >> To post to this group, send email to akomantoso-xml@googlegroups.com. 
>> >> Visit this group at http://groups.google.com/group/akomantoso-xml. 
>> >> For more options, visit https://groups.google.com/groups/opt_out. 
>> > 
>> 
> 



--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?'
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/