legaldocml message

Subject: Re: [legaldocml] On the replacement of '~' for '#' in Akoma Ntoso URIs.
From: Grant Vergottini <grant.vergottini@xcential.com>
To: Fabio Vitali <fvitali@gmail.com>, "legaldocml@lists.oasis-open.org" <legaldocml@lists.oasis-open.org>
Date: Thu, 13 Nov 2014 01:20:04 -0800

Dear Fabio,

If I understand your proposal correctly, you're asking to stick to the
notation we currently have.

With this notation, which is based on a URL, the basic URL identifies
the work, and optionally an expression of the work and the
manifestation or format of the returned object.

Beyond this basic URL, the # is used to specify the provision of
interest within the work, with the identifier (presumably the @eId)
following the #.

As this notation, in normal HTTP land, causes the URL to be split,
requesting the entire WORK from the server and then having the client
(browser) either scroll to the provision or otherwise handle it, it
becomes necessary for a client-side JavaScript library to handle the
actual resolution. If I understand this correctly, it means that the
resolver is implemented as a client side JavaScript library. This
client side resolver converts the reference URL into a URL to be sent
to the server and can take care of any performance considerations such
as a paging mechanism or some other form of paring down the request to
ensure you don't return a huge document when all you're really
interested in is a small region of the document.
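
To make the split concrete, here is a minimal Python sketch using only
the standard library. It shows which part of a reference would appear
in the request line of an ordinary HTTP request. The reference is the
one from Monica's example; the host name example.org is a placeholder
and an assumption, not a real service.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Return the path (plus query) an HTTP client puts in the request line.

    The fragment (everything after '#') stays with the client and is not sent.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

# example.org is a placeholder host (assumption).
with_hash = "http://example.org/akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3"
with_tilde = "http://example.org/akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3"

print(request_target(with_hash))     # /akn/us/act/usc/title9/eng@2014-10-10/main.akn
print(request_target(with_tilde))    # /akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3
print(urlsplit(with_hash).fragment)  # chp_3
```

With the '#', the server only ever sees the document path; with the
'~', the provision identifier travels with the request.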

This model differs tremendously from my world view. In my world view,
the URL is specified as a URL precisely so that it can travel along
the established HTTP mechanisms without any preprocessing. That's the
primary motivation for using a URL rather than a URN. With URLs, it is
entirely possible to refer to a provision using the same URL in a web
page, an email document, a Word document, a PowerPoint presentation,
or in any XML editor. By having no client requirement other than that
it understand the long established HTTP linking mechanisms, we're able
to create a simple referencing model that can be universal.

To this end, I implemented my resolver as a URL handler within the web
server. It's written as a DLL that is dynamically bound to the web
server and is configured to receive all traffic that directs through
URLs starting with the scheme name (in our case, that's "/akn"). So,
rather than having the web server handle the URL by returning a page
in some preconfigured hierarchy, my handler DLL is asked to retrieve
the requested provision, however the underlying data structures are
organized. An entry point into my DLL is called and I am given the URL
to resolve. The DLL reads the URL, does some work to get a response,
and then sends it back to the client. My current storage model is that
each title of the US Code is a separate entire document that is stored
in eXist. After parsing the URL, I determine the appropriate XQuery
necessary to retrieve that specific provision which I then package up
with any necessary envelope and add any necessary metadata before
finally returning the result. This approach allows me to return a
single clause, when that is what is of interest, or perhaps an entire
statutory note that contains an embedded enactment, when that is of
interest. If I want the entire document, I can get that. If I just
want a chapter, I can get that too. I can get anything, no matter the
granularity, as long as it has an identifier to grab onto.
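
As a rough illustration of the parse-then-query step, here is a
minimal Python sketch. It is not the actual handler: the collection
path, the one-file-per-title naming, and the lookup by @eId are all
assumptions made for the example, and the language and version parts
of the reference are ignored for brevity.

```python
import re

# Hypothetical layout (assumption): one document per title,
# e.g. /db/uscode/title9.xml.
COLLECTION = "/db/uscode"

# Conservative whitelist so identifiers can be placed safely in the query.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_.\-]+$")

def xquery_for(path: str) -> str:
    """Build an XQuery selecting the provision named in an /akn path.

    Handles paths such as
    /akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3;
    without a '~portion' the whole document is selected.
    """
    resource, _, portion = path.partition("~")
    segments = resource.strip("/").split("/")
    if len(segments) < 5 or segments[0] != "akn":
        raise ValueError(f"not an /akn reference: {path!r}")
    title = segments[4]  # e.g. 'title9'
    if not _SAFE_ID.match(title) or (portion and not _SAFE_ID.match(portion)):
        raise ValueError(f"unsafe identifier in {path!r}")
    doc = f'doc("{COLLECTION}/{title}.xml")'
    if portion:
        return f'{doc}//*[@eId = "{portion}"]'
    return f"{doc}/*"

print(xquery_for("/akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3"))
# doc("/db/uscode/title9.xml")//*[@eId = "chp_3"]
```

The same function serves any granularity: a chapter, a clause, or the
whole title, as long as the element carries an identifier.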

Take a look at Title 42 of the U.S. Code. It is non-positive law, so
it isn't enacted as a title (unlike positive law titles, which are
single enactments). Title 42 is no more a single WORK than a statute
book full of a year's worth of acts stored as chapters. Actually, just
like a statute book, Title 42 is organized as 152 chapters -- each one
being a separate enactment. These enactments largely have nothing to
do with one another, as Title 42 casts a broad net over a number of
mostly unrelated topics. Remember, as non-positive law, the
classifications within Title 42 are not the official law -- it's the
public laws recorded in the Statutes that are. But, the chapters shown
within Title 42 will reflect executed amendments while the public laws
will not. Buried within each chapter of Title 42, you will find other
enactments that have been classified within the hierarchy of the top
level enactments. It all becomes a convoluted mess. But, with the
scheme we use, it doesn't matter. We can map any citation into an
equivalent reference and we can dereference any reference to retrieve
the provision. This is all done by the resolver, on the server. It
receives the incoming URL, which identifies a specific provision, the
version we want, and the format we want returned, and then the
resolver does whatever is necessary to provide the expected return. In
some cases, we must look up information within some tables to handle
ambiguities and the quirks that history has given us. In other cases
we defer to a different server or we retrieve the result from another
server and return it as our own. But we can always answer the question
without ever exposing to the client any details, sometimes quite ugly,
of how we retrieved this information. It could come from flat files
stored as CLOBs in Oracle or from the XML storage in our eXist
database -- nobody has to know how. When we change how we do it,
nobody's references will ever break. The details are entirely
invisible to the client. In short, this works without any burden on
the client other than being able to pass on the necessary information
to the server using HTTP.
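
The idea of hiding the storage behind the resolver can be sketched in
a few lines of Python. Everything named here is hypothetical: the two
back-end functions, the table mapping titles to back ends, and the
placeholder XML they return exist only to illustrate the shape of the
dispatch, not how the real system is built.

```python
from typing import Callable, Dict

# Hypothetical back ends (assumptions); each takes a provision key
# and returns XML text.
def from_exist(key: str) -> str:
    return f"<provision source='exist' key='{key}'/>"

def from_oracle_clob(key: str) -> str:
    return f"<provision source='oracle' key='{key}'/>"

# Hypothetical table recording which store holds which title.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "title9": from_exist,
    "title42": from_oracle_clob,
}

def resolve(title: str, portion: str) -> str:
    """Return the provision; the caller never learns which store answered."""
    try:
        fetch = BACKENDS[title]
    except KeyError:
        raise LookupError(f"no back end configured for {title!r}") from None
    return fetch(f"{title}/{portion}")

print(resolve("title42", "chp_1"))
```

Moving a title from one store to another means changing one table
entry on the server; no reference held by a client changes.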

The problem with the # notation is that it prevents some of the
information from being passed on to the server -- requiring client
side processing to encode this information in some other unspecified
form before sending it to the server. I just don't understand the
motivation for this -- for me, the need for client-side processing is
a non-starter. It eliminates much of the value of the URL notation and
requires every client to be customized to work with the system. That
requirement is too closed and too limiting. Our suggestion was to use
the "~" instead, so that all the information is passed on to the
server. My one discomfort was that keeping its position at the end of
the string felt wrong. I would rather the notation be something like
"/akn/{workIdentifier}[~{portion}][/[{lang}]@{version}][.{format}]".
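
For illustration, a reference in this shape could be taken apart with
a single regular expression. The Python sketch below is one possible
reading of the template, not a normative grammar: the character
classes are assumptions (for instance, it assumes the work identifier
contains no '~', '@' or '.'), and the sample reference with the ".akn"
suffix is made up for the example.

```python
import re
from typing import Dict, Optional

# One possible reading of
#   /akn/{workIdentifier}[~{portion}][/[{lang}]@{version}][.{format}]
_REFERENCE = re.compile(
    r"/akn/(?P<work>[^~@.]+?)"           # work identifier, may contain '/'
    r"(?:~(?P<portion>[^/@.~]+))?"       # optional ~portion
    r"(?:/(?P<lang>[A-Za-z]{2,3})?"      # optional /lang ...
    r"@(?P<version>[^/.~]*))?"           # ... followed by @version
    r"(?:\.(?P<format>[A-Za-z0-9]+))?"   # optional .format
)

def parse_reference(path: str) -> Dict[str, Optional[str]]:
    """Split a reference into work, portion, lang, version and format."""
    match = _REFERENCE.fullmatch(path)
    if match is None:
        raise ValueError(f"not a recognised reference: {path!r}")
    return match.groupdict()

print(parse_reference("/akn/us/act/usc/title9~chp_3/eng@2014-10-10.akn"))
```

Parts that are absent from the reference come back as None, so a bare
work reference such as /akn/us/act/usc/title9 parses with only the
work filled in.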

On Wed, Nov 12, 2014 at 8:29 AM, Fabio Vitali <fvitali@gmail.com> wrote:
> Dear all,
>
> a few brief reflections about the idea of introducing yet another section in our URIs, introduced by the character '~' (or whatever) instead of the hash ('#'), for the specification of the individual fragment of the reference. I am not commenting on the specific character, but just on the idea of replacing the hash with something else.
>
> The example provided by Monica refers to MANIFESTATION-level references.
>
>> Example:
>> /akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3
>>
>> to arrive to this:
>> /akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3
>
> I have minor quibbles about the order of things, but in general the idea of replacing the hash in MANIFESTATION-LEVEL references seems to me kind of contrived, but not wrong per se.
>
> At the EXPRESSION and WORK level, on the other hand, the situation is completely different. I am strongly convinced that it is wrong to replace hashes, and that it provides no advantages in either the short or the long term. Consider a WORK-LEVEL identifier such as /akn/us/act/usc/title9~chp_3
>
> The problem lies in the fact that /akn/us/act/usc/title9~chp_3 and /akn/us/act/usc/title9~chp_2 are different URIs and therefore represent DIFFERENT WORKS. What is the relationship between X~chp_2 and X~chp_3? None whatsoever. They are not related, they are not connected, they know nothing of each other. X~chp_2 is not even a legislative document, but a fragment composed exactly of chapter 2. Does it have a preceding chapter? No. Does it have a following chapter? No. It's a monad.
>
> What about X~chp_2__cla_1? Does it exist, too? What is its relationship with X~chp_2? With X? None whatsoever. We end up with an uncontrolled proliferation of WORKS each of which is completely disconnected from the others.
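
The distinction drawn here can be checked mechanically. The following
Python sketch, using only the standard library, treats two URIs as the
same document when they differ only in their fragment; the helper name
same_document is made up for the example.

```python
from urllib.parse import urldefrag

def same_document(a: str, b: str) -> bool:
    """True when two URIs differ only in their fragment."""
    return urldefrag(a).url == urldefrag(b).url

base = "/akn/us/act/usc/title9"
print(same_document(base + "#chp_2", base + "#chp_3"))  # True
print(same_document(base + "~chp_2", base + "~chp_3"))  # False
```

Generic URI tooling relates the two '#' forms to a common base, while
the two '~' forms are simply different strings unless an application
adds its own knowledge of the '~' convention.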
>
> Let me give you another example. Suppose we have a modification act that says something like the following:
>
> ---
>
> Sect 1 - Modification to act 123 of 2014.
> The Act 123 of 2014 is modified as follows:
> 1) Sect. 1 is replaced by the following:
>    "Sect 1: .... "
> 2) At the end of sect. 2 the following words "xxx" are appended.
> 3) Clause 2 of sect. 3 is suppressed.
>
> ---
>
> In case of replacement of the hash with some other character, this section corresponds to the modification of THREE different works: /akn/xxx/act/2014/123~sect_1, /akn/xxx/act/2014/123~sect_2 and /akn/xxx/act/2014/123~sect_3__cla_2, or maybe /akn/xxx/act/2014/123~sect_3#cla_2. Not only do mod elements have to repeat the whole URI every time, but the activeModifications section reports three different works and the references section also lists three different works.
>
> This is in contrast with common sense and with the human readable text that clearly mentions three modifications to ONE text, and not ONE modification each to THREE texts.
>
> This would clearly build a separation between what the words of the law say and what the operations we perform do. I do not like it. I would very much like the integrity and conceptual closeness between natural language text and actual operations to be maintained.
>
> A way out
> ---------
>
> Fortunately, we are not alone in this situation, and ways out are starting to appear to maintain URI handling and conceptual operations close together.
>
> The technique is called "routing/dispatching". It does not have a Wikipedia page yet, but it has a history both for server-side management of URIs and, more recently, for client-side management to include hash-based URIs.
>
> Server-side:
> Django: https://docs.djangoproject.com/en/dev/topics/http/urls/
> CakePHP: http://book.cakephp.org/2.0/en/development/routing.html
> PHP codeIgniter: http://www.codeigniter.com/user_guide/general/routing.html
> Rails: http://guides.rubyonrails.org/routing.html
>
> Client-side:
> AngularJs: https://docs.angularjs.org/api/ngRoute
> MeteorJs: https://github.com/EventedMind/iron-router
> jQuery: http://xoxco.com/projects/code/router/ ,
>         https://github.com/camme/jquery-router-plugin ,
>         https://github.com/iSimonWeb/jQuery-Router,
>         etc.
> Backbone.js: http://backbonetutorials.com/what-is-a-router/
> No-dependencies: http://millermedeiros.github.io/crossroads.js/
>
> What is routing? Simply put, it means intercepting the request for a URI (as specified in the DOM of a document displayed in the browser) and executing arbitrary procedures (including accessing a completely different URI) according to application-specific logic. Thus, the DOM may contain a reference to /a/b/c#d, the user clicks on the link and the routing application actually loads /z/w/x.
>
> Why is this useful? Because it maintains the illusion of reasonable and meaningful URIs in the DOM while the application's working requires much uglier and 'practical' URIs. This is very useful for REST APIs, which require beautiful and meaningful URIs to be acceptable. Let me give you an example which is somewhat close to our needs.
>
> Suppose you have a REST API that exposes, say, customers through a URI such as
>
> domain.com/customers/                (returns the list of customers)
> domain.com/customers/rec1321         (returns the record for customer 'rec1321')
>
> Suppose we have 4 million customers. Plainly accessing domain.com/customers/ would therefore require the browser to wait for a list of 4 M items, just to show a handful. That is clearly absurd. Therefore the API provides additional parameters (say, start and count), so that you can make requests such as domain.com/customers/?start=0&count=100 to get only the first 100 items of the customers' list.
>
> Now what happens if I want to scroll near item 1321? I will NOT use a URI such as domain.com/customers/rec1321, because that does not return a list, but a single record. I could use a URI such as domain.com/customers/?start=1300&count=100#rec1321, but that would mean messing up considerably the content of the DOM to handle appropriately every possible scrolling request, to determine exactly the 100 items to load for each possible situation. Nonsense. What I want is to use a URI such as domain.com/customers/#rec1321, plain and simple: get the list of customers, and scroll it to rec1321.
>
> All the routing platforms that I mentioned before are able to intercept a request for URIs such as these and replace them with practical URIs. The server, the browser and the HTTP connection are all part of the conspiracy, because the actual URI domain.com/customers/#rec1321 is never actually sent over the wire, but both the DOM (i.e., the document as stored) and the user (i.e., the document as displayed) contain and use only the nice and meaningful URI, and not the ugly one.
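
The mapping a router performs in this customers example can be
sketched in a few lines. The routing libraries listed above are
JavaScript; the Python below only illustrates the translation from the
"nice" URI to the paged request. The page size and the rule that
'rec1321' sits at position 1321 in the list are assumptions made for
the example.

```python
from urllib.parse import urlsplit

PAGE_SIZE = 100  # assumption, matching count=100 in the example

def position_of(record_id: str) -> int:
    """Hypothetical lookup: assume 'rec1321' is item 1321 of the list."""
    return int(record_id[len("rec"):])

def route(nice_uri: str) -> str:
    """Map a hash-based URI to the paged request actually sent."""
    # Input has no scheme (e.g. domain.com/customers/#rec1321), so
    # prefix '//' to let urlsplit recognise the host.
    parts = urlsplit("//" + nice_uri)
    start = 0
    if parts.fragment:
        start = (position_of(parts.fragment) // PAGE_SIZE) * PAGE_SIZE
    return f"{parts.netloc}{parts.path}?start={start}&count={PAGE_SIZE}"

print(route("domain.com/customers/#rec1321"))
# domain.com/customers/?start=1300&count=100
```

Note that this translation runs in the client: the fragment is
consumed there, and only the paged URI reaches the server.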
>
> This is like a good magic show: when we see the magician floating over the stage without support, we all rationally know there is a wire somewhere he is hanging from, but we DO NOT WANT  to see the wire, and seeing it would mean ruining the illusion and destroying the credibility of the magician. Wires need to be kept hidden and unseen.
>
> Instead, by specifying hash-free URIs in the documents (in the XMLs!!! For ever!!!) we are subjecting our documents, and our users, to the quirkiness of over-abundant Work URIs of disconnected documents; we are forcing our semantic tools to work with multiple URIs for the same documents (we have to explicitly assert that by modifying /akn/xxx/act/2014/123~sect_2__cla_1 we also modified /akn/xxx/act/2014/123~sect_2 and /akn/xxx/act/2014/123, which is not obvious); we are destroying the user's illusion of a single, long document which can be navigated by clicking on links and scrollbars. We are not only displaying our wires, we are painting them yellow.
>
> No, I can't say I like such an idea, especially considering that there are alternative solutions that work perfectly well.
>
> More than happy to illustrate routing in greater detail during the teleconf and in subsequent chats and mail, if the need arises. I might be slow and intermittent, but I will do my best to keep us from going down this route.
>
> Ciao
>
> Fabio
>
> --
>
>
> On 11/nov/2014, at 15:36, monica.palmirani <monica.palmirani@unibo.it> wrote:
>
>> Dear Fabio,
>>
>> it is fantastic! The next TC meeting will be Nov. 12th 18.30-19.30 CET.
>>
>> The main topic will be the opportunity of using '~' instead of '#' in the AKOMA NTOSO URI.
>>
>> Example:
>> /akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3
>>
>> to arrive to this:
>> /akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3
>>
>> This could help client and server communicate over HTTP so as to handle both the fragment information and the <fragment> docType of Akoma Ntoso.
>>
>> Yours,
>> Monica
>>
>>
>> Il 11/11/2014 14:26, Fabio Vitali ha scritto:
>>> Dear all,
>>>
>>> sorry for my silence of the past few weeks. I had an unforeseen health issue that I had to take care of and that has taken my whole attention for months now. Now I am slowly healing, and while I don't think I am ready to take on the full weight of my duties, some of them are important enough for me to try to get back to them as soon as possible.
>>>
>>> I would also like to thank Monica for having taken on the load of my absence and for having kept me updated and informed (for the little that my mind could absorb in those weeks) about the topics being discussed.
>>>
>>> I know, for instance, that a discussion has started about the opportunity of using '_' instead of '#' to separate document and fragment in Akoma Ntoso URIs.
>>>
>>> I haven't been able to find out anything about the rationale behind it in the mailing list, but I'll try to send you a brief note about my thoughts on the topic in the next few hours.
>>>
>>> Ciao
>>>
>>> Fabio
>>>
>>>
>>>
>>> --
>>>
>>> Fabio Vitali                            Tiger got to hunt, bird got to fly,
>>> Dept. of Computer Science        Man got to sit and wonder "Why, why, why?'
>>> Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
>>> phone:  +39 051 2094872              Man got to tell himself he understand.
>>> e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
>>> http://vitali.web.cs.unibo.it/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that
>>> generates this mail.  Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>>
>>>
>>
>>
>> --
>> ===================================
>> Associate professor of Legal Informatics
>> School of Law
>> Alma Mater Studiorum Università di Bologna
>> C.I.R.S.F.I.D. http://www.cirsfid.unibo.it/
>> Palazzo Dal Monte Gaudenzi - Via Galliera, 3
>> I - 40121 BOLOGNA (ITALY)
>> Tel +39 051 277217
>> Fax +39 051 260782
>> E-mail  monica.palmirani@unibo.it
>> ====================================
>>
>



-- 
____________________________________________________________________
Grant Vergottini
Xcential Group, LLC.
email: grant.vergottini@xcential.com
phone: 858.361.6738
Follow-Ups:
- Re: [legaldocml] On the replacement of '~' for '#' in Akoma Ntoso URIs.
  - From: Fabio Vitali <fvitali@gmail.com>
References:
- I'm back...
  - From: Fabio Vitali <fvitali@gmail.com>
- Re: [legaldocml] I'm back...
  - From: monica.palmirani <monica.palmirani@unibo.it>
- On the replacement of '~' for '#' in Akoma Ntoso URIs.
  - From: Fabio Vitali <fvitali@gmail.com>