OASIS Mailing List Archives: legaldocml message


Subject: Re: [legaldocml] On the replacement of '~' for '#' in Akoma Ntoso URIs.


Dear Fabio,

thank you very much for your very clear explanation.

While we wait for Grant to reply to your email, I would like to ask you two things:

1. How should we use the manifestation URI as the physical file name when a document is split into multiple physical fragment files?

In several systems that we manage (Uruguay, Camera, Senate, Cassazione), the manifestation URI IS the physical file name in the repository (eXist, Oracle, Virtuoso).

For this reason, I propose the following notation ONLY at the manifestation level and ONLY for the <fragment> docType:

/akn/us/act/title42/eng@2014-06-13/~sect_1220-sect_1230/main.xml

I propose to include "~sect_1220-sect_1230" after the expression information (eng@2014-06-13), as manifestation information.

The ~sect_1220-sect_1230 data belong to the manifestation level, not to the work or expression level, because section numbering can change over time, version by version.

Example:
For the work title42 as in force on 2014-06-13, we could have this fragmentation:
/akn/us/act/title42/eng@2014-06-13/~sect_1220-sect_1230/main.xml
/akn/us/act/title42/eng@2014-06-13/~sect_1231-sect_1254/main.xml
/akn/us/act/title42/eng@2014-06-13/~sect_1255-sect_1266/main.xml

In a second version (expression) in force on 2014-10-13, due for example to the insertion of a very large new chapter, we could have a different fragmentation:

/akn/us/act/title42/eng@2014-10-13/~sect_1220-sect_1230/main.xml
/akn/us/act/title42/eng@2014-10-13/~sect_1231-sect_1240/main.xml
/akn/us/act/title42/eng@2014-10-13/~sect_1241-sect_1250/main.xml
/akn/us/act/title42/eng@2014-10-13/~sect_1251-sect_1266/main.xml

What do you think about this proposal at manifestation level?
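A minimal Python sketch of how a repository could use these manifestation URIs directly as file names and still find which file holds a given section. It assumes section eIds of the simple form sect_NNNN with numeric, inclusive ranges; the helper `fragment_for` is hypothetical and not part of any Akoma Ntoso specification.

```python
import re
from typing import List, Optional

# Manifestation URIs from the 2014-06-13 example, used directly as file names.
FRAGMENTS = [
    "/akn/us/act/title42/eng@2014-06-13/~sect_1220-sect_1230/main.xml",
    "/akn/us/act/title42/eng@2014-06-13/~sect_1231-sect_1254/main.xml",
    "/akn/us/act/title42/eng@2014-06-13/~sect_1255-sect_1266/main.xml",
]

# Assumes eIds of the simple form sect_NNNN; real eIds can be more complex.
RANGE_RE = re.compile(r"/~sect_(\d+)-sect_(\d+)/")


def fragment_for(section: int, fragments: List[str]) -> Optional[str]:
    """Return the file (manifestation URI) whose ~first-last range covers `section`."""
    for uri in fragments:
        m = RANGE_RE.search(uri)
        if m and int(m.group(1)) <= section <= int(m.group(2)):
            return uri
    return None  # not held in any fragment of this expression


print(fragment_for(1224, FRAGMENTS))
# /akn/us/act/title42/eng@2014-06-13/~sect_1220-sect_1230/main.xml
```

Because the range lives only in the manifestation URI, a different expression (such as the 2014-10-13 one) can simply carry its own list of files.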

2. Second, I would like to ask whether it is possible to add a query syntax that preserves the information needed for navigation:

In your proposal the request is:
/akn/us/act/title42~sect_1220-sect_1230#sect_1224
It assumes that an end user who wants to cite sect_1224 of Grant's USC (e.g. a judge writing a High Court judgment) knows exactly how Grant has subdivided the <fragment> files.

What about simply using a redundant "query syntax" for the complex cases (fragmented works) or for resolvers that work only server-side (e.g. the EU Parliament resolver)?

<ref href="/akn/us/act/title42~sect_1233/main.xml#sect_1233">

In this way each authority could decide whether to use a simple HTTP URI
<ref href="/akn/us/act/title42/main.xml#sect_1233">
or the "query" approach, with redundant information
<ref href="/akn/us/act/title42~sect_1233/main.xml#sect_1233">

Yours,
Monica

On 14/11/2014 11:06, Fabio Vitali wrote:
Dear Grant, all:

I think I understand the problem better now, and I see your need. Also, my previous mail, just like this one, is not a "final statement" of my position, but one step in a progressively improving approach to a reasonable solution acceptable to all.

This is what I do NOT want at all costs: an endless proliferation of Works and Expressions just because Joe Schmo needs to refer to an individual proposition within a larger document.

What I am willing to accept: view-oriented request URIs that allow the server to better focus the response on what the requesting agent needs.

In fact, we already have at least ONE type of view URI: the view date element. When I make a request such as /akn/us/act/title14/en:2014-09-01, I am asking for the version that was in force on 2014-09-01, which is often not a meaningful date for the document at all; the answer would be, say, /akn/us/act/title14/en@2014-05-21, where 2014-05-21 is the date on which the latest modification before 2014-09-01 actually entered into force.
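The view-date resolution described above can be sketched in a few lines of Python. Only the date 2014-05-21 comes from the example; the other entry-into-force dates and the function `resolve_view_date` are hypothetical. The server answers with the latest expression that entered into force on or before the requested date.

```python
import bisect
from datetime import date

# Hypothetical entry-into-force dates of successive expressions of
# /akn/us/act/title14, kept sorted. Only 2014-05-21 comes from the text.
EXPRESSION_DATES = [date(2010, 1, 1), date(2014, 5, 21), date(2014, 10, 13)]


def resolve_view_date(work: str, lang: str, view: date) -> str:
    """Map a view-date request (work/lang:date) to the expression URI in force then."""
    # Position of the last expression that entered into force on or before `view`.
    i = bisect.bisect_right(EXPRESSION_DATES, view) - 1
    if i < 0:
        raise LookupError(f"{work} was not yet in force on {view.isoformat()}")
    return f"{work}/{lang}@{EXPRESSION_DATES[i].isoformat()}"


print(resolve_view_date("/akn/us/act/title14", "en", date(2014, 9, 1)))
# /akn/us/act/title14/en@2014-05-21
```

Note that the request URI (with ':') is never returned; the response always names an actual expression (with '@').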

Thus a view-oriented URI is a REQUEST-ONLY URI whose response will be a URI with matching but not IDENTICAL characteristics. View-oriented URIs are NEVER returned by servers and do NOT correspond exactly to Works or Expressions or whatever.

Given this, we could devise a view-oriented URI for partial responses, in which the server decides to return NOT the whole document, but only a fragment.

The syntax is open to discussion (we could keep the '~' character, which URI syntax allows natively), but the basic mechanism could be the following:

REQUEST: /akn/us/act/title42~sect_1220-sect_1230#sect_1224
Interpretation: Dear server, give me the Work numbered "title 42" of the acts of the US; at your discretion, you may restrict the response to the interval of fragments between sect 1220 and sect 1230.
Important notes: a) the Work being requested is STILL the whole title 42, which is the only Work involved. b) If it has the capability to do so, the server may decide to return NOT the whole title 42 but a fragment thereof, either exactly as requested or another fragment compatible with the request.
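A minimal Python sketch of how a server might take such a request apart. The request shape /akn/{work}[~{first}[-{last}]][#{anchor}] is inferred from the examples here, and the class and function names are hypothetical; eIds are assumed not to contain '-', '~' or '#'. Note that the parsed Work is always the whole Work, as point a) requires.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed request shape, following the examples in the text:
#   /akn/{work}[~{first}[-{last}]][#{anchor}]
VIEW_RE = re.compile(
    r"^(?P<work>/akn/[^~#]+)"                        # the Work: always the whole Work
    r"(?:~(?P<first>[^-#]+)(?:-(?P<last>[^#]+))?)?"  # optional partial-view hint
    r"(?:#(?P<anchor>.+))?$"                         # optional anchor, kept by the client
)


@dataclass
class ViewRequest:
    work: str               # the only Work involved
    first: Optional[str]    # first eId of the requested interval (None: no hint)
    last: Optional[str]     # last eId; equals `first` for a single provision
    anchor: Optional[str]   # where the reader wants to land


def parse_view_request(uri: str) -> ViewRequest:
    m = VIEW_RE.match(uri)
    if m is None:
        raise ValueError(f"not a view-oriented AKN request: {uri}")
    first = m.group("first")
    last = m.group("last") or first
    return ViewRequest(m.group("work"), first, last, m.group("anchor"))


req = parse_view_request("/akn/us/act/title42~sect_1220-sect_1230#sect_1224")
print(req.work, req.first, req.last, req.anchor)
# /akn/us/act/title42 sect_1220 sect_1230 sect_1224
```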

RESPONSE: A response is ALWAYS a manifestation. This manifestation is the best representation that the server could build of the request URI. At the server's discretion, it can be a fragment with the exact interval requested, a fragment with a larger interval, or the full Work if the server can't do fragments. For instance, the following could be a reasonable answer to the request /akn/us/act/title42~sect_1220-sect_1230 :

<?xml version="1.0" encoding="UTF-8"?>
<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD11">
   <fragment includedIn="/akn/us/act/title42">
     <meta>
       <identification source="#fv">
         <FRBRWork>
           <FRBRthis value="/akn/us/act/title42/main"/>
           <FRBRuri value="/akn/us/act/title42"/>
           <FRBRdate date="1965-01-01" name="creation"/>
           <FRBRauthor href="#congress"/>
           <FRBRcountry value="us"/>
         </FRBRWork>
         <FRBRExpression>
           <FRBRthis value="/akn/us/act/title42/en@2014-06-13/main"/>
           <FRBRuri value="/akn/us/act/title42/en@2014-06-13"/>
           <FRBRdate date="2014-06-13" name="amendment"/>
           <FRBRauthor href="#congress"/>
           <FRBRlanguage language="en"/>
         </FRBRExpression>
         <FRBRManifestation>
           <FRBRthis value="/akn/us/act/title42/en@2014-06-13/main.xml"/>
           <FRBRuri value="/akn/us/act/title42/en@2014-06-13.akn"/>
           <FRBRdate date="2014-11-13" name="extraction"/>
           <FRBRauthor href="#GV"/>
         </FRBRManifestation>
       </identification>
     </meta>
     <fragmentBody>
       <section eId="sect_1220">
         <num>Section 1220</num>
         <heading>An example</heading>
         <content>
           <p>Some content whatsoever</p>
         </content>
       </section>
       ...
       <section eId="sect_1230">
         <num>Section 1230</num>
         <heading>Another example</heading>
         <content>
           <p>Some more content</p>
         </content>
       </section>
     </fragmentBody>
   </fragment>
</akomaNtoso>

As you can see, the fragment claims to be a manifestation of an expression of the Work /akn/us/act/title42, and not of some strange derived Work. This is Title 42, only not all of it, as can be deduced from the <fragment> element.

Therefore Joe Schmo can request either a fragment containing only the section he's interested in (as in /akn/us/act/title42~sect_1224 ) OR a larger fragment of the Work that is centered on the section he's interested in (as in /akn/us/act/title42~sect_1220-sect_1230#sect_1224 ) and that can be scrolled back and forth by his readers.
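One possible server policy for answering such requests, sketched in Python: return the smallest stored fragment that covers the requested interval, otherwise fall back to the full Work. The stored fragments and the purely numeric section model are hypothetical; a server able to build fragments on the fly could instead return exactly the requested interval.

```python
from typing import List, Optional, Tuple

# Hypothetical fragments one server happens to store for an expression,
# as inclusive (first, last) section numbers. A server without fragments has [].
STORED = [(1220, 1230), (1231, 1254), (1255, 1266)]


def choose_response(first: int, last: int,
                    stored: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Pick the smallest stored fragment covering [first, last].

    None means "send the full Work", which is always an acceptable answer.
    """
    covering = [(a, b) for a, b in stored if a <= first and last <= b]
    if not covering:
        return None
    return min(covering, key=lambda ab: ab[1] - ab[0])


print(choose_response(1220, 1230, STORED))  # (1220, 1230): the exact interval
print(choose_response(1224, 1224, STORED))  # (1220, 1230): a larger fragment
print(choose_response(1225, 1240, STORED))  # None: fall back to the full Work
```

Whichever branch is taken, the returned manifestation still identifies itself as part of /akn/us/act/title42, as in the XML example.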

Would this work for you?

Ciao

Fabio

--

On 13/nov/2014, at 10:20, Grant Vergottini <grant.vergottini@xcential.com> wrote:

Dear Fabio,

If I understand your proposal correctly, you're asking to stick to the
notation we currently have.

With this notation, which is based on a URL, the basic URL identifies
the work, and optionally an expression of the work and the manifestation
or format of the returned object.

Beyond this basic URL, the # is used to specify the provision of
interest within the work, with the identifier (presumably the @eId)
following the #.

As this notation, in normal HTTP land, causes the URL to be split,
requesting the entire WORK from the server and then having the client
(browser) either scroll to the provision or otherwise handle it, it
becomes necessary for a client-side JavaScript library to handle the
actual resolution. If I understand this correctly, it means that the
resolver is implemented as a client side JavaScript library. This
client side resolver converts the reference URL into a URL to be sent
to the server and can take care of any performance considerations such
as a paging mechanism or some other form of paring down the request to
ensure you don't return a huge document when all you're really
interested in is a small region of the document.
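The split described above can be observed with Python's standard `urllib.parse.urlsplit`: an HTTP client puts only the path (plus any query) on the request line, while everything after '#' stays on the client. The host `example.org` and the helper `request_target` are illustrative only.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return what an HTTP client puts on the request line: path and query only."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment (after '#') never leaves the client


hash_ref = "http://example.org/akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3"
tilde_ref = "http://example.org/akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3"

print(request_target(hash_ref))   # chp_3 is dropped
print(request_target(tilde_ref))  # chp_3 travels to the server
```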

This model differs tremendously from my world view. In my world view,
the URL is specified as a URL precisely so that it can travel along
the established HTTP mechanisms without any preprocessing. That's the
primary motivation for using a URL rather than a URN. With URLs, it is
entirely possible to refer to a provision using the same URL in a web
page, an email document, a Word document, a PowerPoint presentation,
or in any XML editor. By having no client requirement other than that
it understand the long established HTTP linking mechanisms, we're able
to create a simple referencing model that can be universal.

To this end, I implemented my resolver as a URL handler within the web
server. It's written as a DLL that is dynamically bound to the web
server and is configured to receive all traffic directed to URLs
starting with the scheme name (in our case, that's "/akn"). So,
rather than having the web server handle the URL by returning a page
in some preconfigured hierarchy, my handler DLL is asked to retrieve
the requested provision, however the underlying data structures are
organized. An entry point into my DLL is called and I am given the URL
to resolve. The DLL reads the URL, does some work to get a response,
and then sends it back to the client. My current storage model is that
each title of the US Code is a separate entire document that is stored
in eXist. After parsing the URL, I determine the appropriate XQuery
necessary to retrieve that specific provision which I then package up
with any necessary envelope and add any necessary metadata before
finally returning the result. This approach allows me to return a
single clause, when that is what is of interest, or perhaps an entire
statutory note that contains an embedded enactment, when that is of
interest. If I want the entire document, I can get that. If I just
want a chapter, I can get that too. I can get anything, no matter the
granularity, as long as it has an identifier to grab onto.
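The resolver described here uses XQuery against eXist. As a rough stand-in, the following Python sketch uses the standard `xml.etree.ElementTree` on an in-memory document to return whatever element carries the requested eId, whether a whole chapter or a single section. The document content, eIds and the function `get_provision` are made up for illustration, and no envelope or metadata is added.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative stand-in for one stored title. In the setup described above the
# title lives in eXist and is queried with XQuery; ElementTree replaces that here.
TITLE = ET.fromstring(
    """<act>
  <chapter eId="chp_1">
    <section eId="chp_1__sect_1"><p>First section</p></section>
    <section eId="chp_1__sect_2"><p>Second section</p></section>
  </chapter>
</act>"""
)


def get_provision(doc: ET.Element, eid: str) -> Optional[str]:
    """Serialize the element carrying `eid`, at whatever granularity it sits."""
    for el in doc.iter():  # walks the whole tree, root included
        if el.get("eId") == eid:
            return ET.tostring(el, encoding="unicode")
    return None  # no such provision


print(get_provision(TITLE, "chp_1__sect_2"))
```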

Take a look at Title 42 of the U.S. Code. It is non-positive law, so
it isn't enacted as a title (unlike positive law titles, which are
single enactments). Title 42 is no more a single WORK than a statute
book full of a year's worth of acts stored as chapters. Actually, just
like a statute book, Title 42 is organized as 152 chapters -- each one
being a separate enactment. These enactments largely have nothing to
do with one another, as Title 42 casts a broad net over a number of
mostly unrelated topics. Remember, as non-positive law, the
classification within Title 42 is not the official law -- it's the
public laws recorded in the Statutes that are. But, the chapters shown
within Title 42 will reflect executed amendments while the public laws
will not. Buried within each chapter of Title 42, you will find other
enactments that have been classified within the hierarchy of the top
level enactments. It all becomes a convoluted mess. But, with the
scheme we use, it doesn't matter. We can map any citation into an
equivalent reference and we can dereference any reference to retrieve
the provision. This is all done by the resolver, on the server. It
receives the incoming URL, which identifies a specific provision, the
version we want, and the format we want returned, and then the
resolver does whatever is necessary to provide the expected return. In
some cases, we must look up information within some tables to handle
ambiguities and the quirks that history has given us. In other cases
we defer to a different server or we retrieve the result from another
server and return it as our own. But we can always answer the question
without ever exposing to the client any details, sometimes quite ugly,
of how we retrieved this information. It could come from flat files
stored as CLOBs in Oracle or from the XML storage in our eXist
database -- nobody has to know how. When we change how we do it,
nobody's references will ever break. The details are entirely
invisible to the client. In short, this works without any burden on
the client other than being able to pass on the necessary information
to the server using HTTP.

The problem with the # notation is that it prevents some of the
information from being passed on to the server -- requiring client
side processing to encode this information in some other unspecified
form before sending it to the server. I just don't understand the
motivation for this -- for me, the need for client-side processing is
a non-starter. It eliminates much of the value of the URL notation and
ensures that all clients be customized to work with the system. That
requirement is too closed and too limiting. Our suggestion was to use
the "~" instead to allow all the information to be passed on. My one
discomfort was that keeping the position at the end of the string was
wrong. I would rather the notation be something like
"/akn/{workIdentifier}[~{portion}][/[{lang}]@{version}][.{format}]".
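One possible reading of the proposed template, sketched as a Python regular expression. The assumptions are ours, not part of the proposal: the work identifier contains no '~', '@' or '.', the portion contains no '/' or '.', and languages are 2 to 3 lowercase letters. The function name `parse_reference` is hypothetical.

```python
import re
from typing import Dict, Optional

# One reading of the proposed template
#   /akn/{workIdentifier}[~{portion}][/[{lang}]@{version}][.{format}]
TEMPLATE_RE = re.compile(
    r"^/akn/(?P<work>[^~@.]+?)"                          # work identifier (may contain '/')
    r"(?:~(?P<portion>[^/.]+))?"                         # optional portion
    r"(?:/(?P<lang>[a-z]{2,3})?@(?P<version>[^./]*))?"  # optional [lang]@version
    r"(?:\.(?P<format>\w+))?$"                           # optional format
)


def parse_reference(url: str) -> Dict[str, Optional[str]]:
    m = TEMPLATE_RE.match(url)
    if m is None:
        raise ValueError(f"does not match the template: {url}")
    return m.groupdict()


print(parse_reference("/akn/us/act/usc/title9~chp_3/eng@2014-10-10.xml"))
# {'work': 'us/act/usc/title9', 'portion': 'chp_3', 'lang': 'eng', 'version': '2014-10-10', 'format': 'xml'}
```

Putting the portion directly after the work identifier, as proposed, keeps the whole reference in the path, so it reaches the server unchanged.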

On Wed, Nov 12, 2014 at 8:29 AM, Fabio Vitali <fvitali@gmail.com> wrote:
Dear all,

a few brief reflections on the idea of introducing yet another section in our URIs, introduced by the character '~' (or whatever), instead of the hash ('#'), to specify the individual fragment of the reference. I am not commenting on the specific character, just on the idea of replacing the hash with something else.

The example provided by Monica refers to a MANIFESTATION-level reference.

Example:
/akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3

to arrive at this:
/akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3

I have minor quibbles about the order of things, but in general the idea of replacing the hash in MANIFESTATION-LEVEL references seems to me kind of contrived, but not wrong per se.

At the EXPRESSION and WORK levels, on the other hand, the situation is completely different. I am strongly convinced that replacing hashes there is wrong, and that it provides no advantage in either the short or the long term. Consider a WORK-LEVEL identifier such as /akn/us/act/usc/title9~chp_3

The problem is that /akn/us/act/usc/title9~chp_3 and /akn/us/act/usc/title9~chp_2 are different URIs and therefore represent DIFFERENT WORKS. What is the relationship between X~chp_2 and X~chp_3? None whatsoever. They are not related, they are not connected, they know nothing of each other. X~chp_2 is not even a legislative document, but a fragment composed of exactly chapter 2. Does it have a preceding chapter? No. Does it have a following chapter? No. It's a monad.

What about X~chp_2__cla_1? Does it exist, too? What is its relationship with X~chp_2? With X? None whatsoever. We end up with an uncontrolled proliferation of WORKS each of which is completely disconnected from the others.

Let me give you another example. Suppose we have a modification act that says something like the following:

---

Sect 1 - Modification to act 123 of 2014.
The Act 123 of 2014 is modified as follows:
1) Sect. 1 is replaced by the following:
   "Sect 1: .... "
2) At the end of sect. 2 the following words "xxx" are appended.
3) Clause 2 of sect. 3 is suppressed.

---

If the hash is replaced with some other character, this section corresponds to the modification of THREE different works: /akn/xxx/act/2014/123~sect_1, /akn/xxx/act/2014/123~sect_2 and /akn/xxx/act/2014/123~sect_3__cla_2 (or maybe /akn/xxx/act/2014/123~sect_3#cla_2). Not only do mod elements have to repeat the whole URI every time, but the activeModifications section and the references section each list three different works.

This contradicts common sense and the human-readable text, which clearly mentions three modifications to ONE text, not ONE modification each to THREE texts.
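The difference can be made concrete with Python's standard `urllib.parse.urldefrag`, which sets aside everything after '#'. With hashes, the three targets of the example act collapse to one resource; with '~', they remain three unrelated URIs. The helper `distinct_resources` is illustrative only.

```python
from urllib.parse import urldefrag

# The three targets of the example modification act, written both ways.
HASH_REFS = [
    "/akn/xxx/act/2014/123#sect_1",
    "/akn/xxx/act/2014/123#sect_2",
    "/akn/xxx/act/2014/123#sect_3__cla_2",
]
TILDE_REFS = [r.replace("#", "~") for r in HASH_REFS]


def distinct_resources(refs):
    """Resources identified once the fragment (the part after '#') is set aside."""
    return {urldefrag(r).url for r in refs}


print(len(distinct_resources(HASH_REFS)))   # 1: one Work, three fragments
print(len(distinct_resources(TILDE_REFS)))  # 3: three unrelated URIs
```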

This would clearly create a separation between what the words of the law say and what our operations do. I do not like it. I would very much like the integrity and conceptual closeness between natural-language text and actual operations to be maintained.

A way out
---------

Fortunately, we are not alone in this situation, and ways out are starting to appear to maintain URI handling and conceptual operations close together.

The technique is called "routing/dispatching". It does not have a Wikipedia page yet, but it has a history both in server-side management of URIs and, more recently, in client-side management, including hash-based URIs.

Server-side:
Django: https://docs.djangoproject.com/en/dev/topics/http/urls/
CakePHP: http://book.cakephp.org/2.0/en/development/routing.html
PHP codeIgniter: http://www.codeigniter.com/user_guide/general/routing.html
Rails: http://guides.rubyonrails.org/routing.html

Client-side:
AngularJs: https://docs.angularjs.org/api/ngRoute
MeteorJs: https://github.com/EventedMind/iron-router
jQuery: http://xoxco.com/projects/code/router/ ,
        https://github.com/camme/jquery-router-plugin ,
        https://github.com/iSimonWeb/jQuery-Router,
        etc.
Backbone.js: http://backbonetutorials.com/what-is-a-router/
No-dependencies: http://millermedeiros.github.io/crossroads.js/

What is routing? Simply put, it means intercepting the request for a URI (as specified in the DOM of a document displayed in the browser) and executing arbitrary procedures (including accessing a completely different URI) according to application-specific logic. Thus, the DOM may contain a reference to /a/b/c#d, the user clicks on the link and the routing application actually loads /z/w/x.
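The routing idea can be sketched as a small route table. The client-side routers listed above are JavaScript libraries; this Python version only illustrates the logic, and the `/fragments` endpoint it builds is hypothetical. The URI stored in the document stays meaningful; the table decides what is actually fetched.

```python
import re

# A toy route table: each entry pairs a pattern for the "nice" URI found in the
# document with a function that builds the "practical" URI actually fetched.
# The /fragments endpoint is hypothetical.
ROUTES = [
    (re.compile(r"^/a/b/c#d$"), lambda m: "/z/w/x"),  # the example in the text
    (
        re.compile(r"^(?P<work>/akn/[^#]+)#(?P<eid>.+)$"),
        lambda m: f"/fragments?work={m['work']}&eId={m['eid']}",
    ),
]


def route(uri: str) -> str:
    """Return the URI to fetch for a link the user clicked."""
    for pattern, build in ROUTES:
        m = pattern.match(uri)
        if m:
            return build(m)
    return uri  # no route matched: fetch the link as written


print(route("/a/b/c#d"))  # /z/w/x
```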

Why is this useful? Because it maintains the illusion of reasonable and meaningful URIs in the DOM while the application's inner workings require much uglier, 'practical' URIs. This is very useful for REST APIs, which require beautiful and meaningful URIs to be acceptable. Let me give you an example which is somewhat close to our needs.

Suppose you have a REST API that exposes, say, customers through a URI such as

domain.com/customers/                (returns the list of customers)
domain.com/customers/rec1321         (returns the record for customer 'rec1321')

Suppose we have 4 million customers. Simply accessing domain.com/customers/ would therefore require the browser to wait for a list of 4 million items, just to show a handful. That is clearly absurd. Therefore the API provides additional parameters (say, start and count), so that you can make requests such as domain.com/customers/?start=0&count=100 to get only the first 100 items of the customers' list.

Now what happens if I want to scroll near item 1321? I will NOT use a URI such as domain.com/customers/rec1321, because that does not return a list, but a single record. I could use a URI such as domain.com/customers/?start=1300&count=100#rec1321, but that would mean considerably messing up the content of the DOM to handle every possible scrolling request, working out exactly which 100 items to load in each situation. Nonsense. What I want is to use a URI such as domain.com/customers/#rec1321, plain and simple: get the list of customers, and scroll it to rec1321.
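The rewrite a router would perform for this example can be sketched in Python. It assumes, hypothetically, that record 'recN' sits at position N in the list, and the function name `practical_uri` is illustrative; the page window is aligned to multiples of `count`, which reproduces the start=1300 request from the text.

```python
from urllib.parse import urlsplit


def practical_uri(nice_uri: str, count: int = 100) -> str:
    """Rewrite customers/#recN into a request for the page of `count` items holding N.

    Assumes (hypothetically) that record 'recN' sits at position N in the list.
    """
    parts = urlsplit(nice_uri)
    if not parts.fragment.startswith("rec"):
        return nice_uri  # not a record anchor: leave the URI alone
    position = int(parts.fragment[len("rec"):])
    start = (position // count) * count  # align the window to a page boundary
    return f"{parts.path}?start={start}&count={count}#{parts.fragment}"


print(practical_uri("domain.com/customers/#rec1321"))
# domain.com/customers/?start=1300&count=100#rec1321
```

The document keeps only domain.com/customers/#rec1321; the paged URI exists only inside the router.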

All the routing platforms I mentioned are able to intercept requests for URIs such as these and replace them with practical URIs. The server, the browser and the HTTP connection are all part of the conspiracy: the URI domain.com/customers/#rec1321 is never actually sent over the wire, yet both the DOM (i.e., the document as stored) and the user (i.e., the document as displayed) contain and use only the nice, meaningful URI, and never the ugly one.

This is like a good magic show: when we see the magician floating over the stage without support, we all rationally know there is a wire somewhere he is hanging from, but we DO NOT WANT to see the wire, and seeing it would mean ruining the illusion and destroying the credibility of the magician. Wires need to be kept hidden and unseen.

Instead, by writing hash-free URIs into the documents (into the XML files, forever!!!) we subject our documents, and our users, to an over-abundance of Work URIs for disconnected documents; we force our semantic tools to work with multiple URIs for the same document (we have to explicitly assert that by modifying /akn/xxx/act/2014/123~sect_2__cla_1 we also modified /akn/xxx/act/2014/123~sect_2 and /akn/xxx/act/2014/123, which is not obvious); and we destroy the user's illusion of a single, long document that can be navigated by clicking on links and scrollbars. We are not only displaying our wires, we are painting them yellow.

No, I can't say I like such idea. Especially considering that there are alternative solutions perfectly working.

I would be more than happy to illustrate routing in greater detail during the teleconf and in subsequent chats and mail, if the need arises. I might be slow and intermittent, but I will try my best to avoid going down this route.

Ciao

Fabio

--


On 11/nov/2014, at 15:36, monica.palmirani <monica.palmirani@unibo.it> wrote:

Dear Fabio,

it is fantastic! The next TC meeting will be Nov. 12th 18.30-19.30 CET.

The main topic will be whether to use '~' instead of '#' in AKOMA NTOSO URIs.

Example:
/akn/us/act/usc/title9/eng@2014-10-10/main.akn#chp_3

to arrive at this:
/akn/us/act/usc/title9/eng@2014-10-10/main.akn~chp_3

This could ease client-server communication over HTTP, so that it also carries the fragment information and supports the <fragment> docType of Akoma Ntoso.

Yours,
Monica


On 11/11/2014 14:26, Fabio Vitali wrote:
Dear all,

sorry for my silence over the past few weeks. I had some unforeseen health issues that I had to take care of and that have taken my whole attention for months now. I am now slowly healing, and while I don't think I am ready to take on the full weight of my duties, some of them are important enough for me to try to get back to them as soon as possible.

I would also like to thank Monica for taking on the load of my absence and for keeping me updated and informed (for the little that my mind could absorb in those weeks) about the topics being discussed.

I know, for instance, that a discussion has started about whether to use '~' instead of '#' to separate document and fragment in Akoma Ntoso URIs.

I haven't been able to find anything about the rationale behind it in the mailing list, but I'll try to send you a brief note with my thoughts on the topic in the next few hours.

Ciao

Fabio



--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?'
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/











---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php




--
===================================
Associate professor of Legal Informatics
School of Law
Alma Mater Studiorum Università di Bologna
C.I.R.S.F.I.D. http://www.cirsfid.unibo.it/
Palazzo Dal Monte Gaudenzi - Via Galliera, 3
I - 40121 BOLOGNA (ITALY)
Tel +39 051 277217
Fax +39 051 260782
E-mail  monica.palmirani@unibo.it
====================================















--
____________________________________________________________________
Grant Vergottini
Xcential Group, LLC.
email: grant.vergottini@xcential.com
phone: 858.361.6738















