
legalcitem-technical message



Subject: Re: [legalcitem-technical] Who's easy on us using a US use case?


To pitch into the mix from my end, I've read through this thread, and the framework all seems clear. I do still think we would benefit from examples of what a "reference", "identifier" and "locator" might look like. I realize that these are precise abstractions, and that there are multiple forms in which each of them might be cast. So with the qualification that any examples and the core concepts that they illustrate are not to be confused, it might be useful to have a snapshot of each "on paper", as it were.

We all get the structures and workflows, I think: it's more that there is a linguistic traffic jam in this area, and examples will help me know that what you mean that I thought that I said is the same as what I say that you think that I meant. Sort of.

If no obvious examples come to mind, I'll be happy to spin up my own amateur impression of the three.

Frank




On Sat, May 3, 2014 at 2:38 AM, Grant Vergottini <grant.vergottini@xcential.com> wrote:
Hi Fabio, and all,

I like the idea of using Obamacare as the example - especially as the lead attorney at the U.S. House for the drafting of the ACA, and the expert in that piece of legislation, also happens to be the sponsor of the LLL project - linking legislation is his pet project. He might be able to shed some light on the nuances when we need them.

The identifier scheme that you describe is essentially what we are using - your inspiration is almost exactly what I started with. Committee-think within our project did stray somewhat from that exact model, but the differences are minor.

We do not embed the GPO name in the identifier. It's of no consequence to us who produced the PDF - just that it exists. Whether it's resolved using a locator pointing to the GPO or some other location is the job of the resolver to figure out. The role of the GPO is transparent to our identifier scheme.

There are no translations into other languages. I think it would be very controversial if the government were to start translating legislation into Spanish.

Regarding metadata and data, my rule is quite simple. The text of the document, as it is understood to read by the authors and by readers of a presentation form of the document, is data. Everything else is metadata. Data is always text content, is always retained in the order in which it appears in the document (for amending/redlining), and is never generated text (meaning it exists in the text content rather than being added for presentation). If text is extracted from the data for whatever reason, that extraction is metadata.

Regards,
   Grant


On Fri, May 2, 2014 at 1:54 AM, Fabio Vitali <fabio@cs.unibo.it> wrote:
Dear all,

last call's discussion left me uneasy on many issues that I thought could be straightened out more easily.

To help with the future discussions, I propose we work on a specific use case. Since Grant mentioned Obamacare, which has a really good Wikipedia page to peruse, I will use it as an example.

I propose that, both in our discussions and in comparing approaches and facts, we use this document as our example.

Obamacare
---------
This is what I know about Obamacare. Most of the information comes from its page on Wikipedia (http://en.wikipedia.org/wiki/Patient_Protection_and_Affordable_Care_Act):

Facts
-----
Obamacare is a US act titled "An act entitled The Patient Protection and Affordable Care Act", short title: "The Patient Protection and Affordable Care Act", acronyms "PPACA" and "ACA", nicknames: "Affordable Care Act", "Health Insurance Reform", "Healthcare Reform", "Obamacare". It is in English. I looked for a version in Spanish or in another language, but could not find any, not even a non-authoritative one (although I found fact sheets and summaries in Spanish, Chinese, Vietnamese, etc.). It was signed into law by President Barack Obama on January 23rd, 2010 and became effective the following day.

It was numbered Public Law #148 of the 111th Congress, and was published as pages 119 through 1025 of volume 124 of the Statutes at Large. It has been codified in a scattered form inside title 26 (Internal Revenue Code) and title 42 of the US Code. It is the enactment of the House of Representatives' Bill # 3590, introduced in the House by Charles Rangel (member of the Democratic Party for the state of New York) on September 17, 2009.

It has been amended several times including by the Health Care and Education Reconciliation Act of 2010, Public Law #152 of the 111th Congress, pages 1029 through 1084 of the 124th volume of the Statutes at Large, signed into law on March 23, 2010 and effective the following day.

I have found several instances of it in various formats. Some of them are, for instance:
[1] http://www.gpo.gov/fdsys/pkg/PLAW-111publ148/pdf/PLAW-111publ148.pdf in PDF,
[2] http://housedocs.house.gov/energycommerce/ppacacon.pdf of the consolidated version after Public Law 111-152, as prepared by the Office of the Legislative Counsel,
[3] http://democrats.senate.gov/pdfs/reform/patient-protection-affordable-care-act-as-passed.pdf as PDF,
[4] http://beta.congress.gov/111/plaws/publ148/PLAW-111publ148.htm (plain text masquerading as HTML),
[5] http://beta.congress.gov/111/bills/hr3590/BILLS-111hr3590enr.pdf (as PDF), etc.
[6] http://www.autismspeaks.org/images/advocacy/PPACA.pdf,
[7] http://en.wikisource.org/wiki/Patient_Protection_and_Affordable_Care_Act and other pages (as HTML),
[8] http://www.complianceweek.com/s/documents/PPACAText.pdf (as PDF), etc.

Some of them seem to be copies of the same original file in different locations, e.g., 5, 6 and 8, but I did not check thoroughly.

Analysis of features
--------------------
Locations: There are several locations where I can find a copy of the document.
Formats: At least four different formats I could identify: two types of PDF, and two types of HTML.
Format authors: each of the two PDF formats and each of the two HTML formats has been created in a different way, by a different author, at a different time.
Versions: there are at least two versions of the document, the original version and the consolidation of the amendments introduced by the Health Care and Education Reconciliation Act of 2010.
Language: only English
Volume of the Statute: 124
Starting page of the volume of the Statute: 119
Ending page of the volume of the Statute: 1025
Congress: 111th
Public Law number of the corresponding congress: 148
Full title: An act entitled The Patient Protection and Affordable Care Act
Short title: The Patient Protection and Affordable Care Act
Acronyms: PPACA and ACA
Popular names or nicknames: "Affordable Care Act", "Health Insurance Reform", "Healthcare Reform", "Obamacare"
Date of effectivity: January 24th, 2010
Date of signature: January 23rd, 2010
Date of effectivity of amended version: 24 March 2010
Signee: Barack Obama, President of the United States
Type of document: act
Country: United States of America (USA)
Enactment of:
   type of document: bill
   house of first introduction: House of Representatives
   Introduction date: September 17th, 2009
   Internal number: 3590
   introduced by:
      name: Charles Rangel
      party: Democratic Party
      representing: New York


Each of these pairs I call a "feature". If we split them according to the FRBR levels, we find that:

Item: Locators
Manifestation: Format and Format author
Expression: Version Date, Language
Work: all the others.
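
As a rough illustration of this split, here is a minimal Python sketch that sorts a handful of the features listed above into FRBR levels. The feature names (`location`, `format_author`, and so on) are hypothetical labels chosen for the example, not part of any agreed vocabulary; the only rule taken from the text is that anything not explicitly assigned to Item, Manifestation or Expression falls to Work.

```python
# Hypothetical feature names; the grouping follows the FRBR split given above.
FRBR_LEVEL_BY_FEATURE = {
    "location": "Item",
    "format": "Manifestation",
    "format_author": "Manifestation",
    "version_date": "Expression",
    "language": "Expression",
}


def frbr_level(feature: str) -> str:
    """Return the FRBR level of a feature; unlisted features are Work level."""
    return FRBR_LEVEL_BY_FEATURE.get(feature, "Work")


def group_by_level(features: dict) -> dict:
    """Split a flat feature dictionary into one dictionary per FRBR level."""
    grouped = {"Work": {}, "Expression": {}, "Manifestation": {}, "Item": {}}
    for name, value in features.items():
        grouped[frbr_level(name)][name] = value
    return grouped


# A few of the features from the list above.
aca_features = {
    "country": "us",
    "congress": 111,
    "public_law_number": 148,
    "language": "en",
    "format": "pdf",
    "location": "http://www.gpo.gov/fdsys/pkg/PLAW-111publ148/pdf/PLAW-111publ148.pdf",
}
grouped = group_by_level(aca_features)
```

Here `grouped["Work"]` holds country, congress and public law number, while the URL ends up under `"Item"`.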

Identifiers
-----------
All Locators are obviously identifiers, but they identify a specific file on a specific machine, rather than a document. For instance, I am pretty confident that 5, 6 and 8 are identical, but they have different locations.

If we want Work-level, Expression-level and Manifestation-level identifiers, we need to build them with the features we have. There are several combinations of features that give uniqueness, but some of them are more "natural" than others: for instance, country + congress# + plaw#, or country + volume + starting page, or country + acronym, or country + short title.

There is no reason to accept identifiers built from some features and reject those built from others. I propose that it is possible to create multiple identifiers for documents, provided that they are unambiguous, using a wide variety of features.

For instance, Akoma Ntoso accepts any work-level identifier organized as follows:
/[country]/[doctype]/[doc-subtype]/[date]/[numberOrString]/

Therefore, using the syntax of the Akoma Ntoso Naming Convention, each of the following is a valid Work-level identifier:

a) /us/act/2010/111-148/                                   and      /us/act/2010-01-24/111-148/
b) /us/act/2010/124Stat119/                                and      /us/act/2010-01-24/124Stat119/
c) /us/act/2010/124Stat119-1025/                           and      /us/act/2010-01-24/124Stat119-1025/
d) /us/act/2010/ACA/                                       and      /us/act/2010-01-24/ACA/
e) /us/act/2010/PPACA/                                     and      /us/act/2010-01-24/PPACA/
f) /us/act/2010/ThePatientProtectionAndAffordableCareAct/  and      /us/act/2010-01-24/ThePatientProtectionAndAffordableCareAct/
g) /us/act/2010/ObamaCare/                                 and      /us/act/2010-01-24/ObamaCare/

etc.
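
To make the pattern concrete, here is a small Python sketch that assembles work-level identifiers of this shape from feature values. The function name `work_identifier` and its validation rule are assumptions made for the example; the Akoma Ntoso Naming Convention itself is the authority on what is actually allowed. The subtype is treated as optional, since the examples above omit it.

```python
from typing import Optional


def work_identifier(country: str, doctype: str, date: str,
                    number_or_string: str,
                    subtype: Optional[str] = None) -> str:
    """Build /[country]/[doctype]/[doc-subtype]/[date]/[numberOrString]/.

    Hypothetical helper: the subtype segment is left out when not given.
    """
    parts = [country, doctype]
    if subtype:
        parts.append(subtype)
    parts += [date, number_or_string]
    # Assumed sanity check: no empty segments, no embedded separators.
    if any(not p or "/" in p for p in parts):
        raise ValueError("components must be non-empty and must not contain '/'")
    return "/" + "/".join(parts) + "/"


# Several feature combinations, each giving a different work-level identifier.
aliases = ["111-148", "124Stat119", "124Stat119-1025", "ACA", "PPACA", "ObamaCare"]
by_year = [work_identifier("us", "act", "2010", a) for a in aliases]
by_date = [work_identifier("us", "act", "2010-01-24", a) for a in aliases]
```

All twelve strings produced this way name the same Work; nothing in the syntax privileges one over another.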

Akoma Ntoso adds language, version date (or a simple @ for the original version) and consolidation author to any work-level id. Each of the following is therefore a valid Expression-level identifier:

h) [WORK-LEVEL-IDENTIFIER]/en@                  -- original version
i) [WORK-LEVEL-IDENTIFIER]/en@2010-01-24        -- original version
j) [WORK-LEVEL-IDENTIFIER]/en@2010-03-24        -- amended version
k) [WORK-LEVEL-IDENTIFIER]/en@2010-03-24/OLC    -- amended version as consolidated by the Office of the Legislative Counsel
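
The same step can be sketched in Python. The helper below is hypothetical and only mirrors the four patterns just listed: language, then `@`, then an optional version date, then an optional consolidation author.

```python
from typing import Optional


def expression_identifier(work_id: str, language: str,
                          version_date: str = "",
                          author: Optional[str] = None) -> str:
    """Extend a work-level identifier to expression level.

    An empty version_date yields the bare "@" that stands for the
    original version. Hypothetical helper, not an official API.
    """
    expr = f"{work_id.rstrip('/')}/{language}@{version_date}"
    if author:
        expr += f"/{author}"
    return expr


work = "/us/act/2010/111-148/"
original = expression_identifier(work, "en")
amended = expression_identifier(work, "en", "2010-03-24")
consolidated = expression_identifier(work, "en", "2010-03-24", "OLC")
```

With the work-level identifier from example a), these give `/us/act/2010/111-148/en@`, `/us/act/2010/111-148/en@2010-03-24` and `/us/act/2010/111-148/en@2010-03-24/OLC`.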

Akoma Ntoso adds format and manifestation author to any Expression-level id. Each of the following is therefore a valid Manifestation-level identifier:

l) [EXPRESSION-LEVEL-IDENTIFIER].pdf            -- a PDF version
m) [EXPRESSION-LEVEL-IDENTIFIER].html           -- an HTML version
n) [EXPRESSION-LEVEL-IDENTIFIER]/GPO.pdf        -- the PDF version created by the Government Printing Office
o) [EXPRESSION-LEVEL-IDENTIFIER]/GPO.html       -- the HTML version created by the Government Printing Office
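
In code, this last step might look like the following Python sketch. The function name and signature are assumptions made for illustration; the output simply follows the patterns listed above, with the format as a file extension and the manifestation author, when present, as an extra path segment.

```python
from typing import Optional


def manifestation_identifier(expression_id: str, fmt: str,
                             author: Optional[str] = None) -> str:
    """Extend an expression-level identifier to manifestation level.

    Hypothetical helper: appends ".<fmt>", preceded by "/<author>"
    when a manifestation author is given.
    """
    if author:
        return f"{expression_id}/{author}.{fmt}"
    return f"{expression_id}.{fmt}"


expr = "/us/act/2010/111-148/en@"
any_pdf = manifestation_identifier(expr, "pdf")
gpo_pdf = manifestation_identifier(expr, "pdf", "GPO")
gpo_html = manifestation_identifier(expr, "html", "GPO")
```

For the original English version this yields `/us/act/2010/111-148/en@.pdf` for any PDF and `/us/act/2010/111-148/en@/GPO.pdf` for the one created by the GPO.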

In my mind, the overall resolution is composed of two steps: completion (a higher-level identifier is completed with feature values to obtain a full manifestation-level identifier) and resolution proper (a manifestation-level identifier is mapped onto a physical item-level URL).
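
A minimal Python sketch of the two steps, under loudly stated assumptions: the default feature values (English, original version, GPO, PDF) and the one-entry catalogue are invented for the example, and a real resolver would also need to recognise that aliases such as `/us/act/2010/ACA/` name the same Work.

```python
from typing import Optional

# Hypothetical defaults used to complete an under-specified identifier.
DEFAULTS = {"language": "en", "version": "", "author": "GPO", "format": "pdf"}
KNOWN_FORMATS = ("pdf", "html")

# Hypothetical catalogue: manifestation-level identifier -> item-level URL.
CATALOGUE = {
    "/us/act/2010/111-148/en@/GPO.pdf":
        "http://www.gpo.gov/fdsys/pkg/PLAW-111publ148/pdf/PLAW-111publ148.pdf",
}


def complete(identifier: str) -> str:
    """Step 1, completion: fill in missing features using DEFAULTS."""
    ident = identifier.rstrip("/")
    if ident.endswith(tuple("." + f for f in KNOWN_FORMATS)):
        return ident  # already at manifestation level
    if "@" not in ident:  # work level: add language and version
        ident += "/" + DEFAULTS["language"] + "@" + DEFAULTS["version"]
    return ident + "/" + DEFAULTS["author"] + "." + DEFAULTS["format"]


def resolve(manifestation_id: str) -> Optional[str]:
    """Step 2, resolution: map a manifestation identifier to a URL, if known."""
    return CATALOGUE.get(manifestation_id)


def locate(identifier: str) -> Optional[str]:
    """Run both steps on an identifier of any level."""
    return resolve(complete(identifier))


url = locate("/us/act/2010/111-148/")
```

Keeping the two steps separate means the choice of defaults (which language, which version, whose PDF) is a policy of the resolver and never leaks into the identifier itself.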

Finally, I still do not see the point of the distinction between data and metadata that came out in our discussion. Can someone, using these data, give me an example of which is data and which is metadata, and why?

Thanks

Fabio

--

Fabio Vitali                            Tiger got to hunt, bird got to fly,
Dept. of Computer Science        Man got to sit and wonder "Why, why, why?"
Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
phone:  +39 051 2094872              Man got to tell himself he understand.
e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
http://vitali.web.cs.unibo.it/










--
____________________________________________________________________
Grant Vergottini
Xcential Group, LLC.
email: grant.vergottini@xcential.com
phone: 858.361.6738


