OASIS Mailing List Archives

 



office message



Subject: [OASIS Issue Tracker] Issue Comment Edited: (OFFICE-2685) NEEDS-DISCUSSION: ODF 1.2 Part 3 "IRI" and "relative IRI" used throughout are never defined



    [ http://tools.oasis-open.org/issues/browse/OFFICE-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20625#action_20625 ] 

Dennis Hamilton edited comment on OFFICE-2685 at 8/15/10 3:29 PM:
------------------------------------------------------------------

In discussion on OFFICE-3342, I mentioned something that is very important here.

IRI encoding of URIs is a means of using the limited, single-byte character set of URIs to encode characters that cannot be chosen freely there (including every character outside the Basic Latin graphic character set). It uses URI encoding in a particular way, which we should probably refer to as IRI encoding. URI encoding applies to single bytes, so one has to know that the IRI convention applies: only then is a sequence of one or more encodings properly understood as the URI encodings of the individual bytes of the UTF-8 encoding of the intended Unicode code point (and not, for example, single bytes of some other character-set encoding).

The principle is that a URI itself never contains characters other than the limited ones permitted for discretionary use in a particular part of the URI.
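The two paragraphs above can be illustrated with a minimal Python sketch. It assumes a simplified mapping in the spirit of RFC 3987, section 3.1: every character outside the ASCII set that RFC 3986 allows literally is converted to UTF-8, and each resulting byte becomes one %HH triplet. The function names and the file name are hypothetical, and this is not a complete RFC 3987 implementation.

```python
import re
from urllib.parse import quote, unquote

# Characters RFC 3986 allows to appear literally (unreserved plus reserved);
# anything else has to appear as a %HH triplet.
_URI_TEXT = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def is_plain_uri(text: str) -> bool:
    """True if text uses only characters a URI may contain literally."""
    return _URI_TEXT.fullmatch(text) is not None


def iri_to_uri(iri: str) -> str:
    """Hypothetical, simplified IRI-to-URI mapping.

    Each disallowed character is encoded as UTF-8 and every byte becomes one
    %HH triplet. Reserved characters and "%" pass through, so existing escapes
    are not encoded twice. Unlike strict RFC 3987 section 3.1, disallowed ASCII
    characters (such as a space) are also %-encoded.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~", encoding="utf-8")


name = "Pictures/café.png"      # hypothetical package file name
uri = iri_to_uri(name)
print(uri)                                    # Pictures/caf%C3%A9.png
print(is_plain_uri(name), is_plain_uri(uri))  # False True

# The same %-encoded bytes read as UTF-8 or as Latin-1 yield different text,
# which is why the IRI (UTF-8) convention must be known in advance.
print(unquote(uri))                      # Pictures/café.png
print(unquote(uri, encoding="latin-1"))  # Pictures/cafÃ©.png
```

The last two lines show the point about single bytes: nothing in the URI itself says which character-set encoding the %-encoded bytes belong to.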

Since many different relative IRI references (in this sense) can be used to refer to the same package file, it is important that all of them are supported by the way package-file names are carried in manifest:full-path entries and in the corresponding "file name" entries of the Zip package itself (when there is a corresponding package file).
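Before any of those references can be compared with a manifest:full-path value, they have to be resolved to a single package-relative path. A minimal Python sketch, assuming path-only references (no scheme, authority, query, or fragment) and leaving aside the %-decoding question raised in item 1.1; the helper name and the directory and file names are hypothetical.

```python
def resolve_in_package(base_dir: str, ref: str) -> str:
    """Resolve a path-only relative reference against a package directory.

    Hypothetical helper. base_dir is the directory of the referring part,
    written with a trailing "/" ("" for the package root). Dot segments are
    removed in the manner of RFC 3986, section 5.2.4, simplified: a ".." that
    would climb above the package root is dropped, and no trailing "/" is kept
    after a final "." or ".." segment.
    """
    resolved: list[str] = []
    for segment in (base_dir + ref).split("/"):
        if segment == ".":
            continue
        if segment == "..":
            if resolved:
                resolved.pop()
            continue
        resolved.append(segment)
    return "/".join(resolved)


# Three different relative references, one target package file.
for ref in ("Pictures/logo.png", "./Pictures/logo.png", "Pictures/../Pictures/logo.png"):
    print(resolve_in_package("", ref))  # Pictures/logo.png each time

# A reference made from inside a (hypothetical) sub-document directory.
print(resolve_in_package("Object1/", "../Pictures/logo.png"))  # Pictures/logo.png
```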

1. For example, it is useful to recommend that no segment of a manifest:full-path entry that might become the first segment of a relative IRI referring to it contain a ":" character. This makes it easier to create path-noscheme forms for relative references that start at any segment corresponding to a segment of some manifest:full-path value. We need some certainty here because there are too many ways to work around the problem, and there is no way to know a priori which practice applies:

 1.1. One workaround is to %-encode the ":" in the relative IRI reference so that it qualifies as a path-noscheme. That leaves open whether (1.1.1) matching to a manifest:full-path requires decoding first, or (1.1.2) the ":" is %-encoded in manifest:full-path as well (and in the Zip directory name of any package file having such a segment).

 1.2. Another workaround is to always start a relative IRI reference with a "." segment, so that it is always a path-noscheme relative IRI, regardless of whether there is a ":" in the segment of the manifest:full-path value that corresponds to the IRI's first segment after "." and ".." segments are eliminated.

2. An additional consideration for carrying IRI encoding all the way into the manifest:full-path is that some of the non-discretionary Basic Latin characters may also be invalid in the segments of file-system names into which the ODF package might be extracted or from which it might be imported. Although there is no requirement that ODF packages be extractable into any particular (hierarchical or non-hierarchical) file system, it is probably wise to keep that possibility open as neutrally as possible by leaving URI encodings alone and keeping them in the manifest:full-path and in the name used for the corresponding package file in the Zip archive. (This is a strong reason for always encoding every ":" that occurs in the equivalent of URI path segments.) How Zip tools make friendly use of particular file systems when extracting such forms, and whether they import in a way friendly to relative-IRI referencing (something far outside the scope of the Zip specifications), is a problem that the OpenDocument Package does not have to solve. (Producers of ODF packages will, however, have to be cautious when using Zip utilities to import package files and create the corresponding manifest entries. Producers using such approaches must ensure that all package file names pass muster as targets of IRI-encoded relative URI references.)

3. Finally, it is generally understood that the names of files in Zip archives use single-byte character-set encodings, and recent descriptions of the Zip format from PKWare acknowledge this as the custom. In addition, the common default single-byte character repertoire has the ASCII form of the Basic Latin character set as a subset. So there is good reason to ensure maximum portability by carrying the IRI encoding all the way into the package file name, so that the manifest:full-path and the package file name are identical, IRI encoding included.

4. Reconciliation of the Package rules along the lines of (1-3) would certainly clean up a great deal.  How much the resolution constitutes a breaking change over what has already happened since the loose provisions of ODF 1.0 and IS 26300 is a different question.  It will be interesting to see how we come up with an explainable way of dealing with that.  Either way, I believe we must be precise about the end-to-end treatment of relative IRI references in ODF Packages.
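The colon problem in item 1 and the two workarounds in 1.1 and 1.2 can be demonstrated with a minimal Python sketch. It uses the standard library's urllib.parse.urlsplit as a stand-in for a generic URI parser; the full-path value and the helper names are hypothetical, and the two matching modes model conventions 1.1.1 and 1.1.2 rather than anything ODF currently specifies.

```python
from urllib.parse import unquote, urlsplit


def parses_as_relative(ref: str) -> bool:
    """True if a generic URI parser finds no scheme, so ref stays relative."""
    return urlsplit(ref).scheme == ""


def matches_entry(ref: str, full_path: str, decode_first: bool) -> bool:
    """Compare a dot-segment-free reference with a manifest:full-path value.

    decode_first=True models convention 1.1.1 (the manifest holds the raw ":");
    decode_first=False models 1.1.2 (the manifest holds "%3A").
    """
    return (unquote(ref) if decode_first else ref) == full_path


full_path = "notes:draft/content.xml"  # hypothetical entry with ":" in its first segment

bare = full_path                         # "notes" is read as a URI scheme
encoded = full_path.replace(":", "%3A")  # workaround 1.1: %-encode the ":"
dotted = "./" + full_path                # workaround 1.2: lead with a "." segment

for ref in (bare, encoded, dotted):
    print(ref, parses_as_relative(ref))
# notes:draft/content.xml False
# notes%3Adraft/content.xml True
# ./notes:draft/content.xml True

# Under 1.1, the same reference matches or not depending on the convention.
print(matches_entry(encoded, full_path, decode_first=True))   # True
print(matches_entry(encoded, full_path, decode_first=False))  # False
```

The last two lines are the ambiguity in a nutshell: unless producers and consumers agree on one convention, a reference that is valid under 1.1 may fail to find its package file.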

      was (Author: orcmid):
    In discussion on OFFICE-3342, I mentioned something that is very important here.

IRI encoding of URIs is a means of using the limited, single-byte character set of URIs to encode characters that are not freely choosable (including all characters outside of the Basic Latin graphical character set) by using URI encoding in a particular way (which we should probably refer to as IRI encoding).  URI encoding applies to single bytes and one has to know that the IRI convention applies so that a sequence of one or more encodings is properly understood to comprise URI-encodings of the individual bytes of a UTF-8 encoding for the intended Unicode code point.

The principle is that a URI, itself, never contains characters that are not among those limited ones permitted for discretionary use in a particular part of the URI.

Since there are many different relative IRI references (in this sense) that can be used to refer to the same package file, it is important to ensure that those are supported by how package-file names are carried in manifest:full-path entries and the corresponding "file name" entries used in the Zip package itself (when there is a corresponding package file).

1. For example, it is useful to recommend that no part of a manifest:full-path entry that corresponds to the first segment of a relative IRI that might be used to refer to it have a ":" character in its text.  This makes it easier to create path-noscheme forms for relative references that start at any segment that will correspond to a corresponding segment portion of some manifest:full-path form.  The reason we need some certainty around this is because there are too many ways to work around this, and there is no way to know a priori which practice applies:

 1.1. One workaround is to %-encode the ":" in the relative IRI reference so that it qualifies as a path-noscheme, but that leaves open the question of whether (1.1.1) matching to a manifest:full-path requires decoding first or (1.1.2) whether the ":" is %-encoded in manifest:full-path (and the name in the Zip directories for any package files having such segment forms as well).

 1.2. Another workaround is to always start a relative IRI reference with "./" so that it will always be a path-noscheme relative IRI regardless of whether or not there is a ":" in any corresponding segment of a manifest:full-path value which the IRI has as its first segment after "./" and "../" elimination.

2. An additional consideration for IRI encoding all the way into the manifest:full-path is that some of the non-discretionary Basic Latin characters may also fail to be valid in segments of file system names where the ODF Package might be extracted into or imported from.   Although there is no requirement that ODF packages be extractable into any particular (hierarchical or non-hierarchical) file systems, it is probably wise to secure the potential for such operations as neutrally as possible by leaving URI encodings alone and keeping them in the manifest:full-path and the name used for a corresponding package file in the Zip archive.  (This is a great reason for encoding all ":" occurrences in the equivalent of URI path segments all of the time.)  How Zip tools make friendly use of particular file systems in extracting such forms and whether they import in a way friendly to relative-IRI referencing (something far outside the scope of Zip specifications) is a problem that the OpenDocument Package does not have to solve.

3. Finally, it is generally understood that the names for files in Zip archives employ single-byte character-set encodings, and this is acknowledged as the custom in recent descriptions of the Zip format from PKWare.  In addition, the common default single-byte character repertoire has the ASCII form of the Basic Latin character set as a subset.  So there is good reason to ensure maximum portability by taking the IRI encoding all the way into the package file name and having the manifest:full-path and the package file name be identical, including having IRI encoding in their respective forms.

4. Reconciliation of the Package rules along the lines of (1-3) would certainly clean up a great deal.  How much the resolution constitutes a breaking change over what has already happened since the loose provisions of ODF 1.0 and IS 26300 is a different question.  It will be interesting to see how we come up with an explainable way of dealing with that.  Either way, I believe we must be precise about the end-to-end treatment of relative IRI references in ODF Packages.
  
> NEEDS-DISCUSSION: ODF 1.2 Part 3 "IRI" and "relative IRI" used throughout are never defined
> -------------------------------------------------------------------------------------------
>
>                 Key: OFFICE-2685
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2685
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: Packaging, Part 3 (Packages), Schema and Datatypes
>    Affects Versions: ODF 1.2 CD 05
>         Environment: This applies to all versions of OpenDocument-v1.2-part3-cd1 through -rev04.  It also impacts ODF 1.2 Part 1 and ODF 1.2 Part 2 wherever anyURI or equivalent datatype is used in a relative reference.
>            Reporter: Dennis Hamilton
>            Assignee: Dennis Hamilton
>             Fix For: ODF 1.2 CD 06
>
>
> The terms "IRI" and "relative IRI" are used throughout ODF 1.2 Part 3.
>  1. There are no definitions offered for these terms.
>  2. There are no references (normative or otherwise) to sources on this terminology (although [RFC3987] would appear to be a good choice).
>  3. The manner in which IRIs are to be mapped to URIs in those places where resolution involves URI segments within a package is not defined, nor is the relationship to IRI-encoding in URIs accounted for in the definition of (a) the manifest:full-path attribute or (b) the format for names of files in the ZIP data units that carry files within the ZIP archive.  In particular, there should be attention to [RFC3987] sections 6.2-6.4, not merely 6.5.
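Point 3 of the issue, together with item 3 of the comment, suggests checking that every manifest:full-path value is identical to a Zip entry name. A minimal Python sketch using only the standard library; the function name and the sample package are hypothetical, the mimetype entry and other required parts of a real ODF package are omitted, and the comparison deliberately does no %-decoding, since that is exactly the convention still to be settled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def unmatched_full_paths(package: zipfile.ZipFile) -> list[str]:
    """List manifest:full-path values that have no identically named Zip entry.

    Hypothetical check. Names are compared as stored: nothing is %-decoded or
    re-encoded on either side. Directory entries (ending in "/", including the
    root "/") are skipped because they need not have a Zip entry of their own.
    """
    manifest = ET.fromstring(package.read("META-INF/manifest.xml"))
    zip_names = set(package.namelist())
    unmatched = []
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        if path is None or path.endswith("/"):
            continue
        if path not in zip_names:
            unmatched.append(path)
    return unmatched


# Hypothetical package: one picture stored under its IRI-encoded name, but
# listed twice in the manifest, once IRI-encoded and once with the raw "é".
MANIFEST_XML = f"""<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="Pictures/caf%C3%A9.png" manifest:media-type="image/png"/>
  <manifest:file-entry manifest:full-path="Pictures/café.png" manifest:media-type="image/png"/>
</manifest:manifest>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("Pictures/caf%C3%A9.png", b"")
    z.writestr("META-INF/manifest.xml", MANIFEST_XML)

with zipfile.ZipFile(buffer) as package:
    result = unmatched_full_paths(package)
print(result)  # ['Pictures/café.png']
```

Because the IRI-encoded name is pure ASCII, it compares equal regardless of which single-byte character set a Zip tool assumes for entry names, which is the portability argument of item 3.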

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



