OASIS Mailing List Archives

 



office message



Subject: RE: [office] IRI vs URI Discussion Today (2010-09-13)


Yes, I intentionally used the %62 escape.  However, in URIs there is no
assurance that the 0x62 byte is intended to be the ASCII/ISO 646 encoding
for the letter "b".  That's why, among other reasons, the rule for URIs in
namespace declarations (and in some other cases) says that the namespaces
identified by URIs http://example.com/abc and http://example.com/a%62c are
different.
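
A minimal sketch in Python of the distinction above, assuming namespace names are compared as plain character strings with no %-decoding. The function name same_namespace is hypothetical and used only for illustration.

```python
from urllib.parse import unquote

def same_namespace(a: str, b: str) -> bool:
    """Compare two namespace names character by character, without %-decoding."""
    return a == b

plain = "http://example.com/abc"
escaped = "http://example.com/a%62c"

# As namespace names the two strings differ, because %62 is never decoded.
print(same_namespace(plain, escaped))   # False

# Only an explicit decoding step makes them look alike.
print(unquote(escaped) == plain)        # True
```

A consumer that decodes before comparing would merge two namespaces that the comparison rule keeps apart.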

That's why it is important to urge that producers SHOULD NOT %-encode the
UTF-8 encoding of any Basic Latin characters that are freely usable in URIs
without escaping, and that consumers SHOULD NOT decode any %-encoding
within IRIs in the markup of a consumed document.
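
One way a producer might check the first recommendation is sketched below in Python. It assumes that "freely usable" means the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~"); RFC 2396, which is cited elsewhere in this thread, allows a slightly larger set. The name needless_escapes is hypothetical.

```python
import re
import string
from typing import List

# Assumption: the RFC 3986 "unreserved" characters never need escaping.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def needless_escapes(ref: str) -> List[str]:
    """Return the %-escapes in ref that encode an unreserved character."""
    found = []
    for match in _ESCAPE.finditer(ref):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            found.append(match.group(0))
    return found

print(needless_escapes("a%62c/%20x/%7E"))   # ['%62', '%7E']
```

The check only reports escapes; it does not rewrite the reference, in keeping with the advice that consumers leave %-encodings undecoded.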

 - Dennis

FURTHER THOUGHTS

One cannot prevent the use of Basic Latin Characters that are not
freely-usable in URIs because they may be required to express a URI of some
origin.  I would suggest that if such Basic Latin characters must be used,
they always be represented by %-encoding of their one-byte UTF-8 codes in all
IRIs that employ them.  I'm not sure if it is appropriate to say that much
in the ODF specification.  (On the other hand, we do have a say in
determining which %-encodings are ever needed in making IRI references to
same-document and same-package resources in ODF documents.)

Here's an odd case.  If for some reason the URI mapping of a non-URI IRI is
provided as the value of a markup item whose datatype is anyURI, no
%-encoding in it should be decoded before submission to a URI/IRI resolver.  In
deciding if two IRIs are the same or not, it is probably appropriate to map
them both to URIs and see if those are the same.  (The mapping should do
something rational for those parts of URIs that are not case-sensitive, such
as the letters for hexadecimal digits in a %-encoding.)  
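
The comparison suggested above could be sketched in Python roughly as follows. This is a simplified illustration, not the full RFC 3987 mapping: it %-encodes the UTF-8 bytes of non-ASCII characters, leaves existing escapes undecoded, and treats the hexadecimal digits of an escape as case-insensitive. It does not handle the case-insensitive scheme and host parts or internationalized domain names. The function names are hypothetical.

```python
import re

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by %-encoding the UTF-8 bytes of non-ASCII characters.

    ASCII characters, including any existing %-escapes, are copied unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(out)

def comparison_key(iri: str) -> str:
    """Map to a URI, then uppercase the hex digits of every %-escape."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), iri_to_uri(iri))

def same_iri(a: str, b: str) -> bool:
    """Two IRIs are treated as the same if their mapped URIs agree."""
    return comparison_key(a) == comparison_key(b)

print(same_iri("r\u00e9sum\u00e9.odt", "r%c3%a9sum%C3%A9.odt"))   # True
print(same_iri("abc", "a%62c"))                                   # False
```

Because nothing is decoded, "abc" and "a%62c" stay distinct, while an IRI and its own URI mapping compare as equal.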

I am tempted to say in regard to the consumption of ODF documents that
mapping to URIs MAY always be done before submission to a resolver, whether
or not IRIs are directly acceptable to the resolver.  Something tells me
this is a natural consequence of the way mapping of IRIs to URIs is defined,
but I am not 100% certain of that at this point.  I can't imagine an
interoperable case without this assurance, however.

 - Dennis

-----Original Message-----
From: Andreas J. Guelzow [mailto:andreas.guelzow@concordia.ab.ca] 
Sent: Monday, September 13, 2010 08:49
To: dennis.hamilton@acm.org
Cc: ODF TC List
Subject: Re: [office] IRI vs URI Discussion Today (2010-09-13)

On Mon, 2010-09-13 at 09:00 -0600, Dennis E. Hamilton wrote:
> It is conceivable (and permissible) that IRI references
> "abc" and "a%62c" resolve to different resources (since they are different
> URIs).   

According to my reading of RFC 2396, this may not be correct. "%62" is
the escaped encoding (in the sense of RFC 2396 2.4.1) of the character
b.  Note specifically in 2.4.2: 
Because the percent "%" character always has the reserved purpose of
being the escape indicator, it must be escaped as "%25" in order to
be used as data within a URI.

[That doesn't really mean that everybody does this right. A little test
showed me that Firefox does not consider them the same in the  <first>
part of the URI, but the Apache server accessed seems to be happy with it
in the <second> portion of the URI.]



Andreas
-- 
Andreas J. Guelzow
Concordia University College of Alberta




