security-services message

Subject: [security-services] RE: [saml-dev] Encoding of URI in "Alternative SAML Artifact Format"
From: "Mishra, Prateek" <pmishra@netegrity.com>
To: 'Yuji Sakata' <sakatayu@nttdata.co.jp>, "Kremp, Juergen" <juergen.kremp@sap.com>, saml-dev@lists.oasis-open.org
Date: Thu, 05 Dec 2002 16:26:53 -0500
Hi Yuji, Juergen:

You have pointed out a missing piece in the non-normative
"Alternative Format" in the binding doc. 

Would it be adequate to specify UTF-8 as the 
default charset to be used here?
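
A minimal sketch of what a UTF-8 default would mean in practice, not
normative text. It assumes TypeCode is written as the two bytes 0x00 0x02;
the names build_artifact and parse_artifact are hypothetical and used only
for illustration.

```python
import base64

TYPE_CODE = b"\x00\x02"  # TypeCode := 0x0002 (assumed to occupy two bytes)


def build_artifact(assertion_handle: bytes, source_location: str,
                   charset: str = "utf-8") -> str:
    """Base64(TypeCode + AssertionHandle + SourceLocation as bytes)."""
    if len(assertion_handle) != 20:
        raise ValueError("AssertionHandle must be a 20-byte sequence")
    # This is the step the binding doc leaves open: URI characters -> bytes.
    location_bytes = source_location.encode(charset)
    raw = TYPE_CODE + assertion_handle + location_bytes
    return base64.b64encode(raw).decode("ascii")


def parse_artifact(artifact: str, charset: str = "utf-8"):
    """Inverse of build_artifact; returns (AssertionHandle, SourceLocation)."""
    raw = base64.b64decode(artifact)
    if raw[:2] != TYPE_CODE:
        raise ValueError("unexpected TypeCode")
    return raw[2:22], raw[22:].decode(charset)
```

With a pure US-ASCII SourceLocation the choice of charset makes no
difference to the resulting bytes; it only matters once the URI contains
non-ASCII characters, which is where the two sides would otherwise disagree.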


- prateek

>>
>>> Chapter 9 of the Bindings document introduces an alternative format for
>>> the Assertion Artifact:
>>> 
>>> TypeCode          := 0x0002
>>> RemainingArtifact := AssertionHandle SourceLocation
>>> AssertionHandle   := 20-byte_sequence
>>> SourceLocation    := URI
>>> 
>>> To create the artifact, Base64 is applied to the concatenation of
>>> TypeCode and RemainingArtifact.
>>> Base64 takes bytes as input.
>>> 
>>> The specification does not say how to convert the URI, which is a
>>> sequence of characters, into bytes.
>>
>>The following resource may be helpful:
>>http://www.ietf.org/rfc/rfc2396.txt
>>Section 2.1, "URI and non-ASCII characters"
>>
>>I also agree that SourceLocation's URI should require a single charset,
>>define a default charset, or provide a way to indicate the charset used.
>>
>>
>>---------------------------------------------
>>NTT Data Corporation
>>Yuji Sakata
>>Tel: +81-3-3523-8081
>>E-Mail: sakatayu@nttdata.co.jp
>>----------------------------------------------
>>--------------------------------------------------------------
>>
>> RFC 2396 :: 2.1 URI and non-ASCII characters
>> 
>>   The relationship between URI and characters has been a source of
>>   confusion for characters that are not part of US-ASCII. To describe
>>   the relationship, it is useful to distinguish between a "character"
>>   (as a distinguishable semantic entity) and an "octet" (an 8-bit
>>   byte). There are two mappings, one from URI characters to octets, and
>>   a second from octets to original characters:
>>
>>   URI character sequence->octet sequence->original character sequence
>>
>>   A URI is represented as a sequence of characters, not as a sequence
>>   of octets. That is because URI might be "transported" by means that
>>   are not through a computer network, e.g., printed on paper, read over
>>   the radio, etc.
>>
>>   A URI scheme may define a mapping from URI characters to octets;
>>   whether this is done depends on the scheme. Commonly, within a
>>   delimited component of a URI, a sequence of characters may be used to
>>   represent a sequence of octets. For example, the character "a"
>>   represents the octet 97 (decimal), while the character sequence "%",
>>   "0", "a" represents the octet 10 (decimal).
>>
>>   There is a second translation for some resources: the sequence of
>>   octets defined by a component of the URI is subsequently used to
>>   represent a sequence of characters. A 'charset' defines this mapping.
>>   There are many charsets in use in Internet protocols. For example,
>>   UTF-8 [UTF-8] defines a mapping from sequences of octets to sequences
>>   of characters in the repertoire of ISO 10646.
>>
>>   In the simplest case, the original character sequence contains only
>>   characters that are defined in US-ASCII, and the two levels of
>>   mapping are simple and easily invertible: each 'original character'
>>   is represented as the octet for the US-ASCII code for it, which is,
>>   in turn, represented as either the US-ASCII character, or else the
>>   "%" escape sequence for that octet.
>>
>>   For original character sequences that contain non-ASCII characters,
>>   however, the situation is more difficult. Internet protocols that
>>   transmit octet sequences intended to represent character sequences
>>   are expected to provide some way of identifying the charset used, if
>>   there might be more than one [RFC2277].  However, there is currently
>>   no provision within the generic URI syntax to accomplish this
>>   identification. An individual URI scheme may require a single
>>   charset, define a default charset, or provide a way to indicate the
>>   charset used.
>>
>>   It is expected that a systematic treatment of character encoding
>>   within URI will be developed as a future modification of this
>>   specification.
>>
>>
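
The two mappings described in the quoted RFC 2396 section 2.1 can be
sketched as follows. This is an illustration only: the helper names
uri_to_octets and octets_to_original are hypothetical, and Python's
urllib.parse.unquote_to_bytes stands in for the "%" escape decoding.

```python
from urllib.parse import unquote_to_bytes


def uri_to_octets(uri_chars: str) -> bytes:
    """First mapping: URI character sequence -> octet sequence."""
    return unquote_to_bytes(uri_chars)


def octets_to_original(octets: bytes, charset: str) -> str:
    """Second mapping: octet sequence -> original characters, via a charset."""
    return octets.decode(charset)


# "a" is octet 97, and the characters "%", "0", "a" are octet 10.
print(list(uri_to_octets("a%0a")))

# The same octets yield different characters under different charsets.
octets = uri_to_octets("%C3%A9")
print(octets_to_original(octets, "utf-8"))
print(octets_to_original(octets, "latin-1"))
```

The last two lines show why a single or default charset is needed: the
octets 0xC3 0xA9 decode to one character under UTF-8 and to two different
characters under ISO 8859-1.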
>>