Subject: Third try on HXRI Encoding/Decoding Rules
If you haven’t reviewed it yet, ignore the previous two messages and review the following. It separates out the preliminary work of putting the QXRI into URI-normal form, as this is the form the QXRI should be in AFTER full decoding. Also, the examples have been updated to reflect this.
To conform with the typical requirements of web server URI parsing libraries, HXRIs MUST be encoded prior to input and decoded prior to output. Because web server libraries typically perform some of these decoding functions automatically, proxy resolver implementers MUST ensure that their implementation, when used in conjunction with a specific web server, accomplishes the full set of decoding steps specified in this section. In addition, these decoding steps MUST be performed prior to any comparison operation defined in this specification.
Before any HXRI-specific encoding/decoding steps are performed, the QXRI portion of the HXRI (including all HXRI query parameters) MUST be transformed into URI-normal form as defined in section 2.3 of [XRISyntax]. This means characters not allowed in URIs, such as SPACE, or characters that are valid only in IRIs, such as UCS characters outside the valid URI set, MUST be percent-encoded. Also, the plus sign character (“+”) MUST NOT be used to encode the SPACE character, because during decoding the percent-encoded sequence %2B MUST be interpreted as the plus sign character (“+”).
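As a rough illustration of the SPACE and plus-sign rule only (not the full URI-normal form transformation of [XRISyntax]), here is a minimal Python sketch. The helper name percent_encode_ascii is hypothetical, and the sample strings are deliberately ASCII only.

```python
from urllib.parse import quote, unquote


def percent_encode_ascii(text: str) -> str:
    """Hypothetical helper: percent-encode everything except unreserved characters."""
    # With safe="" quote() leaves only unreserved characters alone,
    # so SPACE becomes %20 (never "+") and a literal "+" becomes %2B.
    return quote(text, safe="")


encoded = percent_encode_ascii("hello world+more")
print(encoded)           # hello%20world%2Bmore
print(unquote(encoded))  # hello world+more

# unquote() never treats "+" as SPACE, so a literal plus sign survives decoding.
print(unquote("a+b"))    # a+b
```

Because SPACE is always written as %20, a decoder never has to guess whether a "+" stands for a space or for itself.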
The result of this transformation is the baseline HXRI. Once the baseline HXRI is created, the following sequence of encoding steps MUST be performed in the order specified:
1. First, in order to preserve percent-encoding when the HXRI is passed through a web server, all percent signs MUST themselves be percent-encoded. For example, a SPACE encoded as %20 would become %2520. This step is not idempotent, so it MUST be performed only once.
2. Second, any occurrences of the ampersand character (“&”) within an HXRI query parameter that are NOT used to delimit it from another query parameter MUST be percent encoded using the sequence %26. This prevents misinterpretation of HXRI query parameters by a proxy resolver.
3. Third, any semicolon used to delimit one of the media type parameters defined in Table 6 from the media type value MUST be percent-encoded using the sequence %3B. This prevents misinterpretation of the semicolon character by a web server.
To decode an encoded HXRI back into the baseline HXRI, the above steps MUST be performed in inverse order. Again, note that step 1 above is not idempotent, so it MUST be performed only once during decoding.
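The encoding steps and their inverse can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not a reference implementation: it operates on a single baseline query parameter value, treats every ampersand in that value as non-delimiting and every semicolon as a media type parameter delimiter, and matches only the uppercase sequences it produces itself. The function names and the sample value are hypothetical.

```python
def encode_value(baseline_value: str) -> str:
    """Apply encoding steps 1 to 3, in order, to one baseline parameter value."""
    step1 = baseline_value.replace("%", "%25")  # step 1: protect existing percent signs
    step2 = step1.replace("&", "%26")           # step 2: ampersands inside the value
    return step2.replace(";", "%3B")            # step 3: media type parameter delimiters


def decode_value(encoded_value: str) -> str:
    """Undo the steps in inverse order: step 3, then step 2, then step 1."""
    step3 = encoded_value.replace("%3B", ";")
    step2 = step3.replace("%26", "&")
    return step2.replace("%25", "%")            # single pass only; never repeat


# Hypothetical baseline value, already in URI-normal form.
baseline = "hello%20plan%E9te&x;sep=true"
encoded = encode_value(baseline)
print(encoded)                # hello%2520plan%25E9te%26x%3Bsep=true
print(decode_value(encoded))  # hello%20plan%E9te&x;sep=true

# Step 1 is not idempotent: a second application encodes the percent sign again.
print("%20".replace("%", "%25").replace("%", "%25"))  # %252520
```

Because step 1 runs first, the only %26 and %3B sequences in the encoded value are the ones introduced by steps 2 and 3; a %26 that was already present in the baseline appears as %2526 and is restored unchanged when step 1 is reversed last.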
Following is an example baseline HXRI before application of these encoding rules, with the characters requiring encoding highlighted in red. Note that the string hello%20plan%E9te was originally hello planéte. The SPACE and é characters were percent-encoded to put the QXRI into URI-normal form.
Here is the fully encoded HXRI, with the encoding highlighted in red.