Subject: [OASIS Issue Tracker] (BDXR-22) Case sensitivity of string "UTF-8"
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: bdxr@lists.oasis-open.org
Date: Mon, 18 Dec 2017 13:30:26 +0000 (UTC)
    [ https://issues.oasis-open.org/browse/BDXR-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=68542#comment-68542 ] 

Kenneth Bengtsson commented on BDXR-22:
---------------------------------------

The current wording was decided upon in April 2016, after we received the same comment on SMP CSPRD 02. The approved comment resolution log is in the TC archive, along with several meeting minutes from when the subject was discussed.

The users who raised the question found the current text, "XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode])", sufficiently unambiguous: there is no doubt that case sensitivity requirements follow the HTTP specification and are not proprietary to SMP.

> Case sensitivity of string "UTF-8" 
> -----------------------------------
>
>                 Key: BDXR-22
>                 URL: https://issues.oasis-open.org/browse/BDXR-22
>             Project: OASIS Business Document Exchange (BDXR) TC
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: SMP 1.0
>            Reporter: Pim van der Eijk
>            Priority: Minor
>
> Section 3.3 of SMP states:
> XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode]). They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”.
> This can be interpreted as implying that using the lowercase string "utf-8" for the encoding would be incorrect. There are a number of problems with this:
> 1) All examples in the spec use "utf-8". While it is true that the examples are marked as non-normative, one would expect them to be consistent with the spec.
> 2) XML 1.0 states that XML processors SHOULD match character encoding names in a case-insensitive way.
> 3) The IANA character set repository states that "character set names may be up to 40 characters taken from the printable characters of US-ASCII.  However, no distinction is made between use of upper and lower case letters."
> https://www.iana.org/assignments/character-sets/character-sets.xhtml 
> 4) If no encoding is specified, XML 1.0 assumes UTF-8 encoding. The attribute is only relevant if some other encoding (like UTF-16) is used.
> 5) XML has been around for two decades. I doubt that any of the current versions of commonly used XML libraries would break if the non-all-uppercase variant is used.
> Internet conventional wisdom suggests that the uppercase variant is preferred, because XML 1.0 uses SHOULD instead of MUST, but that both are allowed.  
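
To make the case-insensitivity point in the issue concrete, here is a minimal Python sketch of a check that accepts both "UTF-8" and "utf-8" in the XML declaration. The helper names declared_encoding and is_utf8_declared are hypothetical and are not taken from the SMP specification or any SMP implementation; the regular expression is a simplification that assumes an ASCII-compatible document with the declaration at the very start.

```python
import re
import xml.etree.ElementTree as ET

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
# Assumption: the document starts with "<?xml" and is ASCII-compatible.
DECL_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(doc: bytes):
    """Return the encoding name from the XML declaration, or None if absent."""
    match = DECL_RE.match(doc)
    return match.group(1).decode("ascii") if match else None


def is_utf8_declared(doc: bytes) -> bool:
    """Hypothetical check: compare the declared name case-insensitively."""
    name = declared_encoding(doc)
    return name is not None and name.lower() == "utf-8"


upper = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
lower = b'<?xml version="1.0" encoding="utf-8"?><a/>'

# Both spellings pass the check, and the standard-library parser reads both.
for doc in (upper, lower):
    print(declared_encoding(doc), is_utf8_declared(doc), ET.fromstring(doc).tag)
```

A strict, case-sensitive comparison (name == "UTF-8") would reject the lowercase form that the examples in the spec themselves use.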



--
This message was sent by Atlassian JIRA
(v6.2.2#6258)