Subject: [OASIS Issue Tracker] (BDXR-22) Case sensitivity of string "UTF-8"
[ https://issues.oasis-open.org/browse/BDXR-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=68543#comment-68543 ]

Pim van der Eijk commented on BDXR-22:
--------------------------------------

Many thanks, Kenneth, for the quick response.

I found the comment resolution at:
https://www.oasis-open.org/committees/document.php?document_id=58258

There it says:
"Case sensitivity requirement has been deleted from the text."

I also found that the earlier CSPRD02 had a sentence that this fix removed:
"Please observe that the content of the encoding attribute is case sensitive."

Maybe for a future version you can explicitly state that the encoding attribute value is case INsensitive, e.g.:
"Please observe that the content of the encoding attribute is not case sensitive."

The current wording suggests a string comparison to a quoted upper case value (... set to “UTF-8” ...). At least one current implementer interpreted this to mean upper case only.

> Case sensitivity of string "UTF-8"
> -----------------------------------
>
> Key: BDXR-22
> URL: https://issues.oasis-open.org/browse/BDXR-22
> Project: OASIS Business Document Exchange (BDXR) TC
> Issue Type: Bug
> Components: Documentation
> Affects Versions: SMP 1.0
> Reporter: Pim van der Eijk
> Priority: Minor
>
> Section 3.3 of SMP states:
> XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode]). They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”.
> This can be interpreted as implying that using the lower case string "utf-8" for the encoding would be incorrect. There are a number of problems with this:
> 1) All examples in the spec use "utf-8". While it is true that the examples are marked as non-normative, one would expect them to be consistent with the spec.
> 2) XML 1.0 states that XML processors SHOULD match character encoding names in a case-insensitive way.
> 3) The IANA character set repository states that "character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters."
> https://www.iana.org/assignments/character-sets/character-sets.xhtml
> 4) If no encoding is specified, XML 1.0 assumes UTF-8 encoding. The attribute is only relevant if some other encoding (like UTF-16) is used.
> 5) XML has been around for two decades. I doubt that any of the current versions of commonly used XML libraries would break if the non-all-uppercase variant is used.
> Internet conventional wisdom suggests that the uppercase variant is preferred, because XML 1.0 uses SHOULD instead of MUST, but that both are allowed.

--
This message was sent by Atlassian JIRA
(v6.2.2#6258)
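
To illustrate the interpretation argued for above, here is a minimal Python sketch of a check that accepts "UTF-8", "utf-8", and any other casing of the declared encoding name. It is not taken from the SMP specification or from any implementation; the helper names `declared_encoding` and `declares_utf8` are hypothetical, and the regular expression is a simplification that assumes the document starts directly with the XML declaration (no byte order mark) and that the version attribute precedes the encoding attribute.

```python
import re

# Simplified pattern for an XML declaration with an encoding attribute.
# Assumption: no BOM precedes "<?xml"; either quote style is accepted.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml\s+version\s*=\s*["'][^"']*["']\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(document: bytes):
    """Return the encoding name from the XML declaration, or None if absent."""
    match = _XML_DECL_ENCODING.match(document)
    return match.group(1).decode("ascii") if match else None


def declares_utf8(document: bytes) -> bool:
    """True if the declaration names UTF-8, compared without regard to case."""
    name = declared_encoding(document)
    return name is not None and name.lower() == "utf-8"


if __name__ == "__main__":
    for sample in (
        b'<?xml version="1.0" encoding="UTF-8"?><a/>',
        b'<?xml version="1.0" encoding="utf-8"?><a/>',
        b'<?xml version="1.0" encoding="UTF-16"?><a/>',
    ):
        print(declared_encoding(sample), declares_utf8(sample))
```

A strict implementation that compared the attribute value to the literal string "UTF-8" would reject the second sample, which is the situation the issue describes.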