OASIS Mailing List Archives



bdxr message


Subject: [OASIS Issue Tracker] (BDXR-22) Case sensitivity of string "UTF-8"

    [ https://issues.oasis-open.org/browse/BDXR-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=68543#comment-68543 ] 

Pim van der Eijk commented on BDXR-22:

Many thanks Kenneth for a quick response. I found the comment resolution at:

And there it says:

"Case sensitivity requirement has been deleted from the text."

I also found that the earlier CSPRD02 had a sentence that this fix removed: 

"Please observe that the content of the encoding attribute is case sensitive."

Maybe for a future version you can explicitly state that the encoding attribute value is case INsensitive, for example:

"Please observe that the content of the encoding attribute is not case sensitive."

I suggest this because the current wording implies a string comparison against a quoted upper-case value (... set to “UTF-8” ...). At least one current implementer interpreted this to mean upper case only.
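
To illustrate the lenient reading, here is a minimal Python sketch of a check that accepts any casing of the encoding name. The function name `declares_utf8` and the regular expression are illustrative assumptions, not anything taken from the SMP specification, and the pattern is a simplification of the XML declaration grammar (it assumes the declaration sits at the very start of the document and does not handle a `standalone` attribute placed before `encoding`).

```python
import re

# Simplified pattern for the "encoding" pseudo-attribute of an XML declaration.
# Assumption: the declaration is ASCII-compatible and starts at byte 0.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declares_utf8(document: bytes) -> bool:
    """Return True if the XML declaration names UTF-8, ignoring case."""
    match = _XML_DECL.match(document)
    if match is None:
        # No declaration, or no encoding attribute in it.
        return False
    # Compare case-insensitively instead of matching the literal "UTF-8".
    return match.group(1).decode("ascii").lower() == "utf-8"
```

A strict implementation would instead compare the captured value to the literal string "UTF-8", which is the reading that rejects the spec's own "utf-8" examples.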

> Case sensitivity of string "UTF-8" 
> -----------------------------------
>                 Key: BDXR-22
>                 URL: https://issues.oasis-open.org/browse/BDXR-22
>             Project: OASIS Business Document Exchange (BDXR) TC
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: SMP 1.0
>            Reporter: Pim van der Eijk
>            Priority: Minor
> Section 3.3 of SMP states:
> XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode]). They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”.
> This can be interpreted as implying that using the lower case string "utf-8" for the encoding would be incorrect.  There are a number of problems with this:
> 1)  All examples in the spec use "utf-8".  While it is true that the examples are marked as non-normative,  one would expect them to be consistent with the spec.
> 2) XML 1.0 states that XML processors SHOULD match character encoding names in a case-insensitive way.  
> 3) The IANA character set repository states that "character set names may be up to 40 characters taken from the printable characters of US-ASCII.  However, no distinction is made between use of upper and lower case letters."
> https://www.iana.org/assignments/character-sets/character-sets.xhtml 
> 4) If no encoding is specified,  XML 1.0 assumes UTF-8 encoding. The attribute is only relevant if some other encoding (like UTF-16) is used.
> 5)  XML has been around for two decades.  I doubt that any of the current versions of commonly used XML libraries would break if the non-all-uppercase variant is used.
> Internet conventional wisdom suggests that the uppercase variant is preferred, because XML 1.0 uses SHOULD instead of MUST, but that both are allowed.  
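
Points 2 and 5 of the issue above can be probed with a short script. This is only a sketch against one library, Python's standard `xml.etree.ElementTree` (backed by expat) together with `codecs.lookup`; the helper name `parses_with_encoding_label` is hypothetical, and the result says nothing about other XML libraries.

```python
import codecs
import xml.etree.ElementTree as ET


def parses_with_encoding_label(label: str) -> bool:
    """Parse a tiny UTF-8 document whose declaration uses the given label."""
    document = (
        f'<?xml version="1.0" encoding="{label}"?><doc>\u00e9</doc>'
    ).encode("utf-8")
    try:
        # The non-ASCII character must survive the round trip.
        return ET.fromstring(document).text == "\u00e9"
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for label in ("UTF-8", "utf-8", "Utf-8"):
        # codecs.lookup resolves encoding names without regard to case.
        print(label, parses_with_encoding_label(label), codecs.lookup(label).name)
```

If a library under test returned False for the lower-case label, that would be evidence against point 5; the script is meant as a quick probe, not a conformance test.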

This message was sent by Atlassian JIRA
