
Subject: [OASIS Issue Tracker] Commented: (MQTT-44) Specific details for UTF-8 Strings
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: mqtt@lists.oasis-open.org
Date: Tue, 1 Oct 2013 08:36:39 +0000 (UTC)
    [ http://tools.oasis-open.org/issues/browse/MQTT-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35005#action_35005 ] 

Peter Niblett commented on MQTT-44:
-----------------------------------

Several things here

1. We need to clarify what we mean by UTF-8. It means what the RFC (RFC 3629) says: no BOM, no CESU-8 encoding of characters from the astral planes, and no Java-like "modified UTF-8".

2. We need to decide what to do about the use of undefined Unicode characters. I would say they shouldn't be allowed in usernames or passwords, but what about Topic Names? Requiring implementations to check for them is likely to be more trouble than it is worth.

3. We need to add a clarification that normalization (as described in RFC 5198) is NOT to be performed before doing topic matching.

4. We might want to give guidance about the use of special characters, such as CR and LF, in topic strings.
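
To make points 1 and 3 concrete, here is a minimal Python sketch, not proposed specification text. The helper names is_strict_utf8 and same_topic are made up for illustration, and the sketch assumes that Python's built-in strict "utf-8" codec is an acceptable stand-in for the RFC's rules (it rejects overlong forms such as Java's C0 80 for NUL, and it rejects encoded surrogates as produced by CESU-8).

```python
import unicodedata

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed UTF-8 with no leading BOM."""
    if data.startswith(BOM):
        return False
    try:
        # The strict codec rejects overlong sequences and encoded surrogates.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def same_topic(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: compare topic strings byte for byte, no normalization."""
    return a == b


# Point 1: what is and is not acceptable.
plain_nul = b"a\x00b"                        # a real NUL byte is well-formed UTF-8
modified_nul = b"a\xc0\x80b"                 # Java "modified UTF-8" NUL (overlong)
cesu8_astral = b"\xed\xa0\xbd\xed\xb8\x80"   # U+1F600 as two encoded surrogates
utf8_astral = "\U0001F600".encode("utf-8")   # U+1F600 as one 4-byte sequence

# Point 3: the same visible text in two normalization forms.
nfc = "caf\u00e9".encode("utf-8")    # precomposed e-acute
nfd = "cafe\u0301".encode("utf-8")   # 'e' followed by a combining acute accent
```

With these definitions, is_strict_utf8 accepts plain_nul and utf8_astral and rejects modified_nul, cesu8_astral and anything starting with BOM. same_topic(nfc, nfd) is False even though unicodedata.normalize("NFC", ...) maps both strings to the same text, which is the behaviour point 3 asks for.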



> Specific details for UTF-8 Strings
> ----------------------------------
>
>                 Key: MQTT-44
>                 URL: http://tools.oasis-open.org/issues/browse/MQTT-44
>             Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 3.1.1
>            Reporter: Rahul Gupta
>
> This issue is based on comments in MQTT-24 and is opened as a Core issue for discussion on the MQTT TC call. I discussed it with my co-editor Andy, and he suggested opening a core issue for TC discussion.
> from MQTT-24
> -------------------
> > We should also make a simple statement that UTF-8 encodings MUST NOT have a three-byte initial BOM.
> > A clarification that the encoding MUST NOT be Java's Modified UTF-8, and can contain ASCII NUL.
> > At the same time, it's probably worth noting that certain Unicode combinations are invalid in UTF-8: re-encoded surrogate pairs from UTF-16, and certain non-transmissible characters (e.g. U+FFFE, from memory), which are normally the last 2 characters in a multilingual plane. These restrictions are only a minor burden for Java implementations using the naive methods in String / Character. These restrictions serve to stop propagation of bad data through a network of nodes.
> > Implementations MAY decide not to support the use of ASCII NUL and C0 / C1 control codes, and MAY decide to place additional restrictions on supported characters.
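
The optional restrictions quoted above could be checked along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, the input is taken to be an already-decoded string, "non-transmissible characters" is read as the Unicode noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane), and the control ranges are C0 (U+0000..U+001F, which includes NUL) and C1 (U+0080..U+009F).

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_c0_or_c1_control(cp: int) -> bool:
    """True for C0 controls (including NUL) and C1 controls."""
    return cp <= 0x1F or 0x80 <= cp <= 0x9F


def restricted_code_points(text: str) -> list[int]:
    """Hypothetical helper: list the code points an implementation might refuse."""
    return [
        ord(ch)
        for ch in text
        if is_noncharacter(ord(ch)) or is_c0_or_c1_control(ord(ch))
    ]
```

For example, restricted_code_points("a/b") is empty, while a topic string containing CR, LF, NUL or U+FFFE yields those code points. Whether an implementation rejects the string or merely logs it is left open here, as it is in the quoted text.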
