mqtt message

Subject: [OASIS Issue Tracker] Commented: (MQTT-44) Specific details for UTF-8 Strings
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: mqtt@lists.oasis-open.org
Date: Tue, 1 Oct 2013 11:12:39 +0000 (UTC)
    [ http://tools.oasis-open.org/issues/browse/MQTT-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35008#action_35008 ] 

Raphael Cohen commented on MQTT-44:
-----------------------------------

Peter,

In mostly violent agreement.

Summarizing from before, I'd like some normative things:-
- UTF-8 Reference (MUST)
- Unicode 5.1 or 6.0 defining the character (addresses defined characters) (MUST)
- SHOULD NOT permit undefined characters as per Unicode version X (re: your comment no. 2: I've written various UTF-validating scanners in the past and I'd be happy to donate one to Paho or write one for the common good, but I'm making this a SHOULD)
   - Unicode versions do change but we track that with an annual or longer process; right now, Unicode 5/6 is such that I can't see it'll ever matter unless someone wants alternative Celtic runes or insanely obscure pre-modern era Chinese ideograms.
- MUST NOT normalize as per RFC 5198
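
A minimal sketch of how the points above might be checked, in Python. The function name check_mqtt_string is hypothetical, and Python's unicodedata module reflects whichever Unicode version ships with the interpreter, which is not necessarily 5.1 or 6.0, so treat the "undefined character" test as an approximation of "Unicode version X".

```python
import unicodedata


def check_mqtt_string(data: bytes) -> str:
    """Decode strictly as UTF-8 and reject unassigned code points.

    Hypothetical helper for illustration; not part of any MQTT library.
    """
    # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8.
    text = data.decode("utf-8", errors="strict")
    for ch in text:
        # 'Cn' means unassigned in the Unicode version bundled with Python.
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"undefined character U+{ord(ch):04X}")
    # Deliberately no unicodedata.normalize() call: return text as received.
    return text
```

Note that a decomposed string such as "e" followed by U+0301 comes back unchanged, which is the "do not normalize" behaviour asked for above.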

Semi-normative (i.e. I care enough that these things should be stated even though they duplicate the above; I've seen implementors of other standards get this badly wrong):
- MAY decline to accept characters in C0 / C1 or ASCII where they might cause a denial-of-service attack
- MAY decline to accept certain topic names consisting of such characters even if defined
   - Non-normatively point out the impact of ASCII NUL, DEL (127), the use of '.', '..' and '/' in POSIX, as well as your CR, LF, etc. Also worth considering Windows FAT12/16/32/NTFS limitations
   - I'm actually open to some of these characters being a SHOULD, because I really care about propagation / magnification attacks in networks of brokers
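
To make the semi-normative points concrete, here is a rough Python sketch of the kind of check a broker might apply before accepting or forwarding a topic name. The function name is_risky_topic and the exact rule set are assumptions for illustration only; it does not cover the Windows file system limitations mentioned above.

```python
def is_risky_topic(topic: str) -> bool:
    """Return True if a topic contains characters or segments flagged above.

    Hypothetical policy check; the exact rules are an assumption.
    """
    for ch in topic:
        cp = ord(ch)
        # C0 controls (including NUL, CR, LF), DEL (127), and C1 controls.
        if cp <= 0x1F or 0x7F <= cp <= 0x9F:
            return True
    # '.' and '..' path segments are special on POSIX file systems.
    return any(segment in (".", "..") for segment in topic.split("/"))
```

For example, is_risky_topic("a/../b") and is_risky_topic("a\nb") are both True, while is_risky_topic("a.b/c") is False because the dot is not a whole segment.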

Ideas
   - Good advice might be to not use anything in the ASCII range for topics apart from A-Z, a-z, 0-9 and /, and to escape each 'path segment' using the latest RFC on URI escaping? (That is also a pain, because Java doesn't support it either.)
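
A sketch of the per-segment escaping idea in Python, assuming urllib.parse.quote (which follows RFC 3986) is an acceptable stand-in for "the latest RFC on URI escaping". The function name escape_topic is hypothetical. Be aware that quote also leaves '_', '.', '-' and '~' unescaped, so this is slightly more permissive than the A-Z, a-z, 0-9 advice, and '.' or '..' segments pass through untouched.

```python
from typing import Iterable
from urllib.parse import quote


def escape_topic(segments: Iterable[str]) -> str:
    """Percent-encode each path segment, then join with '/'.

    Hypothetical helper; safe="" makes a '/' inside a segment become %2F.
    """
    return "/".join(quote(segment, safe="") for segment in segments)
```

For example, escape_topic(["sensors", "living room", "a/b"]) gives "sensors/living%20room/a%2Fb".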


> Specific details for UTF-8 Strings
> ----------------------------------
>
>                 Key: MQTT-44
>                 URL: http://tools.oasis-open.org/issues/browse/MQTT-44
>             Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 3.1.1
>            Reporter: Rahul Gupta
>
> This issue is based on comments in MQTT-24 and is opened as a Core issue for discussion on the MQTT TC call. I had a discussion with my co-editor Andy and he suggested opening a core issue for TC discussion.
> from MQTT-24
> -------------------
> > We should also make a simple statement that UTF-8 encodings MUST NOT have a three-byte initial BOM.
> > A clarification that the encoding MUST NOT be Java's Modified UTF-8, and can contain ASCII NUL
> > At the same time, it's probably worth noting too that certain Unicode combinations are invalid in UTF-8: the use of surrogate pairs from UTF-16 re-encoded, and certain non-transmissible characters (e.g. U+FFFE, from memory); these are normally the last 2 characters in a multilingual plane. These restrictions are only a minor burden for Java implementations using the naive methods in string / character. These restrictions serve to stop propagation of bad data through a network of nodes.
> > Implementations MAY decide to not support the use of ASCII NUL and C0 / C1 control codes / MAY decide to place additional restrictions on supported characters
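
The restrictions quoted above from MQTT-24 could be checked roughly as follows in Python. This is a sketch under assumptions, not a reference implementation: the function name check_mqtt24_rules is hypothetical, and the noncharacter test covers U+FDD0..U+FDEF as well as the last two code points of every plane, which goes a little beyond the single U+FFFE example in the quoted text.

```python
def check_mqtt24_rules(data: bytes) -> str:
    """Apply the restrictions quoted from MQTT-24 (hypothetical helper)."""
    # Reject a leading UTF-8 byte order mark (bytes EF BB BF).
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError("leading BOM")
    # Strict decoding rejects Modified UTF-8's C0 80 form of NUL and
    # UTF-16 surrogates re-encoded as three-byte sequences.
    text = data.decode("utf-8")
    for ch in text:
        cp = ord(ch)
        # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            raise ValueError(f"noncharacter U+{cp:04X}")
    return text
```

A genuine ASCII NUL (a single 0x00 byte) is still accepted here; an implementation that chooses not to support NUL or C0 / C1 controls would add that restriction on top.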
