Subject: [OASIS Issue Tracker] Commented: (MQTT-32) Editorial comments on 2.3 (UTF-8 encoded strings)
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: mqtt@lists.oasis-open.org
Date: Tue, 2 Jul 2013 20:55:43 -0400 (EDT)
    [ http://tools.oasis-open.org/issues/browse/MQTT-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=34022#action_34022 ] 

Rahul Gupta commented on MQTT-32:
---------------------------------

line 433 - In MQTT, many of the control packets contain components that are defined as UTF-8 encoded strings. Each of these strings is prefixed with a two-byte length field that gives the number of bytes in the UTF-8 encoded string itself, as shown in the table below. Consequently, there is a limit on the size of a string that can be passed in one of these UTF-8 encoded string components; you cannot use a string that would encode to more than 65535 bytes. Unless stated otherwise, all strings are 0 to 65535 UTF-8 encoded bytes in length.
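
To make the length-prefix rule concrete, here is a minimal Python sketch of it. The function name `encode_utf8_string` and the constant name are hypothetical, and the sketch assumes the two-byte length is written most significant byte first.

```python
import struct

MAX_UTF8_STRING_BYTES = 65535  # largest value a two-byte length field can hold


def encode_utf8_string(text: str) -> bytes:
    """Return `text` as UTF-8 bytes preceded by a two-byte length field.

    Hypothetical helper for illustration; assumes a big-endian length.
    """
    data = text.encode("utf-8")
    if len(data) > MAX_UTF8_STRING_BYTES:
        raise ValueError(
            f"string encodes to {len(data)} bytes; limit is {MAX_UTF8_STRING_BYTES}"
        )
    # ">H" packs an unsigned 16-bit integer, most significant byte first.
    return struct.pack(">H", len(data)) + data
```

Note that the limit applies to encoded bytes, not characters: a string of 40000 "é" characters encodes to 80000 bytes, so this sketch would reject it even though it has fewer than 65535 characters.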

line 460 - For example, the string A which is LATIN CAPITAL Letter A followed by the code point U+2A6D4 (which represents a CJK IDEOGRAPH EXTENSION B character). Surrogate pairs are specified by [RFC2781] and are visible to the programmer in some programming languages, for example Java.
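
The example string can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the length prefix is assumed to be written most significant byte first. The UTF-16 encoding is included for comparison, since that is where the surrogate pair appears.

```python
# LATIN CAPITAL LETTER A followed by code point U+2A6D4.
text = "A\U0002A6D4"

utf8 = text.encode("utf-8")
# Two-byte length (assumed most significant byte first), then the UTF-8 bytes.
framed = len(utf8).to_bytes(2, "big") + utf8

# For comparison: UTF-16 represents U+2A6D4 as a surrogate pair.
utf16 = "\U0002A6D4".encode("utf-16-be")

print(framed.hex(" "))  # 00 05 41 f0 aa 9b 94
print(utf16.hex(" "))   # d8 69 de d4
```

The UTF-8 form contains no surrogates; U+2A6D4 is encoded directly as the four bytes F0 AA 9B 94.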

line 465 - Fixed bit representation of 0x94
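
The bit pattern for 0x94 can be confirmed directly. The short Python check below is a sketch; it relies on 0x94 being the last UTF-8 byte of U+2A6D4, so it should be a continuation byte carrying the low six bits of the code point.

```python
last_byte = "\U0002A6D4".encode("utf-8")[-1]

bits = format(last_byte, "08b")
print(hex(last_byte), bits)  # 0x94 10010100

# A UTF-8 continuation byte has the form 10xxxxxx, where xxxxxx are
# six payload bits; here they are the low six bits of the code point.
assert last_byte >> 6 == 0b10
assert last_byte & 0x3F == 0x2A6D4 & 0x3F
```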


> Editorial comments on 2.3 (UTF-8 encoded strings)
> -------------------------------------------------
>
>                 Key: MQTT-32
>                 URL: http://tools.oasis-open.org/issues/browse/MQTT-32
>             Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
>          Issue Type: Improvement
>          Components: edits
>            Reporter: Peter Niblett
>            Priority: Trivial
>
> 1. You need to make it clear that this section does not apply to the payload of a PUBLISH message, since applications are free to send UTF-8 encoded messages of up to 256 MB in the payload.
> You could do this by changing the second paragraph (WD04 line 473) to start "Many of the Control Packets contain components that are defined as UTF-8 encoded strings. Each of these strings is prefixed with a two byte length field that gives the number of bytes in the UTF-8 encoded string itself..."
> 2. WD04 line 474 says "Consequently strings must be encoded in fewer than 65536 bytes". This reads a little strangely, as it's not possible to encode very long strings in fewer than 65536 bytes. It would be better to say "Consequently there is a limit on the size of a string that can be passed in one of these UTF-8 encoded string components; you cannot use a string that would encode to more than 65535 bytes".
> 3. Line 484 says "Letter A followed by the surrogate pair representing CJK IDEOGRAPH EXTENSION B U+2A6D4" and then goes on to talk about what surrogate pairs are. However, I don't think it is necessary to talk about surrogate pairs here, since you are actually showing the code point as a regular Unicode code point, not as a UTF-16 surrogate pair. I would replace this with
> "Letter A followed by the code point U+2A6D4 (which represents a CJK IDEOGRAPH EXTENSION B character)"
> and remove the sentence about surrogate pairs.
> 4. The encoding for byte 7 shown in the table is incorrect. 0x94 in binary is 10010100 
