Subject: [OASIS Issue Tracker] Commented: (MQTT-32) Editorial comments on 2.3 (UTF-8 encoded strings)
[ http://tools.oasis-open.org/issues/browse/MQTT-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=34022#action_34022 ]

Rahul Gupta commented on MQTT-32:
---------------------------------

line 433 - In MQTT, many of the control packets contain components that are defined as UTF-8 encoded strings. Each of these strings is prefixed with a two-byte length field that gives the number of bytes in the UTF-8 encoded string itself, as shown in the table below. Consequently, there is a limit on the size of a string that can be passed in one of these UTF-8 encoded string components; you cannot use a string that would encode to more than 65535 bytes. Unless stated otherwise, all strings are 0 to 65535 UTF-8 encoded bytes in length.

line 460 - For example, the string A which is LATIN CAPITAL Letter A followed by the code point U+2A6D4 (which represents a CJK IDEOGRAPH EXTENSION B character). Surrogate pairs are specified by [RFC2781] and are visible to the programmer in some programming languages, for example Java.

line 465 - Fixed the bit representation of 0x94.

> Editorial comments on 2.3 (UTF-8 encoded strings)
> -------------------------------------------------
>
>                 Key: MQTT-32
>                 URL: http://tools.oasis-open.org/issues/browse/MQTT-32
>             Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
>          Issue Type: Improvement
>          Components: edits
>            Reporter: Peter Niblett
>            Priority: Trivial
>
> 1. You need to make it clear that this section does not apply to the payload of a PUBLISH message, since applications are free to send UTF-8 encoded messages of up to 256 MB in the payload.
> You could do this by changing the second paragraph (WD04 line 473) to start "Many of the Control Packets contain components that are defined as UTF-8 encoded strings. Each of these strings is prefixed with a two byte length field that gives the number of bytes in the UTF-8 encoded string itself..."
> 2. WD04 line 474 says "Consequently strings must be encoded in fewer than 65536 bytes". This reads a little strangely, as it is not possible to encode very long strings in fewer than 65536 bytes. It would be better to say "Consequently there is a limit on the size of a string that can be passed in one of these UTF-8 encoded string components; you cannot use a string that would encode to more than 65535 bytes".
> 3. Line 484 says "Letter A followed by the surrogate pair representing CJK IDEOGRAPH EXTENSION B U+2A6D4" and then goes on to explain what surrogate pairs are. However, I don't think it is necessary to talk about surrogate pairs here, since you are actually showing the code point as a regular Unicode code point, not as a UTF-16 surrogate pair. I would replace this with
> "Letter A followed by the code point U+2A6D4 (which represents a CJK IDEOGRAPH EXTENSION B character)"
> and remove the sentence about surrogate pairs.
> 4. The encoding for byte 7 shown in the table is incorrect. 0x94 in binary is 10010100.
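The length-prefixed layout discussed above, and the corrected byte 7, can be checked with a short script. This is a minimal sketch, assuming Python 3 and assuming the two-byte length field is big-endian (MSB first), as in the MQTT fixed-width integers. The names `encode_mqtt_string` and `decode_mqtt_string` are hypothetical helpers for illustration, not part of the specification or of any MQTT library.

```python
import struct

MAX_STRING_BYTES = 65535  # largest count a two-byte length field can hold


def encode_mqtt_string(text: str) -> bytes:
    """Hypothetical helper: 2-byte big-endian byte count, then UTF-8 bytes.

    Python's strict UTF-8 codec raises UnicodeEncodeError for lone
    surrogate code points (U+D800..U+DFFF), so none can slip through.
    """
    data = text.encode("utf-8")
    if len(data) > MAX_STRING_BYTES:
        raise ValueError(
            f"string encodes to {len(data)} bytes; limit is {MAX_STRING_BYTES}"
        )
    return struct.pack(">H", len(data)) + data


def decode_mqtt_string(buf: bytes) -> str:
    """Hypothetical inverse: read the length prefix, then decode that many bytes."""
    (length,) = struct.unpack_from(">H", buf, 0)
    data = buf[2:2 + length]
    if len(data) != length:
        raise ValueError("buffer shorter than the declared string length")
    return data.decode("utf-8")


if __name__ == "__main__":
    # LATIN CAPITAL LETTER A followed by code point U+2A6D4 (CJK Ext. B)
    sample = "A\U0002A6D4"
    encoded = encode_mqtt_string(sample)
    for i, b in enumerate(encoded, start=1):
        print(f"byte {i}: 0x{b:02X} = {b:08b}")
    # The same code point in UTF-16 (as Java sees it) is a surrogate pair
    print(sample.encode("utf-16-be").hex(" "))
```

Run as a script, it prints the seven bytes 0x00 0x05 0x41 0xF0 0xAA 0x9B 0x94, so byte 7 is 0x94 = 10010100, matching item 4. The UTF-8 bytes encode U+2A6D4 directly as a four-byte sequence. The surrogate pair (D869 DED4) only appears in the UTF-16 line, which supports the point in item 3.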