Subject: [OASIS Issue Tracker] Commented: (MQTT-44) Specific details for UTF-8 Strings
[ http://tools.oasis-open.org/issues/browse/MQTT-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35008#action_35008 ]

Raphael Cohen commented on MQTT-44:
-----------------------------------

Peter,

In mostly violent agreement. Summarizing from before, I'd like some normative things:

- UTF-8 reference (MUST)
- Unicode 5.1 or 6.0 defining the character (addresses defined characters) (MUST)
- SHOULD NOT permit undefined characters as per Unicode version X (re: your comment no. 2: I've written various UTF-validating scanners in the past and I'd be happy to donate one to Paho or write one for the common good, but I'd make this a SHOULD)
- Unicode versions do change, but we can track that with an annual or longer process; right now, Unicode 5/6 is such that I can't see it'll ever matter unless someone wants alternative Celtic runes or insanely obscure pre-modern-era Chinese ideograms.
- MUST NOT normalize as per RFC 5198

Semi-normative (i.e. I care enough that these things should be stated even though they duplicate the above; I've seen implementors in other standards, etc., get this badly wrong):

- MAY reject characters in C0 / C1 or ASCII where they might cause a denial-of-service attack
- MAY reject certain topic names consisting of such characters even if defined
- Non-normatively point out the impact of ASCII NUL, DEL (127), the use of ".", ".." and "/" in POSIX, as well as your CR, LF, etc. Also worth considering Windows FAT12/16/32/NTFS limitations
- I'm actually open to some of these characters being a SHOULD, because I really care about propagation / magnification attacks in networks of brokers

Ideas:

- Good advice might be to not use anything in the ASCII range for topics apart from A-Z, a-z, 0-9 and /, and to escape each 'path segment' using the latest RFC on URI escaping (which is also a pain, because Java doesn't support that, either)?

> Specific details for UTF-8 Strings
> ----------------------------------
>
> Key: MQTT-44
> URL: http://tools.oasis-open.org/issues/browse/MQTT-44
> Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
> Issue Type: Improvement
> Components: core
> Affects Versions: 3.1.1
> Reporter: Rahul Gupta
>
> This issue is based on comments in MQTT-24 and is opened as a core issue for discussion on the MQTT TC call. I had a discussion with my co-editor Andy and he suggested opening a core issue for TC discussion.
>
> from MQTT-24
> -------------------
>
> We should also make a simple statement that UTF-8 encodings MUST NOT have a three-byte initial BOM.
>
> A clarification that the encoding MUST NOT be Java's Modified UTF-8, and can contain ASCII NUL
>
> At the same time, it's probably worth noting too that certain Unicode combinations are invalid in UTF-8: the use of surrogate pairs from UTF-16 re-encoded, and certain non-transmissible characters (e.g. U+FFFE, from memory); these are normally the last 2 characters in a multilingual plane. These restrictions are only a minor burden for Java implementations using the naive methods in String / Character. These restrictions serve to stop propagation of bad data through a network of nodes.
>
> Implementations MAY decide to not support the use of ASCII NUL and C0 / C1 control codes / MAY decide to place additional restrictions on supported characters

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
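
For illustration only, the string restrictions discussed in this thread (no leading BOM, no Modified UTF-8, no re-encoded surrogates, no noncharacters such as U+FFFE, optional rejection of control codes and undefined characters) could be sketched roughly as below. This is not proposed specification text or Paho code. The names check_mqtt_string and is_acceptable and both option flags are hypothetical, the U+FDD0..U+FDEF range is added from the Unicode noncharacter list rather than from the thread, and "undefined" here means unassigned in whatever Unicode version the Python runtime bundles.

```python
import unicodedata

UTF8_BOM = b"\xef\xbb\xbf"  # three-byte encoding of U+FEFF


def check_mqtt_string(data: bytes, reject_controls: bool = False,
                      reject_unassigned: bool = False) -> str:
    """Decode `data` and apply the checks discussed in the thread.

    Hypothetical helper: raises ValueError on the first violation and
    otherwise returns the decoded text unchanged (no normalization).
    """
    if data.startswith(UTF8_BOM):
        raise ValueError("leading UTF-8 BOM")
    try:
        # Strict decoding rejects re-encoded UTF-16 surrogates (ED A0..BF xx)
        # and overlong forms such as C0 80, the Modified UTF-8 form of NUL.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not well-formed UTF-8: {exc.reason}") from exc
    for ch in text:
        cp = ord(ch)
        # Noncharacters: the last two code points of every plane, plus
        # U+FDD0..U+FDEF (an addition not mentioned in the thread).
        if (cp & 0xFFFE) == 0xFFFE or 0xFDD0 <= cp <= 0xFDEF:
            raise ValueError(f"noncharacter U+{cp:04X}")
        # Optional (MAY): C0 controls including NUL, DEL, and C1 controls.
        if reject_controls and (cp <= 0x1F or 0x7F <= cp <= 0x9F):
            raise ValueError(f"control code U+{cp:04X}")
        # Optional (SHOULD): code points unassigned in the bundled Unicode
        # version (see unicodedata.unidata_version).
        if reject_unassigned and unicodedata.category(ch) == "Cn":
            raise ValueError(f"unassigned code point U+{cp:04X}")
    return text


def is_acceptable(data: bytes, **options) -> bool:
    """Boolean wrapper around check_mqtt_string."""
    try:
        check_mqtt_string(data, **options)
    except ValueError:
        return False
    return True
```

With the defaults, ASCII NUL is accepted as the single byte 00, which matches the "can contain ASCII NUL" clarification; a broker that wants the stricter MAY behaviour would pass reject_controls=True. Tying reject_unassigned to the runtime's bundled Unicode tables is a simplification, since the proposal above pins a specific Unicode version.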