
Subject: [OASIS Issue Tracker] (MQTT-260) Add a CONNACK code of 'Try Another Server'
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: mqtt@lists.oasis-open.org
Date: Wed, 6 Jul 2016 09:22:49 +0000 (UTC)
    [ https://issues.oasis-open.org/browse/MQTT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=62858#comment-62858 ] 

Raphael Cohn commented on MQTT-260:
-----------------------------------

Yes, you're right. The client has no idea where the server-side state is. Nor should it. That is the responsibility of the server side, and it's one of the things that makes MQTT so simple to use. We push the complexity, such as is needed, into the server side. And the beauty of this proposal is that the server side can choose any approach it likes. We're also mixing up the concepts of server and broker here; logically, a broker is a set of one or more servers, so only 'try another server' should be used.

I'm not sure how you can argue it is more complicated for a client to have a list of server addresses to try. I've implemented those sorts of clients many times. DNS SRV is simply one mechanism that provides such a list; there are many alternatives, such as a line-based conf file. Personally, I find a URI parser a darn sight more work, more difficult to get right, and, for a small embedded box, more of a memory hog than a simple list of servers... If one really wants to use a URI instead, then it can be provided out-of-band, too, which I think Ed suggests. Provide it using a retained message. Connect to a REST endpoint and retrieve it. (The latter might work better for the Websocketeers). Stick it in a DNS TXT record. Put it in a DHCP option and PXE boot. Any of these could be the best choice, depending on the chosen environment.
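
To illustrate how little a client needs for the "simple list of servers" case, here is a minimal sketch of a line-based conf file parser. The file format (one `host` or `host:port` per line, `#` for comments) and the function name `parse_server_list` are assumptions made for this example, not anything the proposal defines; IPv6 literals are deliberately not handled.

```python
def parse_server_list(text, default_port=1883):
    """Parse a hypothetical line-based server list.

    Each non-empty line is 'host' or 'host:port'; anything after '#'
    is a comment. 1883 is the conventional unencrypted MQTT port.
    IPv6 literals are not supported by this sketch.
    """
    servers = []
    for raw in text.splitlines():
        # Drop the comment part, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, port = line.rpartition(":")
        if sep:
            servers.append((host, int(port)))
        else:
            # No colon on the line: the whole line is the host name.
            servers.append((line, default_port))
    return servers


example = """
# primary site
a.example.org
b.example.org:8883   # TLS listener
"""
print(parse_server_list(example))
```

A client would walk this list in order (or shuffled) until a connection succeeds; no URI parsing is involved.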

Perhaps it would help if we split this return code into two:-
- Busy => Try Another Server For Now (use cached round-robin A records, SRV records, LOC records, configuration file, last known list of servers from a retained message, talk to etcd, ask systemd-meglomania-bruhahaa-confd over dbus and so on); if you only know about one, sleep for a variable amount of time and re-try. Can also mean under maintenance.
- Never Call Me Again => You should re-read your original configuration (re-start, re-init, try to find a firmware update, query DNS again even if cached data is not expired, etc); if you can't, or won't, then this is a hard fail. Stop.
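
The client-side handling of the two codes above could be sketched roughly as follows. The enum `Refusal`, the function `next_action`, the action names, and the 60-second cap on the sleep are all hypothetical choices for illustration; the proposal assigns no numeric values or names to these codes.

```python
import random
from enum import Enum


class Refusal(Enum):
    # Hypothetical names for the two proposed return codes.
    BUSY = "try another server for now"
    NEVER_CALL_ME_AGAIN = "never call me again"


def next_action(refusal, servers, current, attempt, can_reload=True, rng=random):
    """Decide what a client does after a refused connection.

    Returns an (action, argument) tuple; the action names are
    illustrative only.
    """
    if refusal is Refusal.NEVER_CALL_ME_AGAIN:
        # Re-read the original configuration, or treat it as a hard fail.
        return ("reload_configuration", None) if can_reload else ("stop", None)

    # BUSY: move on to some other server from the known list.
    others = [s for s in servers if s != current]
    if others:
        return ("connect", others[attempt % len(others)])

    # Only one server known: sleep a variable amount of time and re-try.
    delay = rng.uniform(0.0, min(60.0, 2.0 ** attempt))
    return ("sleep_then_retry", delay)


print(next_action(Refusal.BUSY, ["a", "b"], "a", attempt=0))
print(next_action(Refusal.NEVER_CALL_ME_AGAIN, ["a"], "a", attempt=0, can_reload=False))
```

Where the server list comes from (cached DNS records, a configuration file, a retained message) stays entirely outside this function, which is the point of keeping the return codes free of addresses.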

(These are somewhat similar to HTTP's temporary and permanent redirects, although I'd be cautious about making the analogy too strongly).

If we do this, then we can support my 'sell 50% of the fleet' scenario (which, by the way, really happened to me a few years ago with an Electric Truck supplier).

Making the URI optional helps no-one; if it is given, and a client doesn't support it, then what? Optional parts of the protocol aren't a good choice.

Lastly, sending back large lumps of text in CONNACK, especially when the CONNACK in this case is likely due to load, strikes me as a great way to make scaling problems worse.

So, in summary, this proposal does not solve all the potential issues for certain choices of clustered broker implementation. However, it makes it straightforward to implement a very large number of different client and server scenarios.

> Add a CONNACK code of 'Try Another Server'
> ------------------------------------------
>
>                 Key: MQTT-260
>                 URL: https://issues.oasis-open.org/browse/MQTT-260
>             Project: OASIS Message Queuing Telemetry Transport (MQTT) TC
>          Issue Type: Improvement
>          Components: futures
>    Affects Versions: 5
>            Reporter: Raphael Cohn
>            Assignee: Raphael Cohn
>            Priority: Critical
>
> If we add a CONNACK return code of 'Try Another Server', this makes it easier for over-loaded servers to tell clients to redirect. This works in conjunction with MQTT-259, which advocates the use of DNS SRV records.
> Indeed, if we also added server-originated DISCONNECT packets with this return code, we could get clients to cleanly migrate to another server when a server is shut down for maintenance.
> Please note, I do not favour the server also reporting which new server to connect to. Therein lies the route to madness, as it means the current server has to know the state of all the others. That's intimate knowledge.


