virtio-comment message

Subject: Re: [virtio-comment] Re: [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication

From: Wei Wang <wei.w.wang@intel.com>
To: Marc-André Lureau <marcandre.lureau@gmail.com>
Date: Wed, 09 Nov 2016 16:32:16 +0800

On 11/08/2016 08:17 PM, Marc-André Lureau wrote:

>

    >      Message Specification
    >      ---------------------
    >
    >      Note that all numbers are in the machine native byte order. A
    >     vhost-user message
    >     -consists of 3 header fields and a payload:
    >     +consists of 4 header fields and a payload:
    >
    >     -------------------------------------
    >     -| request | flags | size | payload |
    >     -------------------------------------
    >     +----------------------------------------------
    >     +| request | flags | conn_id | size | payload |
    >     +----------------------------------------------
    >
    >       * Request: 32-bit type of the request
    >       * Flags: 32-bit bit field:
    >         - Lower 2 bits are the version (currently 0x01)
    >     -   - Bit 2 is the reply flag - needs to be sent on each reply
    >     from the slave
    >     +   - Bit 2 is the reply flag - needs to be sent on each reply
    >         - Bit 3 is the need_reply flag - see
    >     VHOST_USER_PROTOCOL_F_REPLY_ACK for
    >           details.
    >     + * Conn_id: 64-bit connection id to identify a client socket
    >     connection. It is
    >     +            introduced in version 0x02 to support the
    >     "1-server-N-client" model
    >     +            and an asynchronous client read implementation. The
    >     connection id,
    >     +            0xFFFFFFFFFFFFFFFF, is used by an anonymous client
    >     (e.g. a client who
    >     +            has not got its connection id from the server
    in the
    >     initial talk)
    >
    >
    > I don't understand why you need a connection id, on each message.
    > What's the purpose? Since the communication is unicast, a single
    > message should be enough.

    Sure, please let me explain more:
    The QEMU socket is going to be upgraded to support 1 server socket
    being
    connected by multiple client sockets (I've made patches to achieve
    this). In other words, here, multiple masters will connect to one
    slave,
    and the slave creates a vhost-pci device for each master after
    receiving
    the necessary message info. The slave needs to know which master it is
    talking to when receiving a message, as it maintains multiple
    connections at the same time.

You should be able to identify each connection in the slave (as a socket server), without a need for a connection id: connected sockets are independent of each other.

Yes, that's doable. But why couldn't we do it from the protocol layer? I think it will be easier.

Please see my thoughts below about the implementation if we do it in the slave:

The interface for receiving a msg is - tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)

QIOChannel is what we can use to identify the master connection that sent this msg (the socket server now has an array of QIOChannel, ioc[MAX_CLIENTS]). Every time a msg is received, tcp_chr_read() needs to compare *chan against the ioc[] array to find the id (the index into ioc[]), and pass the id to qemu_chr_be_write(), and all the way down to the final slave handler where the msg is parsed and handled. This needs modifications to the existing APIs; for example, the mentioned qemu_chr_be_write() will need one more parameter, "id". This will not be compatible with the existing implementation, because all other implementations which invoke qemu_chr_be_write() will need to be patched to use the new qemu_chr_be_write(,"id",).
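
The lookup step described above could be sketched roughly as follows. This is only an illustration: Channel is a stand-in type rather than the real QIOChannel, the MAX_CLIENTS value is assumed, and find_conn_index() is a hypothetical helper, not an existing QEMU function.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 8  /* assumed value, for illustration only */

/* Stand-in for QIOChannel; the real type is not reproduced here. */
typedef struct {
    int fd;
} Channel;

/* Return the index of chan in ioc[], or -1 if it is not a known client.
 * Empty slots in ioc[] are expected to hold NULL. */
static int find_conn_index(Channel *const ioc[], size_t n, const Channel *chan)
{
    assert(ioc != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ioc[i] != NULL && ioc[i] == chan) {
            return (int)i;
        }
    }
    return -1;
}
```

The lookup itself is cheap; the cost I am concerned about is threading the resulting index through every caller down to the slave handler.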

    >       * Size - 32-bit size of the payload
    >
    >
    >     @@ -97,6 +106,13 @@ Depending on the request type, payload
    can be:
    >         log offset: offset from start of supplied file descriptor
    >             where logging starts (i.e. where guest address 0
    would be
    >     logged)
    >
    >     +* Device info
    >     +   --------------------
    >     +   | virtio id | uuid |
    >     +   --------------------
    >     +   Virtio id: 16-bit virtio id of the device
    >     +   UUID: 128-bit UUID to identify the QEMU instance that
    creates
    >     the device
    >     +
    >
    >
    > I wonder if UUID should be a different message.
    >
    We can make uuid another message if it has other usages.
    Do you see any other usages of uuid?

It allows associating data/configuration with a particular VM, in a multi-master/single-slave scenario. But tbh, I don't see how this is necessary; I can imagine solving this differently (having a different connection address per VM, for example).

Using connection addresses, how could you know if the two connections are from the same VM?

I would like to understand your use case.



Here is an example of the use case:

VM1 has two master connections (connection X and Y) and VM2 has 1 master connection (Z). X, Y, Z - each has a connection id. But X and Y send the same uuid, uuid1, to the slave, and Z sends uuid2 to the slave. In this way, the slave knows X and Y are two connections from the same VM, and Z is a connection from a different VM.

For connection Y, the vhost-pci device will be created in a way which does not need the driver to map the memory, since it has already been mapped by device X from the same VM.
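
A minimal sketch of how the slave might use the uuid for this, assuming it keeps a small table of known master connections. The MasterConn type, its mem_mapped field, and both helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16  /* 128-bit UUID */

/* Hypothetical per-connection record kept by the slave. */
typedef struct {
    uint64_t conn_id;         /* connection id of this master */
    uint8_t  uuid[UUID_LEN];  /* UUID of the QEMU instance behind it */
    bool     mem_mapped;      /* guest memory already mapped for this one */
} MasterConn;

/* Two connections belong to the same VM if they sent the same uuid. */
static bool same_vm(const MasterConn *a, const MasterConn *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->uuid, b->uuid, UUID_LEN) == 0;
}

/* True if some known connection from the same VM has already mapped memory,
 * so the new device can be created without mapping it again. */
static bool memory_already_mapped(const MasterConn *conns, size_t n,
                                  const MasterConn *incoming)
{
    assert(conns != NULL && incoming != NULL);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].mem_mapped && same_vm(&conns[i], incoming)) {
            return true;
        }
    }
    return false;
}
```

In the example above, the check would return true for Y (X has the same uuid and is already mapped) and false for a first connection from a third VM.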



    >      [ Also see the section on REPLY_ACK protocol extension. ]
    >
    >     +Currently, the communication also supports the Slave (server)
    >     sending messages
    >     +to the Master (client). Here is a list of them:
    >     + * VHOST_USER_SET_FEATURES
    >
    >     + * VHOST_USER_SET_PEER_CONNECTION (the server may actively
    request
    >     to disconnect
    >     +   with the client)
    >
    >
    > Oh, you are making the communication bidirectional? This is a
    > fundamental change in the protocol. This may be difficult to
    implement
    > in qemu, since the communication is synchronous, a request
    expects an
    > immediate reply, if it gets back a request (from the slave) in the
    > middle, it will fail.
    >

    Not really.
    Adding the above two doesn't affect the existing synchronous read()
    messages (basically, those VHOST_USER_GET_xx messages). Like
    VHOST_USER_SET_FEATURES, the _SET_ messages don't need a reply.
    Here, we
    just make the slave capable of actively sending messages to the
    master.

Yes, that's the trouble. At any time the Master may send a request and expect an immediate reply. With your proposed change, there is a race: a request from the Slave may arrive in the middle. I'd rather avoid making the requests bidirectional if possible. (I proposed a second channel for Slave->Master requests in the past: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html)

If the message that the slave got has a different "request" field value, it simply drops it and reads again. The implementation is not complex either; please see the example change to vhost_user_get_u64() below:


    if (vhost_user_write(dev, &msg_w, NULL, 0) < 0) {
        return -1;
    }
retry:
    if (vhost_user_read(dev, &msg_r) < 0) {
        return -1;
    }
    /* Not the reply to our request: drop it and read again. */
    if (msg_r.request != msg_w.request) {
        goto retry;
    }

On the other side, the slave's request to the master is dropped due to the race. This race can be solved in the protocol layer: let the _SET_ request ask for an ACK, and if no ACK is received, re-send it. Also, this kind of race should be very rare in real usage.
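
To show the drop-and-re-read behaviour in isolation, here is a small self-contained sketch that uses a mock channel instead of the real vhost_user_read(). All types and names in it (Msg, MockChannel, mock_read, read_reply_for) are hypothetical; the request numbers in the usage note are only example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified message: just the request type and a u64 payload. */
typedef struct {
    uint32_t request;
    uint64_t u64;
} Msg;

/* Mock channel that replays a fixed sequence of incoming messages. */
typedef struct {
    const Msg *msgs;
    size_t count;
    size_t next;
} MockChannel;

/* Return 0 and fill *out with the next message, or -1 when none is left. */
static int mock_read(MockChannel *ch, Msg *out)
{
    assert(ch != NULL && out != NULL);
    if (ch->next >= ch->count) {
        return -1;
    }
    *out = ch->msgs[ch->next++];
    return 0;
}

/* Read until a message with the expected request type arrives.
 * Messages of any other type are dropped and counted in *dropped. */
static int read_reply_for(MockChannel *ch, uint32_t request,
                          Msg *reply, size_t *dropped)
{
    assert(dropped != NULL);
    *dropped = 0;
    for (;;) {
        if (mock_read(ch, reply) < 0) {
            return -1;          /* read error or channel exhausted */
        }
        if (reply->request == request) {
            return 0;           /* this is the reply we were waiting for */
        }
        (*dropped)++;           /* unrelated request from the peer: drop it */
    }
}
```

If the channel delivers a message of type 22 followed by the awaited reply of type 1, the reader returns the second message and reports one dropped message. The dropped message is exactly what the ACK-and-resend scheme would have to recover.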


    >
    > +This request should be sent only when
    VHOST_USER_PROTOCOL_F_VHOST_PCI
    > has...
    >
    >     +* VHOST_USER_SET_DEV_INFO
    >     +
    >     +      Id: 21
    >     +      Equivalent ioctl: N/A
    >     +      Master payload: dev info
    >     +
    >     +      The client sends the producer device info to the server.
    >
    >
    > "Master sends producer device info to the Slave" works, no?

    Yes, it works. The current dev info only contains a "virtio id" field
    (assume we'll take uuid out as a separate message), which tells the
    slave if it is a net, scsi, console or something else. Do you see any issue?
    >
    > Could we guarantee this message is sent before SET_VRING*?

    Why do we need to guarantee this?

It would simplify the protocol to have expectations on when messages come. In particular, an early message with devinfo would allow the Slave to be checked/pre-configured for a particular device. Also, VHOST_USER_SET_DEV_INFO should probably be unique (don't allow a device to be reconfigured).

Yes, it is sent at an early stage of the vhost-user protocol interaction. It's implemented to be sent right after sending the VHOST_USER_SET_PROTOCOL_FEATURES msg. On the slave side, when it receives SET_DEV_INFO, it pre-configures the device in a table entry (as mentioned before, a device will be created from the table entry at a later stage of the protocol interaction).

I think it should be the implementation logic, like VHOST_USER_SET_PROTOCOL_FEATURES. Why do we need to add a guarantee in the protocol to specify the order?



    >
    >     + This request should be sent only when
    >     VHOST_USER_PROTOCOL_F_VHOST_PCI has
    >     +      been negotiated.
    >     +
    >
    >
    > I think this message could be useful for other purposes than
    > vhost-pci, thus I would give it its own flag.

    Could you please give an example of other usage? Thanks.

You could have a Slave that implements various devices, and pick the corresponding one dynamically (we already have implementations for net/input/gpu/scsi...)

If I understand the example correctly, the various devices still belong to the vhost-pci series - in the future we would have vhost-pci-net, vhost-pci-scsi, vhost-pci-gpu etc. If that's the case, we may still use the VHOST_PCI flag.


    >
    >     +* VHOST_USER_SET_PEER_CONNECTION
    >     +
    >     +      Id: 22
    >     +      Equivalent ioctl: N/A
    >     +      Master payload: u64
    >     +
    >     +      The producer device requests to connect or disconnect to
    >     the consumer device.
    >
    >
    > producer->Master, consumer->Slave
    >
    > How does it interact with SET_VRING_ENABLE?

    It's independent of SET_VRING_ENABLE:
    SET_VRING_ENABLE puts a virtq into the "active" state.
    SET_PEER_CONNECTION puts the peer (slave or master) device into the
    "active" state. The driver shouldn't send packets if the device is inactive.

I fail to see the difference with SET_VRING_ENABLE; perhaps someone more familiar with the protocol could help here.

I'm not sure if another email explaining this was sent out successfully, so I'm reposting the explanation here:

The SET_PEER_CONNECTION msg is used to turn "ON/OFF" the (slave or master) device connection status. For example, when the master side VM wants to turn down, the virtio-net driver sets the virtio-net device's PEER_CONNECTION status to "OFF" - before this happens, the virtio-net device needs to sync up with the vhost-pci-net device first, that is, sending a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the master. In return (not as a synchronous reply, because it has to sync with the driver to stop using the slave side resource first), the vhost-pci-net device sends a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the slave - this sets the virtio-net device's PEER_CONNECTION status to "OFF" and then the virtio driver is ready to unload. (same for the vhost-pci-net driver to unload)

SET_VRING_ENABLE controls the virtq status - the slave should not use the virtq if it's not enabled by the master. For example, a device may have 4 virtqs; if vq[0].enabled==0, then the slave should not use virtq 0.
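
The difference between the two levels can be sketched as a device-wide switch plus a per-virtq switch. This is an assumption-laden illustration: DevState and slave_may_use_vq() are hypothetical names, and treating the peer connection as a precondition for using any virtq is my reading of the description above, not wording from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VQS 4  /* example device with 4 virtqs */

/* Hypothetical state the slave tracks for one master device. */
typedef struct {
    bool peer_connected;       /* set by SET_PEER_CONNECTION (ON/OFF) */
    bool vq_enabled[NUM_VQS];  /* set per virtq by SET_VRING_ENABLE */
} DevState;

/* The slave may use a virtq only if the device-level connection is ON
 * and that particular virtq has been enabled by the master. */
static bool slave_may_use_vq(const DevState *d, unsigned idx)
{
    assert(d != NULL);
    return idx < NUM_VQS && d->peer_connected && d->vq_enabled[idx];
}
```

So turning the peer connection OFF stops use of every virtq at once, while SET_VRING_ENABLE only gates an individual virtq.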

    >
    >     + The consumer device may request to disconnect to the producer
    >     device. This
    >     +      request should be sent only when
    >     VHOST_USER_PROTOCOL_F_VHOST_PCI has been
    >     +      negotiated.
    >     +      Connection request: If the reply message indicates
    >     "success", the vhost-pci based
    >     +      inter-VM communication channel has been established.
    >     +      Disconnection request: If the reply message indicates
    >     "success", the vhost-pci based
    >     +      inter-VM communication channel has been destroyed.
    >     +      #define VHOST_USER_SET_PEER_CONNECTION_F_OFF 0
    >     +      #define VHOST_USER_SET_PEER_CONNECTION_F_ON 1
    >     +
    >
    I think it would be better to add one more command here:
    #define VHOST_USER_SET_PEER_CONNECTION_F_INIT 2

    The master uses this command to tell the slave it's ready to
    create the
    vhost-pci device. Regarding the implementation, it is put at the
    bottom
    of vhost_net_start() function (when all the vring info have been sent
    and enabled).

Do you have a WIP branch for qemu vhost-pci? That could help in understanding the context.



Yes, I can share them.


Best,
Wei
