
Subject: Re: [virtio-comment] Re: [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication

On 11/08/2016 08:17 PM, Marc-André Lureau wrote:

    >      Message Specification
    >      ---------------------
    >      Note that all numbers are in the machine native byte order. A vhost-user message
    >     -consists of 3 header fields and a payload:
    >     +consists of 4 header fields and a payload:
    >     -------------------------------------
    >     -| request | flags | size | payload |
    >     -------------------------------------
    >     +----------------------------------------------
    >     +| request | flags | conn_id | size | payload |
    >     +----------------------------------------------
    >       * Request: 32-bit type of the request
    >       * Flags: 32-bit bit field:
    >         - Lower 2 bits are the version (currently 0x01)
    >     -   - Bit 2 is the reply flag - needs to be sent on each reply from the slave
    >     +   - Bit 2 is the reply flag - needs to be sent on each reply
    >         - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for details.
    >     + * Conn_id: 64-bit connection id to identify a client socket connection. It is
    >     +            introduced in version 0x02 to support the "1-server-N-client" model
    >     +            and an asynchronous client read implementation. The connection id
    >     +            0xFFFFFFFFFFFFFFFF is used by an anonymous client (e.g. a client who
    >     +            has not got its connection id from the server in the initial talk)
    > I don't understand why you need a connection id on each message.
    > What's the purpose? Since the communication is unicast, a single
    > message should be enough.

    Sure, please let me explain more:
    The QEMU socket is going to be upgraded to support 1 server socket
    connected by multiple client sockets (I've made patches to achieve
    this). In other words, here, multiple masters will connect to one slave,
    and the slave creates a vhost-pci device for each master after receiving
    the necessary message info. The slave needs to know which master it is
    talking to when receiving a message, as it maintains multiple
    connections at the same time.

You should be able to identify each connection in the slave (as a socket server) without a need for a connection id: connected sockets are independent from each other.

Yes, that's doable. But why couldn't we do it at the protocol layer? I think it would be easier.

Please check below my thoughts about the implementation if we do it in the slave:

The interface for receiving a msg is tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque).

QIOChannel is what we can use to identify the master connection that sent this msg (the socket server now has an array of QIOChannel, ioc[MAX_CLIENTS]). Every time a msg is received, tcp_chr_read() needs to compare *chan against the ioc[] array to find the id (the index into ioc[]), pass the id to qemu_chr_be_write(), and propagate it all the way down to the final slave handler where the msg is parsed and handled. This needs modifications to the existing APIs; for example, the mentioned qemu_chr_be_write() will need one more parameter, "id". This will not be compatible with the existing implementation, because every caller of qemu_chr_be_write() will need to be patched to use the new qemu_chr_be_write(,"id",).
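
Just to illustrate the lookup (TCPChardev and conn_id_for_channel below are made up for this sketch, not existing QEMU code):

    #define MAX_CLIENTS 8

    typedef struct QIOChannel QIOChannel;   /* opaque stand-in for QEMU's type */

    /* Hypothetical server state: one QIOChannel per connected master. */
    typedef struct {
        QIOChannel *ioc[MAX_CLIENTS];
    } TCPChardev;

    /* Return the connection id (the index into ioc[]) for the channel that
     * delivered the msg, or -1 if the channel is unknown. tcp_chr_read()
     * would have to do this lookup for every received msg. */
    static int conn_id_for_channel(TCPChardev *s, QIOChannel *chan)
    {
        for (int i = 0; i < MAX_CLIENTS; i++) {
            if (s->ioc[i] == chan) {
                return i;
            }
        }
        return -1;
    }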

    >       * Size - 32-bit size of the payload
    >     @@ -97,6 +106,13 @@ Depending on the request type, payload can be:
    >         log offset: offset from start of supplied file descriptor
    >             where logging starts (i.e. where guest address 0 would be logged)
    >     +* Device info
    >     +   --------------------
    >     +   | virtio id | uuid |
    >     +   --------------------
    >     +   Virtio id: 16-bit virtio id of the device
    >     +   UUID: 128-bit UUID to identify the QEMU instance that the device belongs to
    >     +
    > I wonder if UUID should be a different message.
    > I wonder if UUID should be a different message.
    We can make uuid another message if it has other usages.
    Do you see any other usages of uuid?

It allows associating data/configuration with a particular VM in a multi-master/single-slave scenario. But tbh, I don't see how this is necessary; I can imagine solving this differently (having a different connection address per VM, for ex).

Using connection addresses, how could you know if the two connections are from the same VM?

I would like to understand your use case.

Here is an example of the use case:
VM1 has two master connections (X and Y) and VM2 has one master connection (Z). X, Y and Z each have a connection id, but X and Y send the same uuid, uuid1, to the slave, while Z sends uuid2. In this way, the slave knows that X and Y are two connections from the same VM, and that Z is a connection from a different VM.

For connection Y, the vhost-pci device will be created in a way that does not need the driver to map the memory, since it has already been mapped by the device created for connection X from the same VM.
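
A rough sketch of the slave-side bookkeeping this implies (the struct and helper below are illustrative, not part of the patch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical per-connection record on the slave. Connections from the
     * same VM carry the same 128-bit uuid, so in the example above X and Y
     * compare equal while Z does not. */
    typedef struct {
        uint64_t conn_id;
        uint8_t  uuid[16];
        bool     memory_mapped;   /* set once this VM's memory is mapped */
    } Connection;

    /* True if a and b come from the same VM (the same QEMU instance). The
     * slave can skip mapping memory for b when a has already mapped it. */
    static bool same_vm(const Connection *a, const Connection *b)
    {
        return memcmp(a->uuid, b->uuid, sizeof a->uuid) == 0;
    }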

    >      [ Also see the section on REPLY_ACK protocol extension. ]
    >     +Currently, the communication also supports the Slave (server) sending messages
    >     +to the Master (client). Here is a list of them:
    >     + * VHOST_USER_SET_PEER_CONNECTION (the server may actively request to
    >     +   disconnect from the client)
    > Oh, you are making the communication bidirectional? This is a
    > fundamental change in the protocol. This may be difficult to handle
    > in qemu: since the communication is synchronous, a request expects an
    > immediate reply, and if it gets back a request (from the slave) in the
    > middle, it will fail.

    Not really.
    Adding the above two doesn't affect the existing synchronous read()
    messages (basically, those VHOST_USER_GET_xx messages). Like
    VHOST_USER_SET_FEATURES, the _SET_ messages don't need a reply.
    Here, we just make the slave capable of actively sending messages to
    the master.

Yes, that's the trouble. At any time the Master may send a request and expect an immediate reply. With your proposed change there is a race of getting a request from the Slave in the middle. I'd rather avoid making the requests bidirectional if possible. (I proposed a second channel for Slave->Master requests in the past: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html)

If the message that the master got has a different "request" field value (i.e. a slave-initiated request rather than the expected reply), it simply drops it and reads again. The implementation is not complex either; please see the example change to vhost_user_get_u64() below:

    if (vhost_user_write(dev, &msg_w, NULL, 0) < 0) {
        return -1;
    }
retry:
    if (vhost_user_read(dev, &msg_r) < 0) {
        return -1;
    }
    if (msg_r.request != msg_w.request) {
        /* not the reply we are waiting for: drop it and read again */
        goto retry;
    }

On the other side, the slave's request to the master is dropped due to the race. This race can be solved at the protocol layer - let the _SET_ request ask for an ACK, and if no ACK is received, re-send it. Also, this kind of race should be very rare in real usage.
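
A sketch of that slave-side retry (all names below are illustrative, none are existing QEMU APIs):

    #include <stdbool.h>

    typedef struct VhostUserMsg VhostUserMsg;

    bool slave_write(const VhostUserMsg *msg);             /* send one msg */
    bool slave_wait_ack(const VhostUserMsg *msg, int ms);  /* wait for an ACK */

    enum { MAX_RETRIES = 3, ACK_TIMEOUT_MS = 100 };

    /* A slave-initiated _SET_ request asks for an ACK and is re-sent until
     * the master acknowledges it, so a request dropped by the race above
     * simply gets retried. */
    static bool slave_send_set_request(const VhostUserMsg *msg)
    {
        for (int tries = 0; tries < MAX_RETRIES; tries++) {
            if (!slave_write(msg)) {
                return false;
            }
            if (slave_wait_ack(msg, ACK_TIMEOUT_MS)) {
                return true;   /* master acknowledged the request */
            }
            /* no ACK: likely lost to a concurrent master request; re-send */
        }
        return false;
    }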

    > +This request should be sent only when
    > has...
    >     +
    >     +      Id: 21
    >     +      Equivalent ioctl: N/A
    >     +      Master payload: dev info
    >     +
    >     +      The client sends the producer device info to the server.
    > "Master sends producer device info to the Slave" works, no?

    Yes, it works. The current dev info only contains a "virtio id" field
    (assume we'll take uuid out as a separate message), which tells the
    slave whether it is a net, scsi, console or other device. Do you see any issue?
    > Could we guarantee this message is sent before SET_VRING*?

    Why do we need to guarantee this?

It would simplify the protocol to have expectations on when messages come. In particular, an early message with devinfo would allow checking/pre-configuring the Slave for a particular device. Also, VHOST_USER_SET_DEV_INFO should probably be unique (don't allow a device to be reconfigured).

Yes, it is sent at an early stage of the vhost-user protocol interaction. It's implemented to be sent right after the VHOST_USER_SET_PROTOCOL_FEATURES msg. On the slave side, when it receives SET_DEV_INFO, it pre-configures the device in a table entry (as mentioned before, a device will be created from the table entry at a later stage of the protocol interaction).
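
For concreteness, the table entry could look something like this (a sketch only; the field names are assumptions):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical layout of the slave-side table entry mentioned above:
     * filled in when SET_DEV_INFO arrives, consumed when the vhost-pci
     * device is created at a later stage of the negotiation. */
    typedef struct {
        uint16_t virtio_id;   /* net, scsi, console, ... */
        uint8_t  uuid[16];    /* VM identity, if uuid stays in dev info */
        bool     configured;  /* lets the slave reject a second SET_DEV_INFO */
    } VhostPCIDevTableEntry;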

I think this should be implementation logic, like VHOST_USER_SET_PROTOCOL_FEATURES. Why do we need to add a guarantee in the protocol to specify the order?

    >     + This request should be sent only when
    >     +      been negotiated.
    >     +
    > I think this message could be useful for other purposes than
    > vhost-pci, thus I would give it its own flag.

    Could you please give an example of other usage? Thanks.

You could have a Slave that implements various devices and picks the corresponding one dynamically (we already have implementations for net/input/gpu/scsi...)

If I understand the example correctly, the various devices still belong to the vhost-pci series - in the future we would have vhost-pci-net, vhost-pci-scsi, vhost-pci-gpu etc. If that's the case, we may still use the VHOST_PCI flag.

    >     +
    >     +      Id: 22
    >     +      Equivalent ioctl: N/A
    >     +      Master payload: u64
    >     +
    >     +      The producer device requests to connect to or disconnect from the consumer device.
    > producer->Master, consumer->Slave
    > How does it interact with SET_VRING_ENABLE?

    It's independent of SET_VRING_ENABLE:
    SET_VRING_ENABLE puts a virtq into the "active" state.
    SET_PEER_CONNECTION puts the peer (slave or master) device into the
    "active" state. The driver shouldn't send packets if the device is inactive.

I fail to see the difference with SET_VRING_ENABLE, perhaps someone more familiar with the protocol could help here.

I'm not sure if another email explaining this was sent out successfully, so I repost the explanation here:

The SET_PEER_CONNECTION msg is used to turn "ON/OFF" the (slave or master) device connection status. For example, when the master side VM wants to shut down, the virtio-net driver sets the virtio-net device's PEER_CONNECTION status to "OFF". Before this happens, the virtio-net device needs to sync up with the vhost-pci-net device first, that is, it sends a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the slave. In return (not as a synchronous reply, because the vhost-pci-net device has to sync with its driver to stop using the shared resources first), the vhost-pci-net device sends a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the master - this sets the virtio-net device's PEER_CONNECTION status to "OFF", and then the virtio driver is ready to unload. (The same applies when the vhost-pci-net driver unloads.)
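
Put as a sequence (the message id and flag values are from the patch; the step numbering is just for illustration):

    /* From the patch: */
    #define VHOST_USER_SET_PEER_CONNECTION        22
    #define VHOST_USER_SET_PEER_CONNECTION_F_OFF  0
    #define VHOST_USER_SET_PEER_CONNECTION_F_ON   1

    /*
     * Shutdown of the master side VM, as described above:
     *
     *  1. The virtio-net device (master) sends
     *     VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) to the slave.
     *  2. The vhost-pci-net device syncs with its driver to stop using
     *     the shared resources (hence no synchronous reply).
     *  3. The vhost-pci-net device (slave) sends
     *     VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) to the master.
     *  4. The virtio-net device's PEER_CONNECTION status is now "OFF",
     *     and the virtio driver is ready to unload.
     */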

SET_VRING_ENABLE controls the virtq status - the slave should not use a virtq if it's not enabled by the master. For example, a device may have 4 virtqs; if vq[0].enabled == 0, then the slave should not use virtq 0.

    >     +      The consumer device may request to disconnect from the producer device.
    >     +      This request should be sent only when
    >     +      negotiated.
    >     +      Connection request: If the reply message indicates "success",
    >     +      the vhost-pci based inter-VM communication channel has been established.
    >     +      Disconnection request: If the reply message indicates "success",
    >     +      the vhost-pci based inter-VM communication channel has been destroyed.
    >     +      #define VHOST_USER_SET_PEER_CONNECTION_F_OFF 0
    >     +      #define VHOST_USER_SET_PEER_CONNECTION_F_ON 1
    >     +
    I think it would be better to add one more command here:

    The master uses this command to tell the slave it's ready to create the
    vhost-pci device. Regarding the implementation, it is put at the end of
    the vhost_net_start() function (when all the vring info has been sent
    and enabled).

Do you have a WIP branch for qemu vhost-pci? That could help to understand the context.

Yes, I can share them.

