virtio-dev message



Subject: Re: [PATCH V2 0/2] Virtqueue State Synchronization



On 2021/7/13 6:30 PM, Stefan Hajnoczi wrote:
On Tue, Jul 13, 2021 at 11:08:28AM +0800, Jason Wang wrote:
On 2021/7/12 6:12 PM, Stefan Hajnoczi wrote:
On Tue, Jul 06, 2021 at 12:33:32PM +0800, Jason Wang wrote:
Hi All:

This is an updated version to implement virtqueue state
synchronization, which is a must for migration support.

The first patch introduces virtqueue states as a new basic facility of
the virtio device. This is used by the driver to save and restore
virtqueue state. The state is split into available state and used
state to ease the transport-specific implementation. The device is
also allowed to have its own device-specific way to save and restore
extra virtqueue state, such as in-flight requests.
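The available/used split could be sketched as a pair of structs. This is only an illustration: the struct and field names here are assumptions, not taken from the spec patch, which defines the split at the conceptual level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the two halves of the virtqueue state. */
struct vq_avail_state {
    uint16_t last_avail_idx; /* next descriptor index the device would fetch */
};

struct vq_used_state {
    uint16_t used_idx;       /* next used-ring slot the device would fill */
};

/* A transport can expose the two halves separately, which is what
 * "ease the transport-specific implementation" refers to. */
struct vq_state {
    struct vq_avail_state avail;
    struct vq_used_state  used;
};
```

Device-specific extra state (e.g. in-flight requests) would live outside this structure, as the paragraph above notes.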

The second patch introduces a new status bit, STOP. This bit is used
by the driver to stop the device. The major difference from reset is
that STOP must preserve all the virtqueue state plus the device state.

A driver can then:

- Get the virtqueue state if the STOP status bit is set
- Set the virtqueue state after FEATURE_OK but before DRIVER_OK
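The two-step driver flow above could be sketched as follows. The STOP bit value and the helper names are assumptions for illustration; the other status bit values are the standard virtio ones.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK   4   /* standard virtio status bit */
#define VIRTIO_STATUS_FEATURES_OK 8   /* standard virtio status bit */
#define VIRTIO_STATUS_STOP        32  /* placeholder value, not finalized */

struct mock_dev {
    uint8_t  status;
    uint16_t last_avail_idx;
};

/* Source side: set STOP, after which the virtqueue state is stable to read. */
static uint16_t save_vq_state(struct mock_dev *d)
{
    d->status |= VIRTIO_STATUS_STOP;
    return d->last_avail_idx;
}

/* Destination side: state may only be loaded after FEATURES_OK and
 * before DRIVER_OK, per the flow quoted above. */
static int load_vq_state(struct mock_dev *d, uint16_t idx)
{
    if (!(d->status & VIRTIO_STATUS_FEATURES_OK) ||
        (d->status & VIRTIO_STATUS_DRIVER_OK))
        return -1;
    d->last_avail_idx = idx;
    return 0;
}
```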

Device specific state synchronization could be built on top.
Will you send a proof-of-concept implementation to demonstrate how it
works in practice?

Eugenio has implemented a prototype for this. (Note that the code was for a
previous version of the proposal, but it's sufficient to demonstrate how it
works.)

https://www.mail-archive.com/qemu-devel@nongnu.org/msg809332.html

https://www.mail-archive.com/qemu-devel@nongnu.org/msg809335.html


You mentioned being able to migrate virtio-net devices using this
interface, but what about state like VIRTIO_NET_S_LINK_UP that is either
per-device or associated with a non-rx/tx virtqueue?

Note that the config space will be maintained by Qemu. So Qemu can choose to
emulate link down simply by not setting DRIVER_OK on the device.


Basically I'm not sure if the scope of this is just to migrate state
associated with offloaded virtqueues (vDPA, VFIO/mdev, etc) or if it's
really supposed to migrate the entire device?

As the subject says, it's the virtqueue state, not the device state. The
series tries to introduce the minimal set of functions that could be used to
migrate the network device.



Do you have an approach in mind for saving/loading device-specific
state? Here are devices and their state:
- virtio-blk: a list of requests that the destination device can
    re-submit
- virtio-scsi: a list of requests that the destination device can
    re-submit
- virtio-serial: active ports, including the current buffer being
    transferred

Actually, we have two types of additional state:

- pending (or inflight) buffers; we can introduce a transport-specific way
to specify the auxiliary page which is used to store the inflight
descriptors (as vhost-user did)
- other device state; this needs to be handled in a device-specific way, and
it would be hard to generalize
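The inflight-buffer idea could look roughly like the following, loosely modeled on vhost-user's inflight region. The layout and names here are assumptions for illustration, not the actual vhost-user format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VQ_SIZE 256

/* Hypothetical auxiliary-page layout: one flag per descriptor, toggled on
 * submit and complete, so the destination can find unfinished requests. */
struct inflight_region {
    uint16_t desc_inflight[VQ_SIZE]; /* 1 = submitted but not yet used */
};

static void mark_submitted(struct inflight_region *r, uint16_t head)
{
    r->desc_inflight[head] = 1;
}

static void mark_completed(struct inflight_region *r, uint16_t head)
{
    r->desc_inflight[head] = 0;
}

/* After migration, the destination re-submits every descriptor still set. */
static int count_to_resubmit(const struct inflight_region *r)
{
    int n = 0;
    for (int i = 0; i < VQ_SIZE; i++)
        n += r->desc_inflight[i];
    return n;
}
```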


- virtio-net: MAC address, status, etc

So the VMM will intercept all the control commands, which means we don't
need to query any state that is changed via those control commands.

E.g. Qemu is in charge of shadowing the control virtqueue, so we don't even
need an interface to query any of the state that is set via the control
virtqueue.
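Shadowing the control virtqueue could be sketched like this: the VMM records each command before forwarding it to the real device, so state set this way never has to be queried back. The bookkeeping structure is hypothetical; the two control constants are the standard virtio-net values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_CTRL_MAC          1 /* standard virtio-net control class */
#define VIRTIO_NET_CTRL_MAC_ADDR_SET 1 /* standard command in that class */

/* Hypothetical VMM-side record of state set through the control virtqueue. */
struct shadow_net_state {
    uint8_t mac[6];
    int     mac_valid;
};

/* Called on every CVQ command before it is forwarded to the real device. */
static void shadow_record(struct shadow_net_state *s,
                          uint8_t cls, uint8_t cmd,
                          const void *data, size_t len)
{
    if (cls == VIRTIO_NET_CTRL_MAC &&
        cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && len == 6) {
        memcpy(s->mac, data, 6);
        s->mac_valid = 1;
    }
}
```

On migration, the VMM replays the recorded state to the destination device instead of reading it out of the source.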

But all that device state stuff is out of the scope of this proposal.

One possible gap I can see is that people may think the migration facility
is designed for the simple passthrough that Linux provides, which means the
device is assigned 'entirely' to the guest. This is not the case for live
migration; some kind of mediation must be done in the middle.

And that's the work of the VMM through vDPA + Qemu: intercepting the control
commands but not the datapath.
I thought this was a more general migration mechanism that passthrough
devices could use. Thanks for explaining. Maybe this can be made clearer
in the spec - it's not a full save/load mechanism, it can only be used
in conjunction with another component that is aware of the device's
state.


Yes, and actually this should be the suggested way for migrating virtio devices.

The advantage is obvious: by leveraging the mature virtio/vhost software stack, we don't need to care much about things like migration compatibility.



There is a gap between this approach and VFIO's migration interface. It
appears to be impossible to write a VFIO/mdev or vfio-user device that
passes a physical virtio-pci device through to the guest with migration
support.


I think mediation (mdev) is a must for supporting live migration in this case, even for VFIO. If you simply assign the device to the guest, the VMM loses all control over the device.

And what's more important, virtio is not PCI-specific, so it can work where VFIO cannot:

1) A physical device that doesn't use PCI as its transport
2) A guest that doesn't use PCI, or doesn't even have PCI

That's the consideration for introducing all of those as basic facilities first. Then we can let each transport implement them in a way that is comfortable for that transport (admin virtqueue or capabilities).


The reason is because VIRTIO lacks an interface to save/load
device (not virtqueue) state. I guess it will be added sooner or later,
it's similar to what Max Gurtovoy recently proposed.


So my understanding is:

1) Each device should define its own state that needs to be migrated

then, we can define

2) How to design the device interface

Admin virtqueue is a solution for 2) but not 1). And an obvious drawback of the admin virtqueue is that it's not easy to use in nested environments, where you still need a per-function interface.

Thanks



Stefan


