Subject: Re: [virtio-comment] [PATCH V2 0/2] Introduce VIRTIO_F_QUEUE_STATE
On 2021/3/24 6:05 PM, Stefan Hajnoczi wrote:
> On Wed, Mar 24, 2021 at 03:05:30PM +0800, Jason Wang wrote:
>> On 2021/3/23 6:40 PM, Stefan Hajnoczi wrote:
>>> On Mon, Mar 22, 2021 at 11:47:15AM +0800, Jason Wang wrote:
>>>> This is a new version to support VIRTIO_F_QUEUE_STATE. The feature
>>>> extends the basic facility to allow the driver to set and get device
>>>> internal virtqueue state. The main motivation is to support live
>>>> migration of virtio devices.
>>>
>>> Can you describe the use cases that this interface covers as well as
>>> the steps involved in migrating a device?
>>
>> Yes. I can describe the steps for live migrating a virtio-net device.
>> For other devices, we probably need other state.
>
> Thanks, describing the steps for virtio-net would be great.
>
>>> Traditionally live migration was transparent to the VIRTIO driver
>>> because it was performed by the hypervisor.
>>
>> Right, but it could be possible that we may want to live migrate
>> between hardware virtio-pci devices. So it's up to the hypervisor to
>> save and restore state silently, without the guest driver noticing,
>> as we did for vhost.
>
> This is where I'd like to understand the steps in detail. The set/get
> state functionality introduced in this spec change requires that the
> hypervisor has access to the device's hardware registers - the same
> registers that the guest is also using. I'd like to understand the
> lifecycle and how conflicts between the hypervisor and the guest are
> avoided (unless this is integrated into vDPA/VFIO/SR-IOV in a way
> that I haven't thought of?).
Let's assume the virtio device is used through vhost-vDPA. In this case, there are actually two virtio devices:
1) Device A, which is used by the vDPA driver and is connected to the vDPA bus (vhost-vDPA). Usually it could be a virtio-pci device.
2) Device B, which is emulated by Qemu. It could be a virtio-pci or even a virtio-mmio device.
So what the guest driver can see is device B, and it can only access the status bits of device B. From the view of Qemu, device A works more like a vhost backend. This means Qemu can stop device A (either via reset or another way, like a dedicated status bit) without the guest noticing, since only the device status bits of device A are touched. When we need to live migrate the VM:

1) Qemu needs to stop device B (e.g. stop vhost-vDPA)
2) Qemu gets the virtqueue states from device B
3) The virtqueue state is passed from source to destination
4) Qemu restores the virtqueue states to device C, which is the virtio/vDPA device on the destination
5) Qemu resumes device C (e.g. starts vhost-vDPA)
>>> I know you're aware but I think it's worth mentioning that this only
>>> supports stateless devices.
>>
>> Yes, that's why it's a queue state, not a device state.
>
>>> Even the simple virtio-blk device has state in QEMU's implementation.
>>> If an I/O request fails it can be held by the device and resumed
>>> after live migration instead of failing the request immediately. The
>>> list of held requests needs to be migrated with the device and is not
>>> part of the virtqueue state.
>>
>> Yes, I think we need to extend the virtio spec to support saving and
>> restoring device state. But anyway the virtqueue state is the
>> infrastructure which should be introduced first.
>
> Introducing virtqueue state save/load first seems fine, but before
> committing to a spec change we need an approximate plan for per-device
> state so that it's clear the design can be extended to cover that case
> in the future.
Yes, so as discussed, we might at least require an API to fetch the in-flight descriptors.
I haven't thought about it deeply, but here are some possible ways: 1) a transport-specific way, or 2) a generic method like a control vq command. 1) looks simpler but may end up with function duplication; 2) may need some extension to the current virtio-blk.
Actually, we also have 3): fetching the information from the management device (like the PF).
>>> I'm concerned that using device reset will not work once this
>>> interface is extended to support device-specific state (e.g. the
>>> virtio-blk failed request list). There could be situations where
>>> reset really needs to reset (e.g. freeing I/O resources) and the
>>> device therefore cannot hold on to state across reset.
>>
>> Good point. So here are some ways:
>> 1) reuse device reset, as is done in this patch
>> 2) introduce a new device status bit, like what has been done in [1]
>> 3) use queue_enable (as has been done in virtio-mmio; pci currently
>>    forbids stopping a queue, so we may need to extend that)
>> 4) use a device-specific way to stop the datapath
>> Reusing device reset looks like a shortcut that might not be easy for
>> stateful devices, as you said. 2) looks more general. 3) has the
>> issue that it doesn't forbid config changes. And 4) was also proposed
>> by you and Michael. My understanding is that there should be no
>> fundamental difference between 2) and 4). So I tend to respin [1].
>> Do you have any other ideas?
>
> 2 or 4 sound good. I prefer 2 since one standard interface will be
> less work and complexity than multiple device-specific ways of
> stopping the data path.
Yes.
> 3 is more flexible but needs to be augmented with a way to pause the
> entire device. It could be added on top of 2 or 4, if necessary, in
> the future.
>
> Stefan
Right, let me try to continue the approach of a new status bit to see if it works.
Thanks