Subject: Re: [virtio-comment] [PATCH V2 0/2] Introduce VIRTIO_F_QUEUE_STATE
On 2021/3/24 6:05 PM, Stefan Hajnoczi wrote:
> On Wed, Mar 24, 2021 at 03:05:30PM +0800, Jason Wang wrote:
>> On 2021/3/23 6:40 PM, Stefan Hajnoczi wrote:
>>> On Mon, Mar 22, 2021 at 11:47:15AM +0800, Jason Wang wrote:
>>>> This is a new version to support VIRTIO_F_QUEUE_STATE. The feature
>>>> extends the basic facility to allow the driver to set and get device
>>>> internal virtqueue state. The main motivation is to support live
>>>> migration of virtio devices.
>>>
>>> Can you describe the use cases that this interface covers as well as
>>> the steps involved in migrating a device?
>>
>> Yes. I can describe the steps for live migrating a virtio-net device.
>> For other devices, we probably need other state.
>
> Thanks, describing the steps for virtio-net would be great.
>
>>> Traditionally live migration was transparent to the VIRTIO driver
>>> because it was performed by the hypervisor.
>>
>> Right, but it could be possible that we may want to live migrate
>> between hardware virtio-pci devices. So it's up to the hypervisor to
>> save and restore state silently, without the guest driver noticing,
>> as we did for vhost.
>
> This is where I'd like to understand the steps in detail. The set/get
> state functionality introduced in this spec change requires that the
> hypervisor has access to the device's hardware registers - the same
> registers that the guest is also using. I'd like to understand the
> lifecycle and how conflicts between the hypervisor and the guest are
> avoided (unless this is integrated into vDPA/VFIO/SR-IOV in a way
> that I haven't thought of?).
Let's assume the virtio device is used through vhost-vDPA. In this case, there are actually two virtio devices:
1) Device A, which is used by the vDPA driver and is connected to the vDPA bus (vhost-vDPA). Usually it could be a virtio-pci device.
2) Device B, which is emulated by Qemu. It could be a virtio-pci or even a virtio-mmio device.
So what the guest driver can see is device B, and it can only access the status bits of device B. From the view of Qemu, device A works more like a vhost backend. This means Qemu can stop device A (either via reset or another way, like a dedicated status bit) without the guest noticing, since only the device status bits of device A are touched. When we need to live migrate the VM:

1) Qemu needs to stop device B (e.g. stop vhost-vDPA)
2) Qemu gets the virtqueue states from device B
3) The virtqueue state is passed from source to destination
4) Qemu restores the virtqueue states to device C, which is the virtio/vDPA device on the destination
5) Qemu resumes device C (e.g. starts vhost-vDPA)
>>> I know you're aware but I think it's worth mentioning that this only
>>> supports stateless devices.
>>
>> Yes, that's why it's a queue state, not a device state.
>
>>> Even the simple virtio-blk device has state in QEMU's implementation.
>>> If an I/O request fails it can be held by the device and resumed
>>> after live migration instead of failing the request immediately. The
>>> list of held requests needs to be migrated with the device and is not
>>> part of the virtqueue state.
>>
>> Yes, I think we need to extend the virtio spec to support saving and
>> restoring device state. But anyway the virtqueue state is the
>> infrastructure which should be introduced first.
>
> Introducing virtqueue state save/load first seems fine, but before
> committing to a spec change we need an approximate plan for per-device
> state so that it's clear the design can be extended to cover that case
> in the future.
Yes, so as discussed, we might at least require an API to fetch the in-flight descriptors.
I haven't thought about it deeply, but here are some possible ways: 1) a transport-specific way, or 2) a generic method like a control vq command. 1) looks simpler but may end up with function duplication; 2) may need some extension to the current virtio-blk.
Actually, we also have 3): fetching the information from the management device (like the PF).
>>> I'm concerned that using device reset will not work once this
>>> interface is extended to support device-specific state (e.g. the
>>> virtio-blk failed request list). There could be situations where
>>> reset really needs to reset (e.g. freeing I/O resources) and the
>>> device therefore cannot hold on to state across reset.
>>
>> Good point. So here are some ways:
>> 1) reuse device reset, as is done in this patch
>> 2) introduce a new device status bit, like what has been done in [1]
>> 3) use queue_enable (as has been done in virtio-mmio; pci currently
>>    forbids stopping a queue, so we may need to extend that)
>> 4) use a device-specific way to stop the datapath
>> Reusing device reset looks like a shortcut that might not be easy for
>> stateful devices, as you said. 2) looks more general. 3) has the
>> issue that it doesn't forbid config changes. And 4) was also proposed
>> by you and Michael. My understanding is that there should be no
>> fundamental difference between 2) and 4). So I tend to respin [1].
>> Do you have any other ideas?
>
> 2 or 4 sound good. I prefer 2 since one standard interface will be
> less work and complexity than multiple device-specific ways of
> stopping the data path.
Yes.
> 3 is more flexible but needs to be augmented with a way to pause the
> entire device. It could be added on top of 2 or 4, if necessary, in
> the future.
>
> Stefan
Right, let me try to continue the approach of a new status bit to see if it works.
Thanks